RE: How to evaluate science, technology and innovation in a development context? | Eval Forward

Dear colleagues,

We were very excited to hear from 23 participants from a range of backgrounds. The richness of discussion among a diverse range of experts, including non-evaluators, highlights overall agreement on the importance of framing and of context-specific evaluation criteria when evaluating science, technology, and innovation. Below please find a summary of the points made, with an invitation to read the authors' actual contributions if you missed them.

Frame of reference

The following frameworks were introduced: the Quality of Research for Development (QoR4D) frame of reference, the Research Excellence Framework (REF) and the RQ+ Assessment Framework.

The main discussion touched upon the Quality of Research for Development (QoR4D) frame of reference elements, and directly or indirectly linked to the evaluation criteria: relevance, legitimacy, effectiveness and scientific credibility.

For Serdar Bayryyev and Lennart Raetzell, the context in which products are used was key to determining their relevance. When assessing effectiveness (or quality), Serdar suggested (1) assessing the influence of the activities and the extent to which the science, innovation and research products have influenced policies, approaches, or processes, and (2) assessing the degree of “networking”, i.e., the degree to which researchers and scientific institutions have interacted with all relevant stakeholders. Lennart Raetzell shared her recent experience with a thematic impact evaluation for VLIR-UOS on pathways to research uptake, mainly in the field of agriculture.

Nanae Yabuki and Serdar Bayryyev agreed on the importance of assessing the transformational nature of research: whether research activities cause truly transformational change, or at least trigger important policy discourse on moving towards such change. The relevance of science, technology, and innovation (STI) is context-specific, as is the way STI triggers transformational change.

Sonal D Zaveri asserted that Southern researchers agree about the need for research to be relevant to topical concerns, to the users of research and to the communities where change is sought. Uptake and influence are as critically important as the quality of research in a development context, hence evaluations must measure what is important for communities and people. Many evaluations are designed at a distance, and evaluation choices are privileged by the power of expertise, position, or resources. It would be difficult to accept the quality of science bereft of any values related to human rights, inclusion and equity. Unless the findings are used and owned by people, especially those with little voice in the program, it cannot be claimed that evaluations have led to public good or been used for the benefit of all peoples. On that topic, Richard Tinsley highlighted the importance of considering the final beneficiaries of the scientific results. He stressed the need for clarity about CGIAR’s primary clients (the host-country National Agricultural Research Systems, NARS) and the final beneficiaries (a multitude of normally unnamed smallholder farmers), who are still one step removed from CGIAR’s clients (NARS).

From a donor point of view, Raphael Nawrotzki finds that the sub-components of the QoR4D frame of reference are important for measuring the quality of “doing science” rather than the results (outputs, outcomes, impacts). Hence the need to also use the OECD DAC criteria of impact and efficiency, so as to capture not only the “doing” (process) but also the “results” of the research for development enterprise. He asserted that the focus for a funder is on impact (what does the research achieve? what contribution does it make?) and, to a lesser extent, efficiency (how well were resources used? how much was achieved for the amount spent?).

For Nanae Yabuki, using scientific evidence to enhance the impact of interventions reflects FAO’s mandate. For assessing these aspects, the ‘utility’ of research findings is more relevant than their ‘significance’. Hence the need to come up with appropriate criteria for each evaluation.

Norbert TCHOUAFFE brought in Theory of Change as a tool to evaluate the impact of a science-policy interface network on a particular society, based on five determinants (scorecards on awareness, know-how, attitude, participation, and self-evaluation).


The discussants agreed on the importance of using a mixed-method approach that combines qualitative and quantitative indicators. According to Raphael Nawrotzki, a mixed-method approach is needed especially when evaluating the relevance of the research questions and the fairness of the process.

Quantitative methods: strengths and limitations

Among quantitative methods, the use of bibliometric analysis was mentioned for:

  • evaluating science impact, i.e., impact within a scientific field, still measured best by the number of citations that an article or book chapter receives (Raphael Nawrotzki);
  • assessing the legitimacy of the research findings and the credibility of the knowledge products (Nanae Yabuki);
  • providing a good indication of the quality of science (QoS), since published papers have already passed a high-quality threshold as they have been peer-reviewed by experienced scientists (Jillian Lenne);
  • providing an important overview of the efforts made, and the scientific outreach achieved (Paul Engel).

Thinking about QoS and innovation evaluation, Rachid Serraj illustrated the use of bibliometric and citation indices from the Web of Science (WoS).

Etienne Vignola-Gagné, co-author of the Technical Note, highlighted the new and broader range of dimensions covered by bibliometric indicators (namely cross-disciplinarity, gender equity, pre-printing as an open science practice, and the prevalence of complex multi-national collaborations), which are useful for assessing relevance and legitimacy. Some bibliometric indicators can also be used as process or even input indicators, instead of their traditional usage as output indicators of effectiveness. Bibliometrics can be used to monitor whether cross-disciplinary research programs are indeed contributing to increased disciplinary integration in daily research practice, considering that project teams and funders often underestimate the complexity of such research proposals.

Valentina de Col reported that bibliometrics (e.g., indexing in the Web of Science Core Collection, percentage of articles in Open Access, ranking of journals in quartiles, Altmetrics) were used on published journal articles, and that Outcome Impact Case Reports (OICRs) were used to describe the contribution of CGIAR research to outcomes and impact. Raphael Nawrotzki suggested other related bibliometric indicators: (a) contribution to SDGs; (b) average of relative citations; (c) highly cited publications; (d) citation distribution index.
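To make two of the indicators above more concrete, here is a minimal sketch of how an average of relative citations and a share of highly cited publications might be computed. It assumes a common reading of these indicators (citations normalized by a field-and-year baseline, and a top-10% citation threshold); the records, baselines and function names are all hypothetical and are not taken from any contribution to the discussion.

```python
from statistics import mean

# Hypothetical portfolio records: (field, publication year, citations received).
portfolio = [
    ("agronomy", 2018, 12),
    ("agronomy", 2018, 3),
    ("economics", 2019, 8),
    ("economics", 2019, 0),
]

# Hypothetical world baselines per (field, year): mean citations per paper,
# and the citation count needed to be among the top 10% most cited papers.
baselines = {
    ("agronomy", 2018): {"mean": 6.0, "top10": 10},
    ("economics", 2019): {"mean": 4.0, "top10": 9},
}


def relative_citation(field, year, citations):
    """Citations divided by the average for papers of the same field and year."""
    return citations / baselines[(field, year)]["mean"]


def average_relative_citations(records):
    """Mean of the relative citation scores; 1.0 means 'at world average'."""
    return mean(relative_citation(*record) for record in records)


def highly_cited_share(records):
    """Share of papers at or above the top-10% threshold of their field and year."""
    hits = sum(1 for f, y, c in records if c >= baselines[(f, y)]["top10"])
    return hits / len(records)


arc = average_relative_citations(portfolio)
hcp = highly_cited_share(portfolio)
print(f"Average of relative citations: {arc:.3f}")
print(f"Share of highly cited publications: {hcp:.0%}")
```

Normalizing by field and year is what makes scores comparable across disciplines with very different citation habits, which is one reason such indicators are usually preferred to raw citation counts.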

Keith Child and Serdar Bayryyev noted limitations of bibliometric analysis. For example, not all science, innovation and research products are included and properly recorded in bibliographic databases, and some are never published, so not all products can be assessed. Furthermore, calculating the average number of citations is open to biases: (1) exaggerated attention to a specific author; (2) some authors may deliberately exclude certain reference materials from their publications.

Raphael Nawrotzki noted limitations specifically associated with measuring scientific impact through bibliometrics: (1) long time periods (it can take decades for results from investments in agricultural research to become visible; a robust measurement of science impact in terms of bibliometrics is only possible about 5 years after a research project or portfolio has been completed); (2) altmetrics (it is difficult to combine bibliometrics and altmetrics to get a full picture of scientific impact); (3) cost-effectiveness (the fraction of support attributable to each funding source is not easily determined; computing cost-effectiveness measures comes with a host of limitations).

Paul Engel extended the list of limitations: bibliometrics provide very little information on policy outreach, contextual relevance, sustainability, innovation and scaling of the contributions generated through research partnerships. Ola Ogunyinka asserted that the ultimate beneficiaries of the CGIAR (smallholder farmers and the national systems) are far removed (in terms of access, funds, weak structures, etc.) from the journals considered in the bibliometric analyses.

Overall, Jill Lenne and Raphael Nawrotzki agreed on the value of using altmetrics.

Graham Thiele suggested the use of social network analysis (SNA) of publications, to explore who collaborates on them and what their social and organizational context is, as a complement to bibliometric analysis, particularly for the legitimacy dimension. Valentina de Col used SNA and impact network analysis (INA) to investigate the research collaboration networks of two CGIAR research programs.
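As a rough illustration of what an SNA of publications could look like, the sketch below builds a co-authorship network from a handful of publications and computes a simple degree centrality for each author, plus the share of publications whose authors span more than one type of organization. The data, the organization labels and the function names are hypothetical, and this is only one of many possible network measures, not the method used in the studies mentioned.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical publications: each is a list of (author, organization type).
publications = [
    [("A", "CGIAR centre"), ("B", "NARS"), ("C", "university")],
    [("A", "CGIAR centre"), ("C", "university")],
    [("D", "NARS"), ("B", "NARS")],
]


def coauthor_graph(pubs):
    """Undirected graph linking every pair of authors who share a publication."""
    graph = defaultdict(set)
    for authors in pubs:
        names = sorted({name for name, _ in authors})
        for a, b in combinations(names, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other authors in the network each author has published with."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def cross_org_share(pubs):
    """Share of publications whose authors come from more than one organization type."""
    mixed = sum(1 for authors in pubs if len({org for _, org in authors}) > 1)
    return mixed / len(pubs)


graph = coauthor_graph(publications)
for author, score in sorted(degree_centrality(graph).items()):
    print(f"{author}: {score:.2f}")
print(f"Cross-organization publications: {cross_org_share(publications):.0%}")
```

Note that single-author publications add no links, so their authors only appear in the graph if they also co-author elsewhere. In practice a dedicated network library would likely be used, but the logic is the same.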

Finally, Graham Thiele warned against the risk of relying on state-of-the-art methods and the increased precision of bibliometric analysis (currently available and produced on a continuous basis) at the expense of missing the rounded picture that other studies, such as outcome case studies and impact studies, provide. This point is supported by Paul Engel, based on his experience evaluating Quality of Science in CGIAR research programs.

Guy Poppy introduced the Research Excellence Framework (REF), which, alongside assessing research outputs, also evaluates impact case studies and the research environment. It produces a blended score in which outputs carry the biggest weighting, while the weighting of impact is growing.
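The idea of a blended score can be sketched as a weighted average of the three components. The weightings and scores below are hypothetical, chosen only to show outputs carrying the most weight while the weight of impact grows between two assessment rounds; they are not the official REF figures, and the actual exercise is considerably more detailed than a single number.

```python
# Hypothetical weightings for two assessment rounds (illustration only):
# outputs weigh the most in both, and impact gains weight in the later round.
WEIGHTS_EARLIER = {"outputs": 0.65, "impact": 0.20, "environment": 0.15}
WEIGHTS_LATER = {"outputs": 0.60, "impact": 0.25, "environment": 0.15}


def blended_score(scores, weights):
    """Weighted average of component scores; weights are expected to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical component scores for one research unit, on a 0-4 scale.
unit_scores = {"outputs": 3.0, "impact": 3.5, "environment": 2.5}

print(f"Earlier weighting: {blended_score(unit_scores, WEIGHTS_EARLIER):.3f}")
print(f"Later weighting:   {blended_score(unit_scores, WEIGHTS_LATER):.3f}")
```

With these made-up numbers, a unit that is stronger on impact than on outputs scores slightly higher under the later weighting, which is the practical effect of impact "growing in weighting".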

Qualitative methods: strengths and limitations

Using qualitative methods, along with bibliometrics and altmetrics, is essential for a broader picture when assessing quality of science.

Qualitative assessments can be done through interviews and/or surveys. With regard to measuring impact, Valeria Pesce highlighted that qualitative indicators are often based on either interviews or reports, and that making sense of the narrative is not easy. She echoed the post from Claudio Proietti, who introduced ImpresS.

Ibtissem Jouini highlighted the trustworthiness of evidence synthesis, which is nonetheless challenged by the variety of evidence that can be found and by the variety of evaluation criteria, approaches, focus, contexts, etc.

Limitations of qualitative methods were also noted by Jillian Lenne and Keith Child: qualitative assessments require the evaluator to make subjective judgments.

Considerations for participatory approaches and methods were highlighted by Sonal D Zaveri: the difference between "access" and "participation", passing through the concept of power (hidden or explicit). Women, as traditional bearers of local and indigenous knowledge, find themselves cut off from the networked society, where information, communication, and knowledge are ‘tradeable goods’.

Valeria Pesce, Etienne Vignola-Gagné and Valentina de Col discussed actual tools and ways to address the challenges of both qualitative and quantitative indicators: IT tools that can (sometimes automatically) classify text against selected concepts and identify patterns, word or concept frequency, clusters of concepts, etc., using text mining and machine learning techniques, sometimes even starting directly from video and audio files.

For narrative analysis, the following tools were mentioned:

  • ATLAS.ti, MAXQDA, NVivo: powerful narrative analysis;
  • Cynefin Sensemaker and Sprockler: design and collection functionalities;
  • NarraFirma: strong conceptual backbone, helping with the design of the narrative inquiry and supporting a participatory analysis process.
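To give a feel for what classifying narratives against selected concepts and counting concept frequency involves, here is a deliberately simple keyword-based sketch. The concept dictionary, the interview excerpts and the function names are hypothetical; the tools named above rely on far richer techniques, so this only illustrates the basic idea.

```python
import re
from collections import Counter

# Hypothetical concept dictionary: concept -> keywords that signal it.
CONCEPTS = {
    "uptake": {"adopted", "adoption", "uptake", "used"},
    "policy": {"policy", "ministry", "regulation"},
    "equity": {"women", "gender", "inclusion", "equity"},
}

# Hypothetical interview excerpts.
excerpts = [
    "Farmers adopted the new variety after the ministry changed its seed policy.",
    "Women in the cooperative said the training was rarely used.",
    "The regulation was drafted without consulting local groups.",
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text):
    """Return the set of concepts whose keywords appear in the text."""
    tokens = set(tokenize(text))
    return {concept for concept, keywords in CONCEPTS.items() if tokens & keywords}


def concept_frequency(texts):
    """Count in how many texts each concept appears."""
    counts = Counter()
    for text in texts:
        counts.update(classify(text))
    return counts


for concept, count in sorted(concept_frequency(excerpts).items()):
    print(f"{concept}: {count} of {len(excerpts)} excerpts")
```

Keyword matching of this kind is transparent but crude: it misses synonyms and context, which is part of why making sense of narratives still needs the evaluator's judgment.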

Conclusion and next steps

Even without standardization of methods, efforts should be made to design the STI evaluations so that the evaluation results can be used, to the extent possible, at the institutional level, for instance for higher strategic and programmatic planning (Nanae Yabuki, FAO), but also at the level of those who are affected and impacted (Sonal D Zaveri, and others).

Valentina de Col highlighted the value of consolidating and adopting a standardised approach to measuring quality of science (QoS) within an organization like CGIAR, to help better measure outcomes, assess effectiveness, improve data quality, identify gaps, and aggregate data across CGIAR centres.

The value of this discussion for learning is undeniable. In CGIAR, at CAS/Evaluation, we have started developing guidelines to operationalize the quality of science evaluation criterion in the revised CGIAR Evaluation Policy. Let us know if you are interested in further engagement.

Referenced and Suggested readings

Alston, J., Pardey, P. G., & Rao, X. (2020) The payoff to investing in CGIAR research. SOAR Foundation.

Belcher, B. M., Rasmussen, K. E., Kemshaw, M. R., & Zornes, D. A. (2016). Defining and assessing research quality in a transdisciplinary context. Research Evaluation, 25(1), 1-17.
DOI: 10.1093/reseval/rvv025

Norbert F. Tchiadjé, Michel Tchotsoua, Mathias Fonteh, Martin Tchamba (2021). Ecological engineering to mitigate eutrophication in the flooding zone of River Nyong, Cameroon, Pages 613-633:

Chambers, R. (1997). Whose reality counts (Vol. 25). London: Intermediate technology publications.

Evans, I. (2021). Helping you know – and show – the ROI of the research you fund. Elsevier Connect.

Holderness, M., Howard, J., Jouini, I., Templeton, D., Iglesias, C., Molden, D., & Maxted, N. (2021). Synthesis of Learning from a Decade of CGIAR Research Programs.

IDRC (2017) Towards Research Excellence for Development: The Research Quality Plus Assessment Instrument. Ottawa, Canada.

Lebel, Jean and McLean, Robert. A Better Measure of research from the global south, Lancet, Vol 559 July 2018.

McLean R. K. D. and Sen K. (2019) Making a difference in the real world? A meta-analysis of the quality of use-oriented research using the Research Quality Plus approach. Research Evaluation 28: 123-135.

Ofir, Z., T. Schwandt, D. Colleen, and R. McLean (2016). RQ+ Research Quality Plus. A Holistic Approach to Evaluating Research. Ottawa: International Development Research Centre (IDRC).

Runzel M., Sarfatti P. and Negroustoueva S. (2021) Evaluating quality of science in CGIAR research programs: Use of bibliometrics. Outlook on Agriculture 50: 130-140.

Schneider, F., Buser, T., Keller, R., Tribaldos, T., & Rist, S. (2019). Research funding programmes aiming for societal transformations: Ten key stages. Science and Public Policy, 46(3), pp. 463–478. doi:10.1093/scipol/scy074.

Singh, S., Dubey, P., Rastogi, A. and Vail, D. (2013) Excellence in the context of use-inspired research: Perspectives of the global South.

Slafer G. and Savin R. (2020) Should the impact factor of the year of publication or the last available one be used when evaluating scientists? Spanish Journal of Agricultural Research 18: 10pgs.

VLIR-UOS (2019). Thematic Evaluation of Departmental Projects: Creating the Conditions for Impact.

Zaveri, Sonal (2019). “Making evaluation matter: Capturing multiple realities and voices for sustainable development” contributor to the journal World Development - Symposium on RCTs in Development and Poverty Alleviation.

Zaveri, Sonal (2021) with Silvia Mulder and P Bilella, “To Be or Not to Be an Evaluator for Transformational Change: Perspectives from the Global South” in Transformational Evaluation: For the Global Crisis of our Times, edited by Rob Van De Berg, Cristina Magro and Marie Helene Adrian.

Zaveri, Sonal. 2020. ‘Gender and Equity in Openness: Forgotten Spaces’. In Making Open Development Inclusive: Lessons from IDRC Research, edited by Matthew L. Smith, Ruhiya Kristine Seward, and Robin Mansell. Cambridge, Massachusetts: The MIT Press.