Using impact evaluation to improve policies and programmes


Conditional cash transfers increase school enrolments and use of health facilities. Community-level water supply does not have health benefits. There is emerging evidence that community-driven development programmes do not increase social cohesion.

These statements can be made with confidence based on the considerable body of evidence from impact evaluations undertaken to answer the question of what works in development. 3ie is now adding to this body of evidence as more completed studies become available.

Knowing which interventions don’t work can save scarce development resources. And knowing which work best, and most cost effectively, can make sure these resources are better spent. But impact evaluations can do a lot more. They can help inform programme design. They do this in two ways: (i) by using multiple treatment arm designs, and (ii) by adopting a theory-based evaluation design. (I will discuss theory-based designs in a subsequent blog.)

Studies with multiple treatment arms examine the impact of different programme designs. One treatment arm gets one design, say supplementary feeding to tackle child malnutrition. The second treatment arm could get nutritional counselling. Just having two treatment arms lets us compare which of the two treatments is more effective. If we also have a no-treatment control arm, we can measure the absolute impact and cost effectiveness of each treatment.
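To make the arithmetic concrete, here is a minimal sketch in Python of how a control arm lets us compute absolute impact and cost effectiveness for each treatment. All figures are hypothetical and chosen only for illustration; they do not come from any study, and the function name `impact_and_cost_per_point` is my own.

```python
def impact_and_cost_per_point(control_rate, arm_rate, cost_per_child):
    """Return (impact, cost per percentage point of reduction) for one arm.

    Rates are malnutrition prevalence in percent. Impact is the reduction
    relative to the no-treatment control arm.
    """
    impact = control_rate - arm_rate
    if impact <= 0:
        # No measurable benefit, so cost effectiveness is undefined.
        return impact, float("inf")
    return impact, cost_per_child / impact


# Hypothetical illustration only: prevalence (%) and cost per child (US$).
control = 40
arms = {
    "supplementary feeding": {"rate": 30, "cost": 60},
    "nutritional counselling": {"rate": 35, "cost": 15},
}

results = {
    name: impact_and_cost_per_point(control, arm["rate"], arm["cost"])
    for name, arm in arms.items()
}

for name, (impact, cost_per_point) in results.items():
    print(f"{name}: impact {impact} points, ${cost_per_point:.2f} per point")
```

With these made-up numbers, feeding has the larger impact but counselling costs less per percentage point of reduction. Without the control arm we could only say that feeding beats counselling by five points, not how much either achieves or at what cost.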

Multiple treatment arms address intervention design questions of interest to policymakers. Do conditions make a difference for conditional cash transfers, and for what? (Some evidence related to education says yes.) Does it matter when and how often the transfer is paid? (There is evidence that a large transfer just before school fees are due has a larger impact on enrolments.) Does it matter who receives the transfer? (There is substantial evidence that women are more likely than men to use income for their children’s welfare.) What sort of administrative arrangements work best? (Bureaucratic procedures, including ‘entering offices’, can deter ordinary people.) Should payment be in cash or kind?

Take the example of computer-assisted learning. Computer-assisted learning has been shown to have a substantial, cost-effective impact on learning outcomes, notably at the basic (primary or elementary) level. But which design is most cost effective? How many computers are needed for a class of 30 children? It is plausible that the learning effects are greater for two children per computer than for one. The learning effects may be lower with three, but the cost effectiveness may still be higher.

Multiple treatment arm studies can test the cost effectiveness of different student-to-computer ratios. And what sort of technical support is needed for teachers? Is it sufficient that they know the basics of how to operate the computer, if that, or do they need intensive training to understand the learning objectives of the software?

Well-designed impact evaluations won’t just tell us whether computer-assisted learning programmes work (we know they do, provided they come with appropriate software and the school infrastructure is sufficient to support them), but also how to make them work better.

It is often argued that there are complementarities in development interventions, for example that extension services are only effective when combined with input subsidies. Factorial designs are a special case of multiple treatment arm designs. They are powerful but particularly under-used.

Factorial designs explore the impact of interventions A, B and C=A+B, preferably with a ‘no treatment’ comparison group though there may be practical, political or ethical objections to that. For example, A could be improved water supply, B hygiene education and C is the two together. Or A is microcredit, B business support services and C the two combined.

Factorial designs can test whether such complementarities exist, but only if the study has sufficient statistical power. Because of the large number of possible combinations, statistical power often becomes a constraint: each combination needs a group of participants large enough to detect a statistically significant response. So researchers need to build the analysis of complementarities into the design from the start.
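A rough sketch in Python shows why power bites. It assumes the simplest case: a 2x2 design (no treatment, A, B, and A plus B), individual-level randomisation, equal group sizes, a common standard deviation and a two-sided z-test. The function names and the example numbers are mine and purely illustrative; a real study, especially a cluster-randomised one, would need a fuller power calculation.

```python
from math import ceil
from statistics import NormalDist


def complementarity(mean_none, mean_a, mean_b, mean_ab):
    """Extra effect of combining A and B beyond the sum of their separate effects."""
    effect_a = mean_a - mean_none
    effect_b = mean_b - mean_none
    effect_ab = mean_ab - mean_none
    return effect_ab - (effect_a + effect_b)


def n_per_group(effect, sd, groups_in_contrast, alpha=0.05, power=0.80):
    """Approximate participants needed in each group to detect `effect`.

    groups_in_contrast is 2 for a simple comparison of two arms and 4 for
    the complementarity (interaction) contrast in a 2x2 factorial design.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(groups_in_contrast * (z * sd / effect) ** 2)


# Hypothetical group means: A adds 2, B adds 3, together they add 8.
extra = complementarity(10, 12, 13, 18)

# Detecting an effect of 0.2 standard deviations:
simple = n_per_group(effect=0.2, sd=1.0, groups_in_contrast=2)
interaction = n_per_group(effect=0.2, sd=1.0, groups_in_contrast=4)
print(extra, simple, interaction)
```

Under these assumptions, detecting a complementarity of a given size takes twice as many participants per group as a simple two-arm comparison of the same size, and there are four groups rather than two. That is why the analysis has to be planned into the design rather than attempted afterwards.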

Multiple treatment arm studies apply experimental or quasi-experimental designs to conduct a counterfactual assessment of the (cost) effectiveness of variations in intervention design to inform better designs. The design variations being tested should be the ones of interest to policymakers, ones they will implement if they are proven to be effective.

How useful are systematic reviews in international development?


This thought-provoking question was the highlight of the opening plenary of the Dhaka Colloquium of Systematic Reviews in International Development.

Systematic reviews summarise all the evidence on a particular intervention or programme and were first developed in the health sector.  The health reviews have a specific audience: doctors, nurses and health practitioners. The audience is also easily able to find the systematic reviews.

But there seems to be a big difference in the accessibility of evidence between the health and development sectors. Systematic reviews in international development are targeted at policymakers as well as other researchers. However, policymakers are a diverse group and do not routinely look for evidence when making decisions. And even if policymakers attempt to read a systematic review, they may think that it is a technical document that does not apply to their particular context.

In highlighting some of the challenges in using systematic reviews in international development, the two plenary speakers, David Myers, President and CEO of the American Institutes for Research, and Julia Littell of Bryn Mawr College, offered some useful tips for researchers.

One area where we can learn from the health sector is how to develop a well-defined systematic review question. Currently, our reviews are too broad, our scope is too ambitious and often we do not really address the concerns of policymakers and practitioners. If we ask whether an intervention works or not, we will inevitably come to the conclusion that everything works sometimes and not at other times. We therefore need to look beyond average impact and evaluate for whom the intervention works, when and in what context.

We need not only to ask the right question but also to answer it in a way that makes sense to those who need to know. “Policy makers do not care about effect sizes,” David Myers said. They want to know, for instance, whether the education intervention they implemented is keeping girls in school and how many more they can keep from dropping out.

We need to put our efforts into translating our research findings into plain language and ensure our messages are short and clear but also accurate. At the same time, we as researchers need to manage expectations and educate our audience. Findings of large-scale studies are rarely definitive. To reach our audience, we need to help them become comfortable with less-than-full answers that make incremental progress toward alleviating problems. Are we asking for too much? Julia Littell believes that we’re better served by knowing a lot about a little than by knowing a little about a lot. We need to be realistic. There are high expectations and lots of issues that need to be explored and addressed, but also limited resources to do so.

It is quite difficult to make research relevant to the end user. We grapple with issues of generalisability and applicability of research to different contexts. This is made even more complicated by the fact that systematic reviews pool evidence from a range of different settings and contexts. The jury is still out on how we can overcome this challenge. The panel provided the useful suggestion of building capacity not just for conducting systematic reviews but also for using them. Systematic reviews and their findings need to be interpreted by those who really understand the context on the ground.

Finally, accessibility of evidence remains a significant challenge in international development. There are hundreds of NGOs that are conducting some kind of evaluation. But we do not know where this evidence is stored or how it can be accessed. 3ie has been working on overcoming this challenge by creating a database of impact evaluations, which currently has around 750 records. We will also soon start a registry of impact evaluations in international development where researchers can register their ongoing evaluations. We now need to move a step further and work on making sure that researchers and institutions in low- and middle-income countries have access to these evidence libraries.