Monthly Archives: September 2014

Unexpected evidence on impact evaluations of anti-poverty programmes

The first email that caught my eye this morning as I opened my inbox was Markus Goldstein’s most recent World Bank blog post, “Do impact evaluations tell us anything about reducing poverty?” Having worked in this field for four years, I too have been thinking that we were in the business of fighting poverty, and like him, I expected that impact evaluations, especially impact evaluations of anti-poverty programmes, would tell us whether we are reaching the poor and helping them to a better life. Like Markus, I have been finding otherwise.

In May this year, 3ie commenced work on a USAID-supported evidence gap map on productive safety net programmes.  The striking finding from the impact evaluations we’ve identified so far is that they echo Markus’ findings. While about half of the studies at some point mentioned that the population of interest was the poor, the majority of the studies do not report what that means.

The 3ie-USAID Evidence Gap Map is focusing on important poverty alleviation outcomes along the causal chain.  For this exercise, we defined productive social safety nets as programmes that include livelihood or income-generating components to expand market opportunities and help stabilise consumption, increase and diversify incomes, build and protect assets or improve food security of extreme poor individuals, households and communities.

Our mapping is still ongoing, but so far, we have identified 216 impact evaluations of productive safety net programmes that use rigorous experimental and quasi-experimental study designs. This includes both published and unpublished studies, and both completed and ongoing impact evaluations identified through a systematic search of the literature. For more information about the evidence gap map methods, click here.

Our gap map covers six broad intervention categories:  social protection, financial services, land reform, microenterprise support services, collective action facilitation and multicomponent safety net interventions. As these interventions are frequently used to help people escape poverty, we would expect that the majority of the impact evaluations would target the poor, report details of the target population and measure whether people indeed escaped poverty.

However, this information too often is not reported. And where studies report targeting the poor, they rarely report whether the definition of being poor was based on the US$1.25-a-day cut-off, a national poverty line, daily calorific intake, asset ownership or some other non-income deprivation measure. Only nine of the 216 included studies (4 per cent) report enough information to allow us to determine whether the population or sample was on average living on less than US$1.25 a day. Getting even those details often required detective work. Only 16 of the included studies (7 per cent) report whether the population or sample was on average living below a national poverty line. Only 32 studies that report targeting the poor (25 per cent) conduct some heterogeneity analysis by income or poverty status to see if the effect of the intervention varies for different target populations.

The evaluation community needs to pay more attention to this crucial issue. If we do not clearly report information about the target population and how impacts vary across people’s incomes or wealth status, our impact evaluation findings are nigh on useless to programme implementers who need to know whom to target to achieve the best results.


The evidence gap map for USAID should be completed by the end of October 2014.  We will post it on the 3ie website. It will present a visual overview of existing and ongoing systematic reviews and impact evaluations of productive safety net programmes, schematically representing the types of interventions evaluated and outcomes reported. It will include links to summaries of the included evidence on the 3ie database and enable policymakers and practitioners to explore the findings and quality of the existing evidence and facilitate informed judgement and evidence-informed decision-making in international development policy and programming. The gap map will also identify key gaps where little or no evidence from impact evaluations and systematic reviews is available, and where future research should be focused.

If you want to sign up to get our announcement when the gap map is released, please email mentioning ‘gap map’ in the subject line.

How 3ie is tackling the challenges of producing high-quality policy-relevant systematic reviews in international development

At its annual colloquium being held in Hyderabad, India, the Cochrane Collaboration is focusing on evidence on the global burden of disease of mostly treatable illnesses that are concentrated among populations living in low- and middle-income countries (L&MICs). We already have a lot of systematic review evidence about what works to prevent and treat them. Yet they remain prevalent because of a lack of resources and implementation capacity, and because of population attitudes. What we lack is implementation evidence about what works to overcome those known barriers. At 3ie, we face similar barriers in producing high-quality and policy-relevant systematic reviews in international development.

Like 3ie, Cochrane provides guidance on how to conduct high-quality systematic reviews, builds capacity among authors from L&MICs to undertake them, and supports their use by decision makers. The Cochrane model, which originated in the clinical trials community, provides for a rigorous, transparent and trustworthy review product. As one leading development evaluator commented recently, “If a review of intervention effects is not published by Cochrane, or its sibling organisation, the Campbell Collaboration, I simply don’t trust the findings.”

What 3ie has found is that simple application of the traditional review method to development interventions is problematic (see here). A traditional review would not be able to answer many of the most relevant development questions for policymakers and implementers or provide sufficiently nuanced evidence to apply to complex interventions and contexts. Furthermore, findings from international development reviews are usually communicated badly, in formats which are impenetrable to decision makers. And they take ages to do, frequently 24 months (sometimes longer). Even studies that are able to communicate answers to relevant questions may arrive in the hands of implementers after decisions have been taken.

3ie has done much work to show that systematic reviews can answer relevant questions on complex development programmes.  Our reviews draw on a theory (or theories) of change to understand the intervention, the implementation processes and the evolution in outcomes for beneficiaries. These reviews can answer a very broad range of policy questions, not just on what works, where and for whom but why and at what costs, as well as questions around scaling-up. For example, a new review which examines the effectiveness of targeting agricultural programmes is available here.

Better evidence from systematic reviews can make development more effective and improve people’s lives. So review findings need to be written in easily accessible language and available to a wide audience of policymakers, international development professionals, and other users of evidence. Cochrane provides short, user-friendly summaries for informing clinical practice, while the summary of findings tables in these reviews can effectively communicate technical information about the quality of the evidence. These methods should be used, but they are not sufficient to communicate the nuanced findings typical of reviews on development topics.

3ie recently launched a summary report series that aims to present review evidence in a way which does justice to the development interventions and synthesised evidence in a more useful format for non-research specialists. These summaries are usually 25 pages and we require that they are written in accessible, jargon-free language that is easy to comprehend by non-research audiences. Our first summary report using this approach is on the effectiveness of farmer field schools.  We will be launching it in October in London.  We are now turning our efforts to launching improved policy briefs on systematic reviews that will meet our audience’s needs for succinct presentations of main points.

For all of our efforts to produce easily comprehensible summaries of full reviews, what matters is that decision makers have the evidence they need when they have to make decisions.  For some users, this means sacrificing rigour of analysis for the sake of meeting immediate evidence demands. It is my firm belief that this will undermine the key value of full systematic reviews – their reliability. More careful question setting by those demanding reviews is needed to ensure they are answerable. But the key constraint to review timelines is on the supply side. Some obvious things which can be done include better study management by author teams, reforming currently slow and inflexible review processes and reducing external peer review timelines to no more than one month.

Another constraint on the supply side is that many more resources are needed to develop authors’ skills in undertaking rigorous reviews and to train peer reviewers to support these reviews. 3ie is partnering with Cochrane’s Global Evidence Synthesis Initiative to build capacity to undertake and quality assure systematic reviews in L&MICs. The Campbell International Development Coordinating Group is also seeking expressions of interest from experienced review organisations in L&MICs for a satellite to support peer review and capacity building.

To help address the growing need to fund relevant systematic reviews in international development, 3ie is also announcing its seventh call for proposals from research teams on 30 September to answer six systematic review questions. These questions were developed in consultation with major implementing agencies. They include effectiveness of hygiene and non-food aid in humanitarian situations, programmes to empower women in the workplace, and the effects of workfare programmes, including the Government of India’s National Rural Employment Guarantee Scheme.  Weighting will be given to teams either from or partnering with organisations in L&MICs. Successful teams will also need to demonstrate that products from their rigorous reviews will be policy-relevant and written in accessible language according to 3ie’s report guidelines. We look forward to reviewing applications and hope that the reviews we fund can help answer questions for complex development programmes.

Making impact evidence matter for people’s welfare

The opening ceremony and plenary session at the Making Impact Evaluation Matter conference in Manila made clear that impact evidence – in the form of single evaluations and syntheses of rigorous evidence – does indeed matter. Two key themes were (1) that strong evidence about the causal effects of programmes and policies matters to making decisions that improve the welfare of people living in low- and middle-income countries and (2) that, to make impact evaluation matter more, we need to continue to make efforts to build capacity to generate, understand, and use such evidence in those same countries.

Impact Evaluations do matter for decision-making
As this conference was hosted in Manila – the first conference focused on impact evaluation in Asia – it is no surprise that a repeated example of evidence-informed decision-making related to the Philippines’ conditional cash transfer programme, known as ‘4Ps’. As noted both by the Philippine socioeconomic planning secretary Arsenio Balisacan and Chair of 3ie’s Board Richard Manning in the opening ceremony of the conference, impact evaluation evidence has been crucial in maintaining and expanding the 4Ps programme. It will ideally also be crucial in insulating 4Ps from party politics and changes in the administration. Moreover, the impact evidence led to an important modification of programme coverage, expanding eligibility to poor students at the secondary level as well as those at the primary level.

This is one example of how high-quality evidence can inform decision making and, ultimately, make a difference to people’s lives. But for impact evaluations to matter they need to be available and address key policy questions. As Richard Manning noted, while the volume of impact evaluations and evidence syntheses produced in recent years has grown quickly – with about 300 new studies produced every year – only 2-3 per cent of global development spending is subjected to evaluation. What about the rest? Without evidence we cannot offer clear guidance to policymakers about which programmes are most effective in improving lives.

According to Bindu Lohani of the Asian Development Bank (ADB), the evidence gap is a more urgent concern in some sectors than others. He highlighted infrastructure – a focal investment area for the ADB – as an example of where impact evaluation evidence is lacking. As the effects of a changing climate grow in magnitude, the need to build an evidence base on interventions for climate resilient infrastructure keeps increasing.

Building capacity and ‘working with’
Richard Manning highlighted that the past ten years have seen a shift, from an initial focus on evaluating the effects of aid programmes, to a broader focus on development effectiveness. Rather than being researcher- or donor-led, the demand for evidence increasingly comes from national and sub-national governments in low- and middle-income countries. Policymakers in these countries have limited resources and want to ensure their fair allocation and efficient use. The Philippines exemplifies this. With support from the Australian government, the Philippine government has worked towards generating and using evidence to inform its social programming.

With this shift, it becomes all the more important that researchers and decision-makers gain capacities in setting priorities and asking key questions, generating evidence, and then understanding and making use of this rigorous evidence. Bindu Lohani stressed that to ensure more widespread use of evidence, we should work to build the capacity in all countries to produce evidence that meets high standards.

In this way – and through early-and-often engagement (as highlighted by 3ie in its two days of pre-conference workshops and again by Paul Gertler in the opening plenary) between researchers, decision-makers and local stakeholders – more evidence can be produced that is rigorous and useful. This is a fundamental step in making impact evaluation matter. International donors will continue to play an important role in this. Richard Manning highlighted the example of the Philippines-Australia collaboration, where a bilateral funder supports a country in building the evidence across a range of programmes, as a good model for others to follow.

The Philippine secretary of social welfare and development Corazon Juliano Soliman noted that impact evidence can be lost in translation and therefore not used, or not used to best effect. She implored impact evaluators to speak plain English to users if they want their evidence used. More work is required to ask questions that matter to decision-makers and then to convey the results in jargon-free language that decision-makers can understand.

The plenaries and debates in Manila last week highlighted that impact evaluation matters. It is also clear that impact evaluation can matter more for improving the welfare of people living in low- and middle-income countries. One way to make it matter more is to better understand how impact evidence has been and can be used in decision- and policy-making. Another way is to build capacity to generate, understand, and use rigorous evidence in all countries.

Early engagement improves REDD+ and early warning system design and proposals

At 3ie, our mission is to fund the generation and sharing of sound, useful evidence on the impacts of development programmes and policies. Actually, we’re more curious (or nosy) than that. For impact evaluation that matters, we need to know which bits of a programme worked, which didn’t, why and through which mechanisms, in which contexts and for what costs.

Why do we need all this information? Because we want to keep increasing the quality and relevance of the evidence we fund. For the past five years, we have funded impact evaluations and syntheses of such evaluations. Through this, we’ve learned some important lessons about what it takes to get quality, useful impact evidence. Our experience indicates — and so our working hypothesis is — that requiring and fostering engagement at the beginning of the study design process will improve study quality and uptake of the generated evidence. This engagement is both between researchers and implementing agencies and between researchers and 3ie.

We promote this engagement through a new proposal preparation phase within our grant-making cycle. This involves a small pot of seed money and a lot of dialogue. Teams get vital inputs at the right time to shape questions that matter to key decision-makers and study designs that can provide rigorous answers.

By including implementing agencies and other stakeholders, there are opportunities for more decision-makers to gain ownership in the design and, in turn, the results. We think that this increases the likelihood that the studies will be useful to and used by those who have commissioned them.

This summer, we awarded these preparation grants for the second time, through our Climate Change and Disaster Risk-Reduction Thematic Window, funded by Danida. At our global conference, Making Impact Evaluation Matter, in Manila this week – also the mid-point of the preparation phase – we held an inception workshop. This brought together nine teams of researchers and implementers planning studies on REDD/REDD+ programmes and early warning systems across nine countries.

We set some goals for our inception workshop. A central goal was for researchers and implementers to engage on a common understanding of the theory of change under study. Moreover, we wanted them to collaborate on a set of rigorous research questions that could help fill knowledge gaps and drive policy and programmatic decision-making.

Towards this goal, two members of each team (an evaluator and an implementing partner) presented their preparation progress to other teams and 3ie. These included some exciting questions and designs, including grappling with the multi-layer, multi-level and politically sensitive interventions related to REDD+ as well as with the infrequent – but devastating – occurrences of natural disasters such as floods from rising sea-levels and landslides from rising river levels. We were particularly encouraged to see different teams working on the same country looking for ways to collaborate and share information.

For each presentation, the team received constructive feedback on their design and had the chance to benefit from peer-sharing and collegial learning.  3ie had a chance to give its inputs directly, face-to-face and more dynamically than in the past.

Of course, we also allotted time for feedback and evaluation of the preparation process, so we can inform and improve our own organisational processes with evidence. Our REDD+ and disaster risk-reduction participants told us that they found the preparation phase and the inception workshop to be very worthwhile.

These changes in our grant-making mean that preparation grantees are getting more engagement with and guidance from us. In particular, grantees felt they benefitted from having check-in points for dialogue with 3ie and from the requirements to visit the study country, meet with partners and hold at least one stakeholder workshop. These processes helped ground their questions and designs in the physical, institutional and political realities of the countries.

We expect that this approach will help teams submit stronger and richer proposals in mid-October. We’re looking forward to seeing how much further teams have been able to push their ideas by then towards our goal of generating sound, useful evidence.

Watch interviews of researchers and implementing agency representatives highlighting the need for evaluating early warning systems in Nepal as well as the importance of stakeholder engagement in evaluating the impact of REDD+ activities in the country.

Twelve tips for selling randomised controlled trials to reluctant policymakers and programme managers

I recently wrote a blog on ten things that can go wrong with randomised controlled trials (RCTs). As a sequel, which may seem to be a bit of a non sequitur, here are twelve tips for selling RCTs to reluctant policymakers and programme managers.

  1. The biggest selling point is that a well-designed RCT tells the clearest causal story you can. There is, by design, no link between beneficiary characteristics and programme assignment. So any difference in outcomes must be because of the programme, and not any underlying difference in treatment and control groups.  Moreover…
  2. RCTs are easy to understand.  You just need to look at the difference in mean outcomes between treatment and control. That is easy to calculate and easy to present. It’s true that in economics we usually calculate the mean difference using a regression with control variables added, but it can readily be presented as a simple mean difference.
  3. RCTs are a fair and transparent means of programme assignment. In a typical development programme, intended beneficiaries, and even agency staff, have no idea how or why communities get chosen to benefit from the programme. This is changing with RCTs, which use lotteries to choose programme beneficiaries. A lottery is much fairer than deals done in backrooms, which may be prone to political interference. It is conducted in a transparent way: public drawings are held with key stakeholders present, and they take part in the lottery that decides who gets the programme. Although this is a fairer approach, there are still objections that RCTs are unethical because the intervention is withheld from the control group. But,
  4. It is not necessary to have an untreated control group. An RCT may have multiple treatment arms, at its simplest, comparing intervention A with intervention B, where treatment B may be – as with many clinical trials – what is being done already. And a factorial design adds a third treatment arm which receives both A and B. This helps us answer the question of whether the two interventions work better together or separately.
  5. RCTs can lead to better targeting.   Randomisation doesn’t mean you randomise across the whole population. Randomisation occurs across the eligible population, so the intervention is still targeted as planned. Since an RCT requires you to clearly identify and list the eligible population, it may result in better targeting than would have been achieved without this discipline.
  6. Randomisation doesn’t have to interfere with programme design or implementation. For a start, you don’t need to randomise across the whole intended beneficiary population. Once you do the power calculations, you will have the size of the sample required for randomisation. And for a large programme, it is likely that only a subset of the intended beneficiary population is required. The programme managers can do what they like with the rest, which may well be the majority.
  7. Or minor adjustments can be made to the eligibility criteria (a ‘raised threshold design’) to yield a valid control group in a non-intrusive way.  Oversubscription can be generated by adjusting the threshold.  Participants are then selected at random, that is by a lottery.
  8. An encouragement design randomly assigns an encouragement to participate in the programme, not the programme itself. This will have no effect on how the programme is run, and will in fact additionally yield useful information on increasing take-up of the programme.
  9. Finally, a pipeline RCT exploits the fact that the programme is being rolled out over time and that there are almost certainly untreated members of the eligible population who can form a valid control group. The RCT therefore simply randomises the order of the treatment.
  10. Well-designed RCTs can open the black box. RCTs need not just focus on the ‘does it work’ question. They can also look at variations in intervention design to determine how to make it work better. But even if an RCT doesn’t do that, then
  11. The black box can be a blessing not a curse since RCTs cut through complexity. The causal chain of a programme can sometimes be too complex to unravel. RCTs can therefore help establish causality in the face of complexity. So in conclusion
  12. It is unethical not to do RCTs: Without RCTs, we are spending billions of dollars on programmes which, at best, we have no evidence for. And in reality many of these programmes are not working. So, it’s better to spend a few million on well-designed RCTs than billions on failed development programmes.
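
Tips 2, 3 and 6 describe mechanics that are simple enough to sketch in code. What follows is a minimal illustration in Python, not the procedure of any particular study: the function names, the village identifiers, the seed and the outcome values are all hypothetical. The sample size function uses the standard textbook formula for comparing two means under individual-level randomisation; designs that randomise whole communities generally need larger samples, which this sketch does not account for.

```python
import math
import random
from statistics import NormalDist, mean


def run_lottery(eligible_ids, n_treated, seed):
    """Draw n_treated units at random from the eligible list; the rest are controls."""
    rng = random.Random(seed)  # a recorded seed lets stakeholders replay the draw
    shuffled = list(eligible_ids)
    rng.shuffle(shuffled)
    return set(shuffled[:n_treated]), set(shuffled[n_treated:])


def difference_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated_values = [y for unit, y in outcomes.items() if unit in treated]
    control_values = [y for unit, y in outcomes.items() if unit not in treated]
    return mean(treated_values) - mean(control_values)


def sample_size_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Units needed per arm to detect a given difference in means (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Hypothetical example: 20 eligible villages, 10 programme places.
villages = [f"village_{i}" for i in range(1, 21)]
treated, control = run_lottery(villages, n_treated=10, seed=2014)

# Made-up outcomes in which the programme adds exactly 5 units.
outcomes = {v: 100.0 + (5.0 if v in treated else 0.0) for v in villages}
print(difference_in_means(outcomes, treated))    # 5.0

# Detecting an effect of 0.2 standard deviations with 80 per cent power.
print(sample_size_per_arm(effect=0.2, sd=1.0))   # 393
```

The last figure illustrates tip 6: under these assumptions, roughly 400 units per arm are enough, so a programme reaching tens of thousands of people would need to randomise only a small share of them.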

Given all these arguments, it is not surprising that there have been hundreds of RCTs of development programmes in recent years. But there are still many gaps in our knowledge of what works and why in achieving development goals. So let’s have more and better studies for better policies and programmes and better lives.