Monthly Archives: June 2018

Making replication research relevant for international organizations: A 3ie-IFAD post-event conversation

After 6 years, 3ie’s replication programme is finishing its fourth round of 3ie-funded replication studies. In recognition of this round’s completion, 3ie and the International Fund for Agricultural Development (IFAD) recently hosted a joint engagement event, Financial services for the poor programmes – verifying evidence for policymaking. Ben (3ie) and Michael (IFAD) co-hosted the event. At the event, 3ie’s current replication researchers presented their draft results.

The current round of 3ie-funded replication studies, funded by the Bill and Melinda Gates Foundation, focuses on financial services for the poor. The Gates Foundation staff selected seven studies based on recent development-related impact evaluations, which were important for their programming (more information on the programme is available here). The replication research teams for each of the seven studies presented their papers at the event in Rome.

After the individual replication teams’ presentations, a group of experts from 3ie, IFAD, the Centre for Economic and International Studies and the Food and Agriculture Organization formed a panel to discuss the current state of research transparency efforts. Michael gave the closing keynote address, in which he summarised how replication research might fit into future IFAD-funded projects.

The importance of emphasizing policy relevance was one of the key takeaways from the event. The event participants repeatedly challenged the replication researchers to use their studies to provide concrete recommendations for policymaking. Michael highlighted the importance of the Sustainable Development Goals (SDGs) to many policymakers. He suggested that the replication researchers should partially motivate their studies by framing them around the SDGs that the research addresses.

In the following conversation, Michael shares his overall reflections on the event.

Ben: Michael, thank you again for co-hosting the event with 3ie. I thought this blog would be a nice opportunity to summarize your insights. Would you briefly give us your main takeaways from the day?

Michael: We should seriously consider funding and implementing more and longer-term studies that provide quality evidence for decisions (i) on the types of offerings, products, services and approaches promoted through IFAD co-financed projects and programmes and (ii) on the content of national policy engagement or dialogue. I also see great opportunities in pursuing more external and internal replications for enhanced evidence concerning long-term impact. Last but not least, the challenge of measuring results and gains in (rural) market or system development remains.

Ben: As you highlighted in your presentation, IFAD is a major player in rural poverty alleviation work. Given IFAD’s large amount of programming on this topic, what kind of replication research evidence would be most helpful for you? And how would you suggest it be packaged?

Michael: As IFAD’s Lead Technical Specialist for Inclusive Rural Financial Services, my interest is clearly focused on gaining more empirical evidence on how these investments are a means to an end regarding more food security, reduced vulnerability of rural dwellers and sustainable poverty alleviation. Additional research needs to address the levels of developing inclusive rural financial markets and systems, addressing the micro-level in terms of impact, creating an enabling market infrastructure that is ubiquitous, safe and competitive, and defining the elements necessary for a policy and regulatory framework for responsible and impactful financial inclusion. In particular, additional research addressing micro-level impacts should focus on how poor people are enabled to capture opportunities and build resilience and how financial service providers offer affordable, responsible, accessible and sustainable financial solutions for a significant number of poor people.

Ben: Replication research is a nascent field. 3ie recently released our transparency policy, which includes a commitment to push-button replication of all 3ie-funded research. At the end of your presentation, you suggested a few possible next steps for conducting replication research with IFAD. Would you mind elaborating on one or two of those ideas here? How might we integrate replication research into IFAD’s portfolio?

Michael: In line with my two or three takeaways mentioned earlier, I would think that we could start by selecting a few concept notes from IFAD’s investment pipeline, usually as part of the Country Strategic Opportunities Programmes (COSOPs), with a dedicated inclusive rural financial service component. When we work on the full project design, we document the theory of change and include in the logical framework objectively verifiable indicators and means of verification. We could then build replication impact research into the project. Of course, we would need to make sure that we have the human and financial resources available on the ground.

Overall, we considered the event to be a success. All seven of the replication teams presented their draft results and received comments on their work. They are all committed to incorporating the feedback they received at the event into their papers. The replication studies will be posted in 3ie’s Replication Paper Series later this year and are under consideration for a special issue. Keep your eyes out for this work in the near future!

Moving the debate forward on community-driven development

There is only one thing in the world worse than being talked about, and that is not being talked about (Oscar Wilde). Our recent review of community-driven development (CDD) is certainly being talked about. Sparked off by Duncan Green’s blog on our review, there has been an active debate about CDD on social media. But the message being taken away from our review is that ‘CDD doesn’t work’. We don’t say that exactly. CDD has been enormously successful in delivering small-scale infrastructure.

CDD funders and implementers should certainly be taking note of these discussions as they decide how to invest precious development money in designing programmes. So, we are keen to get our message straight.

Our evidence synthesis study is not a critique of CDD. The answers to the questions we examined are sometimes nuanced and sometimes straightforward. The question ‘what works?’ needs to be unpacked into ‘what works for whom and to achieve what, as well as why does it work or not work?’ These are not straightforward Yes or No questions, as we found in looking carefully at high-quality evidence on programmes from different countries.

Dismissing participatory development on the basis of CDD’s impact would also be a dangerously incorrect conclusion. We reviewed programmes that mainly focused on infrastructure because billions of dollars have been pumped into this approach in several countries. Community engagement programmes in specific sectors, such as school-based management, community-led total sanitation or self-help groups, have their own theories of change. We need to examine evidence of their effectiveness separately. 3ie has in fact a dedicated evidence programme looking at community engagement approaches for improving immunisation. A review of school-based management finds that it does have positive effects, though not in low-income settings. And a review of self-help groups finds they do lead to women’s economic, social and political empowerment.

What our synthesis finds

What is clear is that CDD programmes do work for building public infrastructure. This is an unequivocal finding of our study. They have often constructed or rehabilitated very large numbers of facilities in communities and have benefitted tens to hundreds of thousands of people in each country. In many cases, programmes have exceeded targets. However, their cost-effectiveness compared to alternative approaches is not clearly established. We need to carry out further research in different contexts.

But proponents of CDD have claimed more than this impact. They claim that CDD builds communities, not just schools, roads and health clinics. A staggering finding from our work is that CDD programmes have no effect on social cohesion. Irrespective of the country or the type of programme (long or short-term), the lack of effect is consistent across contexts. This is where meta-analysis is so useful, as it clearly illustrates the finding in the forest plot (see graph). When the horizontal line for an intervention (its confidence interval) crosses the vertical line of no effect – as it does in all cases – it means there is no statistically significant impact. We find the same lack of effect on improved governance.
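For readers less familiar with forest plots, the reading rule above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a normal-approximation confidence interval, and the function names and the example estimates and standard errors are hypothetical, not values taken from our review.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval around an effect estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def crosses_no_effect(estimate, std_error, level=0.95):
    """True when the interval includes zero, i.e. the plotted line
    crosses the vertical line of no effect."""
    low, high = confidence_interval(estimate, std_error, level)
    return low <= 0.0 <= high


# Hypothetical effect sizes and standard errors, for illustration only.
print(crosses_no_effect(0.03, 0.04))  # small estimate, interval spans zero
print(crosses_no_effect(0.15, 0.04))  # larger estimate, interval lies above zero
```

An interval that spans zero means the data are compatible with no effect at the chosen confidence level; it does not by itself prove that the effect is exactly zero.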

We think that part of the gap between the claim and the evidence lies in an unclear theory of change for how CDD improves social cohesion. Social cohesion refers to behaviours and attitudes within a community that reflect members’ propensity to cooperate. If this is used as a working definition, then it is unclear how the inclusive process of community participation in implementing development projects automatically engenders further cooperation. As has been suggested in some studies, CDD programmes may well be users rather than producers of social capital. For example, it is those communities where social capital is already high that come together to make a successful application for funding.

Moreover, our analysis using the funnel of attrition (see infographic) shows how participation in actual decision-making and project management ends up involving only a small number of community members.

It was also perhaps overly ambitious to believe that CDD projects could heal community divisions in post-conflict settings or build local democracy. The evidence is clear that they do not do so.

Where the studies do not give such a clear message is about CDD’s impact on social and economic welfare. Here again, CDD’s theory of change relies on assumptions that may not hold. As other commentators have said, the creation of public infrastructure will not improve health and education outcomes without complementary inputs. So, while most CDD programmes did not improve children’s school enrolment, the Peruvian FONCODES did simply because a centrally managed school uniform and school feeding programme was implemented along with school construction and rehabilitation. Similarly, in very exceptional cases, health outcomes have improved because construction or rehabilitation of facilities is accompanied by investment in health staff, medicines and other supplies.

Some commentators on our study have cited another review by Casey, which does find positive effects on economic outcomes. Casey’s analysis includes fewer studies than our review. But this is not a question of ‘our study is right, your study is wrong’. This points to the need to unpack the evidence further to ask which CDD programme designs may have these positive effects and in which contexts.

Understanding programme variance through a mixed-method review

When we started working on this review, we quickly realised that impact evaluations could not give us all the answers we were looking for (as some commentators have emphasised). We drew on process evaluations and qualitative research to understand how different programme design elements have worked. Here are some things that stood out for us:

  • CDD implementing agencies have used a variety of measures to promote the participation of women, the poor and marginalised groups. They have targeted the poorest communities, mandated quotas in project committees and provided facilitation. While impact evaluations provided no information on how these measures worked, project documents have been a rich source of information. For instance, long-running programmes such as the Kecamatan Development Programme (KDP) in Indonesia have a comprehensive set of successful measures to improve women’s participation compared to several short-term programmes that had either limited or one-off measures.
  • The institutional set-up of the CDD implementer, as an independent agency or as part of an existing ministry or department, influences the impact of the programme in different ways.
  • Evaluations show that communities face numerous challenges in maintaining infrastructure. Implementers should pay explicit attention to the technical, institutional and financial mechanisms in place for ensuring that infrastructure is maintained and operates properly. Again, long-running programmes such as KALAHI–CIDSS in the Philippines have used good practices that are worth learning from.

Our bottom line

Our bottom line is that CDD has worked to deliver infrastructure and might be the most cost-effective approach, though that needs testing in different contexts. Agencies should, however, stop claiming that CDD also builds social cohesion. More effort is required to understand participatory approaches, particularly through learning from where they have been successful.

We urge CDD implementers and evaluators to move beyond the definition of a community as a geographic administrative unit. More attention should be paid to local political economy and gendered power dynamics in communities. For CDD to be truly development driven by communities, we need to take a different approach.

More resources: Read the full working paper on community-driven development or download the brief for a summary of the main findings. You can also visit our page to download the key infographics and watch this short explainer video on the review.

Learning power lessons: verifying the viability of impact evaluations

Learning from one’s past mistakes is a sign of maturity. Given that metric, 3ie is growing up. We now require pilot research before funding most full impact evaluation studies. Our pilot studies requirement was developed to address a number of issues, including assessing whether there is sufficient intervention uptake, identifying or verifying whether the expected or detectable effect is reasonable and determining the similarity of participants within clusters. All of these inter-related issues have the same origin, in a problem 3ie recognized a few years ago (highlighted in this blog), which is that eager impact evaluators frequently jump into studies with incomplete information about the interventions they are evaluating. While the evaluation question may seem reasonable and justified to everyone involved, inadequate background information can cause miscalculations that render years of work and hundreds of thousands of dollars meaningless. Low intervention uptake levels, unrealistic expected or detectable effects, unexpected effect sizes or underestimated intracluster correlation coefficients (ICCs) may result in insufficiently powered research and thus waste valuable evaluation resources.

Increasing the accuracy of evaluation assumptions matters. Insufficient power can torpedo an evaluation, which is the motivation behind 3ie’s proposal preparation pilot phase grants. By providing a small grant to demonstrate the viability of an evaluation, we aim to maximize the effectiveness of our limited resources to fund answerable evaluation questions. The formative research grants provide evidence of adequate intervention uptake while validating the accuracy of power calculation assumptions. We reached this point by learning from previous power assumption missteps.

Lesson one: intervention uptake

Our first power lesson revolves around unreasonable intervention uptake assumptions. Low uptake may arise from several factors, including but not limited to the design of the intervention with insufficient knowledge of local culture or a low level of demand among the intended intervention beneficiaries. Low uptake immediately endangers the usefulness of proposed evaluations, as insufficient sample sizes may not allow the researchers to detect a significant (or a null) change in treatment recipients. Pilot studies help to validate the expected uptake of interventions, and thus enable correct calculation of sample size while demonstrating the viability of the proposed intervention.
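To see why uptake matters so much for sample size, here is a minimal sketch. It assumes a two-arm, individually randomised trial with a continuous outcome, equal allocation, a two-sided test and the standard normal-approximation formula. It also assumes that people who do not take up the intervention experience no effect, so the average effect across everyone assigned to treatment is the effect on takers multiplied by the uptake rate. The function names and example numbers are hypothetical and are not drawn from any 3ie-funded study.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_in_sd, alpha=0.05, power=0.80):
    """Sample size per arm to detect a standardised mean difference
    (effect divided by the outcome's standard deviation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_in_sd) ** 2)


def n_per_arm_with_uptake(effect_on_takers_in_sd, uptake, alpha=0.05, power=0.80):
    """Same calculation when only a share `uptake` of the treatment arm
    actually receives the intervention (assumes no effect on non-takers)."""
    diluted_effect = effect_on_takers_in_sd * uptake
    return n_per_arm(diluted_effect, alpha, power)


# Hypothetical scenario: an effect of 0.2 standard deviations among takers.
for uptake in (1.0, 0.75, 0.5):
    print(uptake, n_per_arm_with_uptake(0.2, uptake))
```

Because the diluted effect enters the formula squared, halving uptake roughly quadruples the required sample. That is why an optimistic uptake assumption can leave a study badly underpowered.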

One example of an evaluation with an incorrect uptake assumption occurred in a 3ie-funded evaluation in 2015. The intervention used a cadre of community health workers to deliver HIV medicines (antiretrovirals) to “stable” patients. These are patients who had been on treatment for at least six months and had test results that indicated that the virus was under control. During study enrollment, the evaluators realized that far fewer patients qualified as “stable” than they had anticipated. In addition, it was taking a lot longer to get test results confirming eligibility. These two challenges resulted in much slower and lower enrollment than expected. In the end, the researchers needed three extensions, expansion to two additional districts and nearly US$165,000 in additional funding to complete their study in a way that allowed them to evaluate the effects they hypothesized.

Lesson two: expected effect sizes

Our second lesson stems from poorly rationalized expected changes in outcomes of interest. Many researchers, policymakers and implementers expect interventions will result in substantial positive changes in various outcomes for the recipients of the intervention. Compounding this potential error is that many researchers then use this optimistic assessment as their “minimum detectable effect.” However, unrealistic expectations used to power studies will likely lead to underpowered studies, because detecting smaller changes in the outcome of interest requires larger sample sizes. By ground-truthing the expected effectiveness of an intervention, researchers can both recalculate their sample size requirements and confirm with policymakers the intervention’s potential impact.

Knowing what will be “useful” to a policymaker should also inform how researchers design and power an evaluation. Policymakers have little use for knowing that an intervention caused a 5 percentage point increase in an outcome if that change is not clinically relevant, not large enough to save the government money and not large enough to create a meaningful difference for the beneficiary. At the same time, if a 10 percentage point increase would make the policymaker very excited to expand a program, but the implementers or researchers “hope” that the intervention will result in a 25 percentage point increase, powering the study to detect 25 percentage points may be a fatal error. If the intervention “only” produces a 20 percentage point increase, the study will be underpowered, and the evaluators will likely not be able to detect statistically significant changes. Sometimes the researchers’ best choice is to conservatively power their study, allowing for the greatest likelihood of detecting smaller, but still policy relevant, levels of impact.
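The 25 versus 20 percentage point scenario can be worked through with a short sketch. It assumes an individually randomised, two-arm comparison of proportions, a two-sided 5 per cent test and the usual normal approximation. The 30 per cent baseline rate is a hypothetical value chosen purely for illustration, as are the function names.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_arm_for_proportions(p_control, p_treated, alpha=0.05, power=0.80):
    """Sample size per arm to detect a difference between two proportions."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum
                     / (p_treated - p_control) ** 2)


def achieved_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of the same test for a given sample size per arm."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    std_error = math.sqrt(variance_sum / n_per_arm)
    return STD_NORMAL.cdf(abs(p_treated - p_control) / std_error - z_alpha)


baseline = 0.30  # hypothetical control-group rate

# Sample size chosen on the hope of a 25 percentage point increase.
n_hoped = n_per_arm_for_proportions(baseline, baseline + 0.25)

# Power actually achieved if the true increase is "only" 20 points.
print(n_hoped, achieved_power(baseline, baseline + 0.20, n_hoped))

# Sample size needed to detect the policy-relevant 10 point increase.
print(n_per_arm_for_proportions(baseline, baseline + 0.10))
```

Under these assumptions, the study sized for the hoped-for effect falls below the conventional 80 per cent power target once the true effect is smaller, while powering for the 10 point threshold requires a much larger sample. The exact figures depend heavily on the assumed baseline rate.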

An example of unrealistic expected effects comes from a 3ie-funded evaluation of how cash transfers influence livelihoods and conservation in Sierra Leone (final report available here). The researchers designed their randomized controlled trial to measure both the influence of earned versus windfall aid and the effect of community versus individual aid. The researchers note that the implementing agency expected the different aid interventions to cause large changes in economic, social and conservation outcomes. Unfortunately, after visiting the intervention and control areas six times over three years, the researchers were unable to detect any consistent statistically significant impacts on these outcomes, whether compared to the control group or to each other. While they estimated some differences, they argue that the effects were not significant because the study was underpowered for their analysis.

Lesson three: outcome intracluster correlation coefficients

Our third lesson focuses on ICCs. Many researchers assume ICCs, either based on previous studies (that oftentimes assumed the inter-relatedness of their samples), or based on a supposed “rule of thumb” for an ICC that does not exist. Time and place may cause variations in ICCs. Underestimating one’s ICC may lead to underpowered research, as high ICCs require larger sample sizes to account for the similarity of the research sample clusters.

Of all of the evaluation design problems, an incomplete understanding of ICCs may be the most frustrating. This is a problem that does not have to persist. Instead of relying on assumed ICCs or ICCs for effects that are only tangentially related to the outcomes of interest for the proposed study, current impact evaluation researchers could simply report the ICCs from their research. The more ICCs are documented in the literature, the less researchers would need to rely on assumptions or mismatched estimates, and the lower the likelihood of discovering that a study is underpowered because of insufficient sample size.

Recently 3ie funded a proposal preparation grant for an evaluation of a Nepalese education training intervention. In their proposal, the researchers, based on previous evaluations, assumed an ICC of 0.2. After winning the award, the researchers delved deeper into two nationally representative educational outcomes datasets. Based on that research, they calculated a revised ICC of 0.6. This tripling of the ICC ultimately forced them to remove one intervention arm to ensure they had a large enough sample size to measure the effect of the main intervention of interest. This is a good example of the usefulness of 3ie’s new approach to most evaluation studies, as the researchers’ sample size recalculations gave them the greatest likelihood of properly measuring the effectiveness of this education intervention.
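A rough sketch shows why a jump in the ICC from 0.2 to 0.6 has such a large effect on sample size. It uses the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC, where m is the number of participants per cluster. The cluster size of 20 and the individually randomised sample of 400 are hypothetical values for illustration; they are not figures from the Nepal study, and the function names are ours.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from cluster sampling with equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


def clusters_needed(n_individual, cluster_size, icc):
    """Clusters required to match the precision of an individually
    randomised sample of `n_individual` participants."""
    inflated_sample = n_individual * design_effect(cluster_size, icc)
    # Round first so floating-point noise cannot push the ceiling up by one.
    return math.ceil(round(inflated_sample / cluster_size, 9))


cluster_size = 20    # hypothetical number of pupils surveyed per school
n_individual = 400   # hypothetical sample under individual randomisation

for icc in (0.2, 0.6):
    print(icc,
          design_effect(cluster_size, icc),
          clusters_needed(n_individual, cluster_size, icc))
```

With these illustrative inputs, tripling the ICC multiplies the number of clusters needed by roughly 2.6. When the budget for clusters is fixed, dropping an intervention arm is one way to free up enough clusters for the remaining comparison.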

Without accurate assumptions, researchers lose their time, donors lose their money, stakeholders lose interest and policymakers’ questions remain unanswered. Power calculations and the subsequent sample size requirements underlie all impact evaluations. The evaluation community has the power to correct many of these miscalculations. Please join us in raising the expectations for impact evaluation research by holding proposals to a higher bar. We can all do better.

More resources: 3ie published a working paper, Power calculation for causal inference in social science: sample size and minimum detectable effect determination, that draws from real world examples to offer pointers for calculating statistical power for individual and cluster randomised controlled trials. The manual is accompanied by the Sample size and minimum detectable effect calculator©, a free online tool that allows users to work directly with each of the formulae presented in the paper.

This blog is a part of a special series to commemorate 3ie’s 10th anniversary. In these blogs, our current and former staff members reflect on the lessons learned in the last decade working on some of 3ie’s major programmes.