Monthly Archives: January 2014

When is an error not an error?

Thomas Herndon, Michael Ash, and Robert Pollin (HAP), in their now famous replication study of Reinhart and Rogoff’s (R&R) seminal article on public debt and economic growth, use the word “error” 45 times. The study sparked a tense debate between HAP and R&R, summarized by the Financial Times (FT), about which differences in HAP’s analysis really point to errors in R&R’s original work. At 3ie, we are more than a year into our replication programme, and we are seeing a similar propensity for replication researchers to use the word “error” (or “mistake” or “wrong”) and for this language to cause contentious discussions between the original authors and replication researchers. The lesson we are learning is:

To err is human, but to use the word “error” in a replication study is usually not divine.

Some would ask, isn’t that the point of internal replication? Yes. As we argue in our forthcoming paper, one of the four reasons why internal replication is important for validating evidence is that “to err is human”. Original authors do occasionally make mistakes, and correcting them is a major benefit of replication.

So what’s the problem? The problem is that pure replication of an original author’s empirical analysis is often really complicated, not to mention time consuming. And what we’re seeing is that even relatively successful pure replications end up with many estimates that are just not quite the same as in the original article. Replication researchers are often quick to call these “errors”. But if two people conduct the same analysis on the same data, and they each get similar but not identical estimates, who is to say what is right and what is wrong?

Not surprisingly, the word “error” makes original authors defensive and leads to debate. But two sides arguing about a small difference in a point estimate does not help us achieve the objective of finding the best evidence for policy making and program design. To suggest that a small difference that happens to be around an arbitrary cut-off should change policy conclusions is to fall prey to the “cult of statistical significance”. Whether in the original paper or in the replication study, we should focus instead on what is relevant and robust. As Pollin concedes in the FT interview, the real question is whether a conclusion is robust.

So when is an error truly an error? We submit that the word “error” should be used in replication studies only when the replication researcher can identify the source of the mistake. The HAP replication study does point to some clear errors. For example, the original authors missed five rows of data in the estimations in their Excel file. That was an error that was acknowledged by the original authors here and here.

When there are discrepancies in the estimates that cannot be explained, we recommend that replication researchers use the words “discrepancy” or “inconsistency”. We are not suggesting that discrepancies are unimportant. They are important. A large number of discrepancies in the pure replication that cannot be explained by the original authors or by the replication researchers may call into question how well the underlying datasets are coded, labeled, documented, and stored. And that should call into question the quality of the analysis that can be conducted with those data. One objective of the 3ie replication programme is to motivate authors to document and maintain their data more carefully. But unexplained discrepancies are not necessarily errors.

An error is also not an error if it results from a different decision made in the measurement or estimation analyses. Many researchers hold strong beliefs about which methods are appropriate and how they should be used. Sometimes what is right is pretty cut and dried. You need to use clustered standard errors when you have a cluster design. But often those choices are more discretionary. Jed Friedman’s blog post on linear probability models (LPM) versus probits and logits describes his debate with a referee about whether it is “wrong” to use LPM in the case of binary responses. Friedman quotes Jörn-Steffen Pischke on the matter: “the fact that we have a probit, a logit, and the LPM is just a statement to the fact that we don’t know what the ‘right’ model is.”
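Friedman’s point can be made concrete with a minimal sketch (the numbers below are hypothetical, chosen only for illustration). With a single binary regressor the model is saturated, so the LPM slope and the logit’s implied effect on the probability scale are the same difference in group means; the two models only disagree about functional form away from the observed data.

```python
import math

# Toy binary data: x = group indicator, y = binary outcome.
# Hypothetical counts giving P(y=1|x=0) = 0.2 and P(y=1|x=1) = 0.6.
x = [0] * 50 + [1] * 50
y = [1] * 10 + [0] * 40 + [1] * 30 + [0] * 20

p0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
p1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)

# LPM: OLS of y on x. With one binary regressor the slope is just p1 - p0.
lpm_slope = p1 - p0

# Logit: the saturated MLE reproduces the cell means exactly, so the
# fitted probabilities are p0 and p1 and the implied effect is identical.
logit = lambda p: math.log(p / (1 - p))
b0 = logit(p0)                      # logit intercept
b1 = logit(p1) - logit(p0)          # logit coefficient on x
inv_logit = lambda z: 1 / (1 + math.exp(-z))
logit_effect = inv_logit(b0 + b1) - inv_logit(b0)

print(lpm_slope, logit_effect)      # both ≈ 0.4 (up to floating point)
```

Neither number is more “right” than the other here; the choice between models matters only when extrapolating beyond the support of the data, which is exactly why the disagreement is about assumptions rather than errors.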

Certainly a replication researcher should critically examine the methodological choices made by the original authors. The existence of multiple possible models should motivate a careful discussion of the underlying assumptions as well as provide an opportunity to test the original paper’s result for robustness to model choice. Arguments about measurement and estimation are particularly important when the main conclusions from the study hinge on those choices. In the FT interview, Pollin makes the more relevant critique of R&R that “their results are entirely dependent on using that particular methodology.” This statement, and not the 45 uses of the word “error”, is a more divine approach to replication research.


This blog post first appeared on the World Bank’s Development Impact blog.

Civil society: strong advocates for evidence-informed HIV/AIDS policies and action

As a member of 3ie’s HIV/AIDS programme team, I attended the annual International Conference on AIDS and STIs in Africa (ICASA), where I was struck by the strong and vital presence of countless civil society organisations (CSOs). From the displays in the main lobby calling for sex workers’ rights to the exhibits displayed by legal networks, human rights advocates and community organisations throughout the Cape Town International Convention Centre, I was reminded that participation of CSOs is crucial to moving forward the dialogue and action on improved HIV prevention and access to HIV/AIDS treatment, especially in hard-hit developing countries.

One highlight was how much recognition CSOs received for their important contributions to achieving responsive and effective policies and practices at local, national, regional and global levels. They have been crucial in educating the public, providing services, informing policymakers and stakeholders about what works well and what doesn’t, and maintaining demand for more and better responses to the epidemic. “Evidence shows programmes do better in terms of impact when civil society is involved,” proclaimed a panelist representing the CSO perspective in a session about the Global Fund’s new funding model.

Indeed, CSOs have been key actors in advocating for evidence-informed policies that address many of the critical issues in the HIV/AIDS sector. This is especially true for programmes and policies targeting key populations, such as persons with disabilities (PWDs), women and people infected with tuberculosis. In this context, CSOs have been increasingly effective in presenting evidence to policymakers that sheds light on the high risk of HIV/AIDS and other sexually transmitted infections (STIs) that PWDs face, given their higher vulnerability to sexual violence. The absence of discussion about this issue, and the limited resources for PWDs on sexual health education and sexuality, have been cited as factors contributing to the increased risk of PWDs acquiring HIV. USAID and other donors have responded with an increasing focus on policies addressing their needs. USAID requires that all of its projects integrate PWDs into their scope of work. Major international HIV/AIDS conferences are also taking the necessary steps to become more inclusive forums, which was evident at ICASA in conference sessions and accessible public spaces offering networking opportunities for PWDs.

Growing evidence about the social and structural drivers that contribute to women’s and girls’ increased risk of HIV, such as gender-based violence and inequality, has helped make addressing these problems a key issue on the HIV/AIDS agenda. Here again, CSOs have made an important contribution by ensuring the integration of gender into national HIV/AIDS strategic plans and health policies.

CSOs have been playing an important role in the Global Fund since its inception, thanks in part to broad representation and rules for CSO participation in country coordinating mechanisms. Advocates for the integration of tuberculosis into HIV/AIDS programming have also made significant strides. The Global Fund now accepts only one application for both HIV and tuberculosis programmes, as its experience has demonstrated that running two parallel processes is less effective than integrating them. Its recently unveiled new funding model also requires that CSOs be involved in country-level dialogues. Organisations receiving Global Fund assistance also have to be transparent and document the process of CSO representation and participation.

HIV/AIDS CSOs bring passionate commitment, and often some much-needed tenacity and aggravation, to their advocacy. That energy is part of what makes ICASA so worthwhile and inspiring for me. Translating research evidence effectively into policy and practice is a complex, difficult and often long-term endeavour. ICASA reminded me that CSOs are important policy actors and partners. Their involvement, and that of the communities they represent, is central to achieving an AIDS-free generation.

Making participation count

Toilets get converted into temples, and schools are used as cattle sheds. These are stories that are part of development lore. They illustrate the poor participation of ‘beneficiaries’ in well-intentioned development programmes.

So, it is rather disturbing that millions of dollars are spent on development programmes with low participation, when we have evidence that participation matters for impact. Yet many impact evaluations do not analyse participation at all. Often, they fail to report data on take-up rates, or, even when they do, they don’t look at why participation is low.

Evaluators often do not clearly distinguish between the impact on those who actually took part in the programme (treatment effects on the treated) and impacts on the entire population targeted by the intervention (intention-to-treat effect). According to AidGrade, only 45 per cent of impact evaluations clearly distinguish between intention-to-treat effects and treatment effects on the treated.

For a policymaker, this distinction is crucial to deciding whether to scale up. A programme with high intention-to-treat effects is worth expanding. But when positive treatment effects on the treated are accompanied by low participation, the programme cannot be considered a success.
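The distinction can be sketched with simple arithmetic (hypothetical numbers, for illustration only). Under one-sided non-compliance, where no one in the control group receives the programme, the standard Bloom adjustment recovers the treatment effect on the treated by dividing the intention-to-treat effect by the take-up rate:

```python
# Hypothetical example: a programme offered to an assigned group,
# but only 25 per cent of those assigned actually take it up.
assigned_mean_outcome = 52.0   # average outcome among those offered the programme
control_mean_outcome = 50.0    # average outcome in the control group
take_up_rate = 0.25

# Intention-to-treat (ITT): the effect of being *offered* the programme,
# averaged over everyone assigned, participants and non-participants alike.
itt = assigned_mean_outcome - control_mean_outcome   # 2.0

# Treatment effect on the treated (TOT) via the Bloom adjustment,
# valid only under one-sided non-compliance (no control-group take-up).
tot = itt / take_up_rate                             # 8.0

print(itt, tot)
```

A seemingly large effect on participants (8.0) coexists with a modest effect at the population level (2.0): low take-up dilutes the policy-relevant scale-up effect, which is exactly why a high TOT alone does not make the programme a success.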

Voluntary male medical circumcision (VMMC), for instance, is widely advocated as an efficacious clinical measure to reduce the risk of contracting HIV. But there is evidence to show that the demand for circumcision is quite low. According to a 3ie scoping study, fear of pain and complications during and after the surgery, concern about long healing periods, and financial and opportunity costs are major barriers behind the low demand. It is clear that VMMC will not significantly impact HIV rates if we don’t carefully look at the facilitators and barriers influencing a man’s decision to get circumcised.

So, why do poor people not participate in development programmes that presumably have clear benefits for them? Do they not know about the programme? Are they not aware of the benefits? Are the socio-economic costs of participation excessive? Do they not perceive the intervention as a benefit?

At 3ie, we are seeing several of our funded impact evaluations reporting low take-up of a programme. However, only a fraction of these evaluations actually delve into the question of why participation is low. We are tackling this limitation by looking into what evaluators can do to unearth important answers to these questions. For starters, there is an urgent need for impact evaluation designs to adopt a more systematic approach to understanding the various dimensions of participation. Here are a few pointers that we offer when we discuss evaluation designs with our grantees.

Conduct mixed-methods impact evaluations

A 3ie-supported impact evaluation of an inventory credit and storage facility programme for palm oil producers in Sierra Leone showed that the intervention did not have an impact on either the storage or the sales of palm oil. Focus group discussions conducted as part of the evaluation revealed that many of the palm oil producers had never interacted with bank officials. A general lack of trust contributed to the low take-up of the intervention. Qualitative research in this case contributed to a richer understanding of why there was low participation. To analyse take-up, evaluators need to include qualitative methods.

Map out the assumptions underlying the theory of change and analyse them

Many 3ie-supported research teams illustrate a programme’s theory of change using a flow chart. Unfortunately, while charting out the causal chain, they often do not consider the underlying assumptions. These assumptions could include a whole host of structural and contextual factors that may otherwise be ignored.

For example, while implementing a women’s self-help group programme, it would be important to consider the possibility that information about the existence of the programme may not reach potential participants. It is also likely that the women may consider attending meetings to carry a high opportunity cost: they may have to give up several hours of work on their farm or at home to attend a self-help group meeting. Assumptions about participation may also run counter to what a woman is able to do in her community. Women may have to break gender-related social norms in the community to attend such a meeting unaccompanied by their spouse or a male relative.

All these assumptions need to be tested and analysed in the evaluation. Evaluators can only analyse reasons for low take-up and the funnel of attrition when these assumptions are made explicit before the start of the evaluation.

Conduct formative research

Before conducting an impact evaluation, implementers and evaluators should assess the demand for an intervention. Formative research can help in understanding how demand works, particularly the local context that affects demand. If existing demand is low, formative research could also spark thinking on additional interventions for increasing take-up.

This is what happened in the case of a planned 3ie-supported impact evaluation of the Philippines Open High School Programme. Focus group discussions and in-depth interviews with students and teachers ahead of the impact evaluation surfaced various reasons for the existing low take-up of the programme, for example, a lack of information and the printing costs for study modules. These useful insights prompted the evaluators to work with the Department of Education to design a factorial impact evaluation that could inform the design of the programme. The researchers are now considering a randomised controlled trial to test whether providing information about the programme is sufficient, or whether additionally subsidising the printing costs for study modules would be required to spur participation.

It is particularly important to increase the participation of people who could benefit from an intervention. An impact evaluation of a vocational training programme in Nicaragua showed that those who self-select into the programme actually benefit less than relatively poorer people who, owing to low aspirations, do not participate in the training. In such cases, improved targeting is key to achieving improved take-up.

Final thoughts

Some of the more practical challenges related to the take-up of a programme may be easy to fix with small design tweaks, like marketing tools for increasing awareness. But development problems tend to be complex, long-standing and deep. Addressing them involves making social, political and behavioural changes. We therefore require a fundamental shift in our thinking about the design and evaluation of programmes. Whatever the case may be, evaluators need to first analyse the drivers and blockers of demand. A good start is to look at participation systematically and to dig deeper for answers to the question of why participation levels were what they were.

Please sign up to receive email alerts when a new blog gets posted on Evidence Matters and feel free to leave a comment below.