Monthly Archives: May 2013

Do implementation challenges run impact evaluations into the ground?

Economists use a variety of tools to measure the impact of development programmes. Theories of change, qualitative analysis, quantitative techniques and advanced econometrics are all arrows in the quiver. But are these methods sufficient to ensure high-quality impact evaluations?

To put it another way, is there a difference between a project designed solely for implementation and one that is prepped for an impact evaluation? And if so, does this compromise the external validity of impact evaluation results?

We discuss these questions, using some examples from the field.

Our first example comes from a field trip to the study site of a 3ie-supported impact evaluation in India. The programme being implemented was a conditional cash transfer project. The proposed design was a randomised controlled trial (RCT) with a pipeline approach. A randomly selected group of villages in a large Indian state was supposed to receive a debt-swapping intervention targeting self-help groups in the first year. The intervention provided low-interest rate loans to rural villagers in self-help groups, so that they could pay off their high-interest rate loans taken from local money lenders. The villages that served as a control group were supposed to receive this intervention later.

This was how it was planned on paper. During a field monitoring visit, when we spoke to one of the principal investigators of the project, who was also a member of the implementation team, we realised that he did not know what random allocation meant. He also did not know that the project needed to be implemented in batches (the pipeline approach). In fact, he proudly talked about the near-universal coverage of the debt-swapping programme. Worse, baseline data collection, which was supposed to have started two years earlier, was still being planned.

All this while the lead research team, located in a far-off Western country, was designing survey instruments, planning survey roll-outs and preparing for the evaluation.

Epilogue: The project design has since changed to include a randomly assigned ‘encouragement’* to administrators to follow up on implementing the debt-swapping intervention.

In the case of another 3ie-supported study, examining factors affecting institutional deliveries, the study team selected a survey firm that was commercially well known for market research. The survey firm very quickly provided the study team with an exhaustive dataset. But when the study team went down to the field with a research assistant, who was also trained in the vernacular, they found that the enumerators from the survey firm were not very skilled in survey techniques.

While collecting voice data, surveyors had been coaching their illiterate respondents to give the ‘right’ answer. Since the surveyors needed to reach their target numbers on a daily basis, they had little patience with their respondents. A senior researcher from the study team said, “After the question was asked, the surveyor would put the tape-recorder away. They would then gently tell the respondent how to respond and push the tape recorder towards the person.” Later when confronted with this obvious data tampering, the survey team told the senior researcher, “Tell us what you expect to see in the data, and we’ll show it to you.”

Epilogue: The study team has decided to drop the survey collection team.

Our third example comes from a 3ie-supported study examining the impact of the use of mobile phones on savings accounts. Surveyors were scheduled to survey target groups, who had received mobile phones and an initial start-up amount, about their saving habits.

During the course of our field monitoring, we found out that the surveyors had been visiting households in their target group every month. And these regular visits carried on for several months. Not only did the surveyors end up knowing every member of the household they were visiting, but they also knew their savings habits. They also knew the ‘secret pin codes’ and chatted openly about the use of these savings accounts. On their part, households would wait for the surveyors to come and operate their savings accounts.

It was not difficult to see the reasons for the uptake of phone-based savings accounts in the population. The ‘surveyor effect’ was easy to predict but, interestingly, the research team did not account for it. They saw the uptake as proof of the concept working.

Epilogue: A separate group has been identified to look at survey effects.

Our fourth and final example comes from an impact evaluation of an agro-business package intervention in Niger. One of us was part of the research team of this project. The programme provided inputs for production to agricultural cooperatives, along with measures for easing access to the market. It was predicted that the intervention would reduce costs, improve producers’ technical skills and consequently lead to an increase in production and exports. The proposed design was an RCT.
In this case, while the evaluation team was still developing baseline data collection tools, the implementing agency went ahead and delivered the intervention. The agency implemented a demand-based intervention on a first-come-first-served basis: farmers who submitted their applications earlier would receive the agricultural subsidy package. The implementing agency’s monitoring and evaluation team did not grasp that such a demand-based identification of beneficiaries jeopardised the RCT.

Epilogue: A programme manager was appointed and the research team used an alternative design: propensity score matching.

Lessons learnt: too little too late? 

In each of the cases described here, a solution was found to address the problem at hand. But the big message we take away from these case studies is that there is an obvious disconnect between implementers and researchers.

To help bridge this gap, we think it critical to have a field manager liaise between the implementation team and the research team. The field manager should be carefully chosen: someone who knows the operational intricacies of the project and the political pressures it faces, but who also understands the evaluation design and is well versed in the management of evaluations.

Our own epilogue: When we recommended the idea of having a research team on the ground to one of 3ie’s grantees, we got the following response: “We note that we are only responsible for the evaluation of the program…the implementing agency has its own monitoring system…”.

*In encouragement designs, participants are randomly encouraged to participate in the programme. This is usually done when randomising access to or participation in the programme is not practical or desirable.

Waiting for Allah


I recently met with the acting chairperson of the Pakistan Earthquake Reconstruction and Rehabilitation Authority (ERRA). He described in proud detail ERRA’s successful reconstruction efforts after the disastrous 2005 earthquake that claimed more than 70,000 lives. In more muted tones, he described the 2010 floods, which claimed more lives than the 2005 earthquake. The rescue and relief effort was excellent in 2005. It failed miserably in 2010.

Pakistan experiences more than 110 earthquakes every year. This translates to at least two to three earthquakes every week, often measuring between 3.1 and 5.2 on the Richter scale. But then there are black swan events like the one in 2005, which measured 7.6 on the scale. What is surprising is not that the 2005 event took so many lives but that, in comparison, low-grade earthquakes affect more lives every year. In the past three years, more than 60 million people have been affected by these low-magnitude earthquakes in Pakistan, more than one-third of its total population.

In any humanitarian disaster, damage is not just a matter of the intensity and magnitude of the disaster but also of the vulnerability of the people affected. ERRA recognised this fact. Like many other disaster management authorities around the world, it has expanded its mandate beyond short-term relief to incorporate the recovery effort and the building of long-term resilience.

But another problem assails ERRA’s mandate, perhaps of a greater magnitude since it threatens the agency’s existence. ERRA finds it hard to explain to its constituency and to outside stakeholders just what it does. It finds it difficult to explain why one effort failed, why another succeeded, or indeed why it is important that ERRA be provided with adequate authority, funds and a mandate to scale up its activities and programmes. Any statement it makes about itself suffers from an obvious bias: it is self-serving.

With 15,000 project sites organized by clusters, and a mandate to coordinate over 650 NGOs, the agency is ripe for an independent impact evaluation. An impact evaluation would answer several key questions. What is the impact of ERRA’s disaster risk reduction initiatives? What would the damage that has been prevented have cost, and what is the impact of its resilience efforts? And finally, a key question: is ERRA truly building back better? The answers to these questions are important not just for the agency’s sustainability but also for the government of Pakistan.

Even though disaster relief and humanitarian assistance agencies claim that their work is done once they have provided immediate emergency relief, most agencies stay on for the long term. The International Federation of Red Cross and Red Crescent Societies in Haiti, Médecins Sans Frontières in Somalia, Oxfam in Sudan, Action Against Hunger/ACF International in Chad (the list is endless) have all stayed on for over a year after the disaster. Disaster relief goes much beyond immediate relief. It moves on to help in reconstruction, in recovery and finally in resilience. The impact of agencies cannot be measured by looking at just the number of vaccines delivered, shelters constructed or Plumpy’Nut packages distributed. It is measured by the number of people made resilient, and by the damage avoided.

In 2011, 62 million people were affected by humanitarian crises around the world. The international community raised $17.1 billion in response. But one-third of the needs of the people were not addressed (United Nations). Today several man-made and natural disasters enervate already impoverished populations around the world. When lives are in danger, and the demand for resources is overwhelmingly high, evidence regarding efficient delivery of services and effective programmes becomes even more critical. The time to measure the impact of humanitarian assistance is well upon us.

Note: With apologies to Christina Lamb, whose book title I borrow unashamedly from (it’s also an excellent book).

(In a subsequent blog, Jyotsna will visit the methodological difficulties in conducting impact evaluations of humanitarian assistance programmes and how these may be overcome.)

Is impact evaluation of any use to ‘project beneficiaries’?


Impact evaluation might be seen as a prime example of what leading participatory proponent Robert Chambers has called extractive research. Researchers go out to communities, collect data, then scurry back to their ivory towers to analyse and publish their findings. They build up their careers and reputations, and then move on to the next study. In all of this, the research subjects receive nothing. They do not even learn the research ‘findings’.

But it doesn’t have to be this way. Unlike typical ‘academic’ research, ‘empowering’ research is used by communities to improve their situation. Two recent books illustrate some of the ways in which impact evaluation and participatory methods can engage with each other. Impact evaluation can be empowering and not extractive.

Quantitative research in general has traditionally relied on closed survey instruments. Whilst this situation is changing with the use of vignettes and behavioural experiments, participatory methods have not been added to this tool kit. But, as a new book edited by Jeremy Holland shows, participatory methods can generate numbers, which can also be used in quantitative analysis. In Who Counts? The Power of Participatory Statistics, there are examples of how community-level data was used to assess livelihood outcomes, as well as process issues (e.g. Why wasn’t an agricultural starter pack in Malawi used? Because the seed was rotten, the pack came late and farmers didn’t have the necessary tools to use it). As the book demonstrates, participatory methods can in many cases yield more accurate responses than asking community leaders or individual households about a whole range of issues, from cropping patterns to power relations.

Of course, caution does need to be exercised. Participatory statistics, like other statistics, should be representative. In the provocatively titled article ‘Why were so many social scientists wrong about the Green Revolution?’, Alastair Orr points to the reliance on a small number of village studies for building a case to show that the technologies increased inequality and could have been impoverishing. But these studies turned out to not be representative of what was happening in the country as a whole. The critics were finally silenced by nationally representative household surveys which showed that living standards were rising.

But are research findings shared and communicated? Researchers in general (and research funders) have paid little attention to ensuring that findings are shared in the country in which the research was conducted, let alone in the slums and villages where the ‘research subjects’ live. Again, it doesn’t have to be this way. The study team of a 3ie-funded impact evaluation of pre-schools in rural Mozambique reached over 2,000 people through 30 presentations in the districts where the research had taken place.

Impact evaluation research should ideally feed into the wider evidence base to inform decision making. And these are decisions that can be made by communities as well as cabinets. Tim Magee’s excellent Field Guide to Community Based Adaptation lays out a framework to facilitate local level participatory evidence-based development. Participatory methods are used to identify local priorities. Then, global evidence from systematic reviews is fed into discussions to pick projects. So for example, if a community is concerned about child diarrhoea, then systematic reviews can help with picking an appropriate intervention. A review of high quality studies shows hygiene education to be an effective intervention to tackle the problem of child diarrhoea.

Researchers and funders need to address the moral issue of how those being researched will benefit from the research. Drawing on participatory methods and sharing research findings with communities can make research empowering rather than extractive.

What is true impact?

Earlier this year I had a knee operation. So I turned to the Cochrane Library to find evidence on whether post-operative physiotherapy really works. What I found was a systematic review showing that post-operative physiotherapy has too little benefit to be worthwhile. Those assigned a course of physiotherapy had hardly any greater joint flexibility one year later than those who were not.

Sure enough, after the operation, my doctor told me, ‘now, you have to do these exercises’. I was asked to do three knee exercises, requiring me to lie on my back, sit on tables and so on. Each exercise had to be repeated twenty times, and the whole set of exercises six times a day.

But I have a job. I don’t have time to spend all day faffing around with my knee! I dare say many other people feel the same. We know that people are generally pretty bad at taking their medicine, doing their exercises and so on. So in the end the systematic review didn’t really tell me what I wanted to know. Were the exercises ineffective amongst those who actually did them, or amongst those who were assigned exercises but may or may not have done them? I wanted to know about efficacy rather than effectiveness. But that is unusual. Normally we are more interested in effectiveness.

This distinction between efficacy and effectiveness is well understood in clinical trials. It is less well understood in international development circles, though it is every bit as important. An efficacy trial tests the impact of a treatment when it is taken exactly as planned under ideal circumstances. An effectiveness trial measures the impact of a treatment ‘in the real world’, where other things are going on that usually reduce the impact.

Small-scale pilots subject to an impact evaluation are like efficacy trials, with the researchers overseeing implementation. But once a programme is taken to scale it is difficult to maintain the same quality of implementation, and so the impact will be smaller. We cannot therefore take the findings from a small study as a basis for going to scale nationally, let alone internationally. We have to evaluate the programme at scale: we have to measure effectiveness as well as efficacy. Indeed, a programme appeared to work when it was implemented by an NGO in one district in Kenya, but a recent study shows that when the Kenyan government took the same programme to scale, it no longer had an impact.

A key difference between efficacy and effectiveness is that, in the real world, people just don’t comply with the treatment. I suspect this difference explains the results of the post-operative physiotherapy study. 3ie-funded studies have found low rates of compliance for a wide range of interventions, from free medical male circumcision to subsidised pre-schools.

We had a bit of a disagreement with our grantees recently about how to handle these low compliance rates. Just over 20 per cent of households had taken part in a programme. Under these circumstances, working out what we call the ‘treatment on the treated’ effect, that is, the impact just on those households taking part, can be a bit tricky (in clinical trials, this is called ‘per protocol’ analysis). What we can measure more easily is the ‘intention to treat’ effect, that is, the average effect measured across all households, whether they took part or not. The grantee wanted to measure what they called the ‘true impact’ of the programme, that is, the impact just on those taking part. But, we said, the truth is that nearly 80 per cent of households don’t take part. And that is indeed the truth confronting any policymaker taking this programme to scale. So the impact that will be had at scale is measured by the intention to treat effect (at best, as implementation may be worse).
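The relationship between the two estimates can be sketched with a toy simulation. This is a minimal illustration, not data from any 3ie study: the 20 per cent take-up rate matches the case above, but the outcome scale and effect size are invented. It assumes one-sided non-compliance (controls cannot take part, and assignment has no effect on those who do not take part), under which the treatment-on-the-treated effect is simply the intention-to-treat effect divided by the compliance rate, the so-called Bloom adjustment.

```python
import random

random.seed(1)

N = 100_000          # households per arm (illustrative)
compliance = 0.20    # share of assigned households that actually take part
effect = 5.0         # hypothetical effect on those who do take part

# Assigned-to-treatment arm: only compliers get the effect.
treated_outcomes = []
for _ in range(N):
    baseline = random.gauss(50, 10)
    takes_part = random.random() < compliance
    treated_outcomes.append(baseline + (effect if takes_part else 0.0))

# Control arm: no one takes part.
control_outcomes = [random.gauss(50, 10) for _ in range(N)]

# Intention to treat: simple difference in means across everyone assigned.
itt = sum(treated_outcomes) / N - sum(control_outcomes) / N

# Treatment on the treated via the Bloom adjustment: ITT / compliance.
tot = itt / compliance

print(f"ITT ≈ {itt:.2f}")   # close to compliance x effect = 1.0
print(f"TOT ≈ {tot:.2f}")   # close to the effect on participants, 5.0
```

The policy point falls straight out of the arithmetic: with only one in five households taking part, the effect a policymaker should expect at scale (the ITT) is a fifth of the ‘true impact’ on participants.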

I have just finished Ben Goldacre’s excellent Bad Pharma, on the uses and abuses of clinical trials, where he discusses precisely this point. He calls reporting the treatment-on-the-treated effect as the programme impact ‘utterly bizarre’. It is one of the many tricks used to over-estimate how effective drugs are. He also points out that, once a drug is approved, the effectiveness study is rarely done, even if the regulator has requested it.

So, evidence-based development has a chance to move a step ahead of evidence-based medicine, if it focusses on effectiveness. Let us take that chance by making sure that rigorous impact evaluation is included in programmes being taken to scale on the basis of evidence from smaller pilots.

When will the press ever learn?

My last blog discussed the possible link between the dramatic reductions in crime in the UK and the adoption of evidence-based policing. But the Sunday Times recently ran a front-page story claiming that, rather than solving crime, the police are leaving 850,000 cases a year uninvestigated.

But is ignoring all these cases a bad thing? Rather than the scandal the paper makes these uninvestigated cases out to be, this approach to crime-solving is yet another example of using data to guide decision making. A simplified version of the logic runs as follows. Out of 100 crimes, you can choose to investigate them all and solve 20 per cent, so 20 crimes out of 100. Or you can take a quick look and decide that for 40 of them there are no witnesses, no other evidence, and they are crimes with a low success rate, so resources are better devoted to the other 60. With this focus on investigations that have a better chance of succeeding, and the freeing up of additional resources to work on those crimes, 50 per cent are solved: 30 out of the 100. Which is better?
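The triage arithmetic above can be written out explicitly. The percentages are the illustrative ones from the paragraph, not real police statistics.

```python
# Illustrative triage arithmetic (numbers from the text, not real statistics).
TOTAL_CRIMES = 100

# Strategy A: investigate every case, solving 20 per cent of them.
solved_investigate_all = int(TOTAL_CRIMES * 0.20)

# Strategy B: screen out the 40 cases with no witnesses, no evidence and a
# low historical success rate, then solve 50 per cent of the remaining 60.
screened_out = 40
solved_triaged = int((TOTAL_CRIMES - screened_out) * 0.50)

print(solved_investigate_all)  # 20 crimes solved
print(solved_triaged)          # 30 crimes solved
```

Under these assumed success rates, the triaged strategy solves half as many crimes again as investigating everything, which is the whole of the argument the newspaper missed.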

One Christmas Eve my car was broken into and all the presents I had bought stolen. The police didn’t investigate the crime. It was frustrating. But I entirely see the logic of devoting resources to where they will be well used. This is better than wasting them on activities we have good reason to believe will not yield results. Unfortunately, the press don’t see it the same way.

Crime is not the only area where you see this sort of irresponsible reporting. Almost every month there is a press story criticising NICE, the National Institute for Clinical Excellence. NICE’s job is to decide which medical treatments represent good value for money and so should be made available through the National Health Service. NICE has to choose which treatments are most cost effective. Since money is limited, treatments that are expensive, or that the evidence shows to have limited effects, cannot be approved. The papers rail against lives being lost because NICE won’t approve a treatment available on the market. But resources are not infinite; these choices have to be made.

Rather than playing a responsible role in educating the public, the media whip up lobby groups and the public. Going back to the crime story: the Sunday Times got Victim Support to speak out against the practice. If it really wants to help victims of crime, Victim Support should be backing precisely those practices which increase the police’s success rate in solving crimes.

It would be great if the general public, and indeed politicians, appreciated the need for evidence-based policies. But as long as the media continue to argue for policies based on anecdotes and unrealistic expectations, such a change in attitudes will be difficult to achieve. Researchers – and 3ie – need to engage with lobby groups and the press to lay the foundation for better informed decision making.