Monthly Archives: August 2012

Can we do small n impact evaluations?


3ie was set up to fill ‘the evaluation gap’: the lack of evidence about ‘what works in development’. Our founding document stated that 3ie would be issues-led, not methods-led, seeking the best available method to answer the evaluation question at hand. We have remained true to this vision, having already funded close to 100 studies in over 30 countries around the world. And we strongly promote mixed methods, in which the attribution analysis of ‘what works’ is embedded in a larger evaluation framework combining process and impact evaluation, factual and counterfactual analysis. This helps unpack the causal chain to understand why an intervention works or not, or works only for certain people in certain places.

Although we promote mixed methods, 3ie only funds studies which have at their core a large n impact evaluation, that is, an experimental or quasi-experimental design with sufficient units of assignment to attain the statistical power such a design requires. But many development interventions, such as support for policy reform at the national level, capacity building in a single agency, or indeed the assessment of whether a particular impact evaluation has influenced policy, are small n questions. That is, there are insufficient units of assignment to conduct statistical analysis of what difference the intervention has made.
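What ‘sufficient units of assignment’ means can be made concrete with a standard two-arm power calculation. The sketch below is illustrative only: the 5% significance level, 80% power, and effect sizes (in standard-deviation units) are assumed parameters for the sake of the example, not figures from any 3ie study.

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard library, Python 3.8+

def n_per_arm(effect_sd, alpha=0.05, power=0.80):
    """Units of assignment needed per arm to detect an effect of
    `effect_sd` standard deviations in a simple two-arm comparison."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * (z / effect_sd) ** 2)

def mde(n, alpha=0.05, power=0.80):
    """Minimum detectable effect (in SD units) with n units per arm."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sqrt(2 / n)

print(n_per_arm(0.3))       # 175 units per arm for a modest 0.3 SD effect
print(round(mde(10), 2))    # 1.25: with 10 units per arm, only huge effects are detectable
```

With only a handful of assignment units, an experiment could detect only implausibly large effects, which is why a national policy reform or a single agency cannot be evaluated this way.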

So how do we do small n impact evaluation? While there is no shortage of proposed approaches, we don’t have a consensus. 3ie has processed over 700 proposals to conduct large n impact evaluations. The more than 200 external reviewers we have consulted to screen these proposals largely agree on what they are looking for: a credible identification strategy, sufficient power and so on. But if we were to put ten evaluators in a room to screen proposals for small n studies, it is doubtful they would agree on which designs were best, or even on what constitutes a good design.

Despite this lack of consensus, I had a strong feeling that there was in fact a common core to the bewildering array of competing methods, such as realist evaluation, general elimination methodology, process tracing, and contribution analysis.  Together with Daniel Phillips, also at 3ie, I undertook a journey to the centre of small n methods. And to an extent we found what we were looking for. The four approaches just mentioned all stress the importance of clearly defining the intervention being evaluated and the underlying theory of change.  Changes in outcomes of interest should be documented, along with other plausible external explanations for observed changes in these outcomes. These pieces of evidence are assembled to establish plausible association.

All this was well and good, but we were missing something. I kept coming back to the question: what constitutes credible causal evidence? There was no explicit answer. At best we are told to “use mixed methods”, which is not much more informative than saying “we will use methods”. In seeking an answer I was drawn into cognitive psychology, in which attribution, or more precisely people’s ability to correctly ascertain causality, is a field of study in its own right. And the news is not good. People are not very good at assessing attribution, with the ‘fundamental attribution error’ being the centrepiece of the literature. The fundamental error is that we more readily identify people, rather than underlying circumstances, as the cause of change, creating a bias towards overstating the role of interventions rather than underlying social trends. An exception is the self-serving bias: when things go right we take the credit, but when they go wrong other factors are to blame. World Bank assessments of adjustment lending (good macro performance = the policies worked; bad macro performance = the government didn’t carry out reform properly) are an example of this bias.

The biases go on and on. Our paper ‘Addressing attribution of cause and effect in small n impact evaluations: towards an integrated framework’ has the full list. In summary, potential biases can arise when collecting qualitative data: in deciding which questions are asked, in what order, how they are asked, and how respondents’ replies are recorded. There can also be biases in how the responses are interpreted and analyzed, and finally in which results are chosen for presentation. Of course, quantitative data and analysis are also prone to bias, such as sampling bias and selection bias. But methodologies have been developed to deal with these biases explicitly. Indeed, evaluation designs are judged precisely on how well they deal with them.

We need to be issues-led, not methods-led. But we need stronger methods, and agreement on those methods, in order to be able to judge small n interventions with confidence.

Read the 3ie Working Paper, Addressing attribution of cause and effect in small n impact evaluations: towards an integrated framework.

Exercising credibility: why a theory of change matters


Recently the Chris Evans breakfast show on the UK’s Radio 2 picked up a news story on a Danish study reporting that half an hour’s exercise a day is better for you than a full hour. Like me, the radio presenters were puzzled by this finding and wanted to know more.

Middle-aged men of reasonable fitness were randomly assigned to two groups, one exercising half an hour a day and the other a full hour. After three months the group exercising less had lost more weight: the half-hour group lost eight pounds compared to six pounds for the one-hour group, about one kilogram more weight loss.

The study was widely covered in the press, partly because it has a message most people would like to hear, but also for the novelty value of counterintuitive results. Indeed, the researcher himself is at a loss to explain why this should be so.

Perhaps, he says, the half-hour group weren’t tired after half an hour, so they exercised some more and didn’t actually exercise less! Or perhaps they exercised much harder in that short time. But simple calculations of plausible differences in exercise intensity show that it is not possible to burn more calories in half an hour than in one hour. Maybe, he added, those exercising more felt the need to eat more, and over-compensated for the calorie loss. This last explanation is plausible: a sports drink and a chocolate bar wipe out the calorie deficit from about half an hour of exercise. And a listener to the programme pointed out that those exercising more may have built more muscle, which weighs more than fat.
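The intensity and snack arguments can be checked with back-of-envelope arithmetic, using the common approximation that energy expenditure is roughly MET × body weight (kg) × hours. The MET values, body weight, and snack calories below are illustrative assumptions of mine, not figures from the Danish study.

```python
def kcal_burned(met, weight_kg, hours):
    """Rough energy expenditure: kcal ≈ MET × body weight (kg) × hours."""
    return met * weight_kg * hours

WEIGHT_KG = 80  # assumed body weight of a participant

# Even a much harder half hour burns less than a moderate full hour:
half_hour_hard = kcal_burned(met=12, weight_kg=WEIGHT_KG, hours=0.5)   # 480 kcal
one_hour_moderate = kcal_burned(met=8, weight_kg=WEIGHT_KG, hours=1.0) # 640 kcal

# And a post-workout snack easily cancels the smaller deficit:
sports_drink, chocolate_bar = 150, 250  # assumed kcal
snack = sports_drink + chocolate_bar    # 400 kcal, roughly half an hour's burn

print(half_hour_hard, one_hour_moderate, snack)
```

Under any plausible choice of these numbers the ordering is the same, which is why the intensity explanation fails while the eating explanation survives.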

To an outsider the real puzzle here is that the study wasn’t set up to be able to explain its findings.

The strength of randomized control trials (RCTs), like this study, is their ability to establish causal relationships between the intervention and the outcome. But we need factual analysis of what happened to complement the counterfactual analysis of causality. In the case of this study, participants should have been asked to keep food and exercise diaries. (Though doing so raises another often-ignored problem in research: the measurement is a treatment. Keeping diaries often reduces food intake and increases exercise volume.)

Similarly, many randomized control trials in international development fail to pay adequate attention to collecting data around the intervention’s theory of change. And so the authors resort to non-evidence-based speculation to explain their findings. But without such explanation, the appropriate policy response to study findings is often unclear.

The Danish researcher plans to go on to study exercise and commuting. But as one of the radio presenters exclaimed, ‘he can’t do that, he hasn’t finished this research yet’. The presenter is right. RCTs which merely link the intervention to outcomes are unfinished research projects. Incredible findings won’t become credible without a plausible, evidence-based explanation.

The UK Department for International Development recently released a new paper on the use of theories of change in international development.  3ie’s paper on theory-based impact evaluation also offers guidelines for mapping out the causal chain of an intervention.  Developing a theory of change helps identify the evaluation questions and can very often increase a study’s policy relevance.