Impact evaluation evidence continues to accumulate, and policymakers need to understand the range of evidence, not just individual studies. Across all sectors of international development, systematic reviews and meta-analysis (the statistical analysis used in many systematic reviews) are increasingly used to synthesise the evidence on the effects of programmes. These reviews aim to identify all available impact evaluations on a particular topic, critically appraise studies, extract detailed data on interventions, contexts, and results, and then synthesise these data to identify generalisable and context-specific findings about the effects of interventions. (We’ve both worked on this, see here and here.)
But as anyone who has ever attempted to do a systematic review will know, getting key information from included studies can often be like looking for a needle in a haystack. Sometimes this is because the information is simply not provided, and other times it is because of unclear reporting. As a result, researchers spend a long time trying to get the necessary data, often contacting authors to request more details. Often the authors themselves have trouble tracking down some additional statistic from a study they wrote years ago. In some cases, study results simply cannot be included in reviews because of a lack of information.
This is of course a waste of resources: funds are spent on studies where the results are unusable, systematic reviewers waste time chasing standard deviations or programme descriptions, and the results of systematic reviews end up being less useful. This waste can be avoided easily by better reporting.
In this post we summarise the information researchers need to report for impact evaluations to be more useful and easily included in a systematic review of intervention effects.
The numbers we need to use your results most effectively in a systematic review
Studies typically use different scales and measures to assess the same or similar outcome constructs. This makes it difficult to combine and compare results across studies, which is one of the objectives of systematic reviews. Therefore, a key step in a systematic review is to convert results from individual studies into a common metric – a standardised effect size.
This is essential for meta-analysis, but even systematic reviews that don’t use meta-analysis benefit from more easily comparable effect sizes. That said, standardised effect sizes aren’t automatically comparable either, due to differences in underlying populations – discussed here – or, in education evaluations, differences in test make-up – discussed here. So they should be used with discretion.
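To make the conversion concrete, here is a minimal Python sketch (our own illustration, not from the original post) of the most common standardised effect size for continuous outcomes: the standardised mean difference with Hedges’ small-sample correction. It uses exactly the statistics that reviewers need authors to report – group means, the pooled standard deviation and the analysed sample sizes.

```python
def hedges_g(mean_t, mean_c, sd_pooled, n_t, n_c):
    """Standardised mean difference for a continuous outcome,
    with Hedges' small-sample correction.

    mean_t, mean_c : treatment and control group means
    sd_pooled      : standard deviation pooled across both groups
    n_t, n_c       : analysed sample sizes in each group
    """
    d = (mean_t - mean_c) / sd_pooled      # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)      # small-sample correction factor
    g = j * d
    # Approximate variance of g, needed to weight studies in a meta-analysis
    var_g = j ** 2 * ((n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c)))
    return g, var_g

# Hypothetical study: means 62 vs 55, pooled SD 10, n = 120 and 115
g, var_g = hedges_g(62.0, 55.0, 10.0, 120, 115)
```

If a study reports only an adjusted regression coefficient, the same logic applies with the coefficient in place of the raw mean difference – which is why the standard deviation and standard error belong in the paper too.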
To help systematic review authors calculate a standardised effect size researchers should report the following:
- Outcome data separately for treatment and control group (means for continuous outcomes, frequencies for binary outcomes, regression coefficients for adjusted estimates);
- Sample standard deviation pooled across treatment and control groups;
- Standard error or confidence intervals of the treatment effect (for cluster randomised controlled trials, standard errors should be adjusted for clustering, and the intra-cluster correlation (ICC) should be provided – here is a simple way to calculate the ICC in Stata);
- Sample size in treatment group (if clustered, number of clusters + average number of students per cluster) at baseline and at follow up;
- Sample size in control group (if clustered, number of clusters + average number of students per cluster) at baseline and at follow up.
That’s it. By our count, that’s just 6 variables for a non-clustered impact evaluation, and 9 for a cluster randomised controlled trial. Not so hard. Now that your study is in the review, you can help us make the review better.
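For the clustered case, the ICC mentioned above can be estimated from one-way ANOVA mean squares, and it determines the ‘design effect’ by which naive variances must be inflated. A rough Python sketch of both quantities, assuming the reviewer has (or can request) outcome data grouped by cluster – the post’s link shows an equivalent calculation in Stata:

```python
def icc_anova(clusters):
    """One-way ANOVA estimator of the intra-cluster correlation (ICC).
    `clusters` is a list of lists: outcome values grouped by cluster."""
    k = len(clusters)
    ns = [len(c) for c in clusters]
    n_total = sum(ns)
    grand_mean = sum(sum(c) for c in clusters) / n_total
    cluster_means = [sum(c) / len(c) for c in clusters]
    # Between- and within-cluster mean squares
    ms_b = sum(n * (m - grand_mean) ** 2
               for n, m in zip(ns, cluster_means)) / (k - 1)
    ms_w = sum((y - m) ** 2
               for c, m in zip(clusters, cluster_means) for y in c) / (n_total - k)
    # Adjusted average cluster size (handles unequal cluster sizes)
    n0 = (n_total - sum(n ** 2 for n in ns) / n_total) / (k - 1)
    return (ms_b - ms_w) / (ms_b + (n0 - 1) * ms_w)

def design_effect(icc, avg_cluster_size):
    """Factor by which naive variances are inflated under clustering."""
    return 1 + (avg_cluster_size - 1) * icc
```

With an ICC of 0.05 and 30 students per cluster, for example, the design effect is 1 + 29 × 0.05 = 2.45 – ignoring clustering would understate variances by more than half, which is why cluster counts and the ICC belong in the report.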
Methodological details that will help with appraisal
Systematic reviewers also need methodological details to ensure studies are combined appropriately and to critically appraise the risk of bias of individual studies. The risk of bias assessment allows reviewers to gauge the certainty of systematic review findings by evaluating whether the study authors were able to avoid factors known to bias results, such as selection, attrition, or selective reporting.
Figure 1 below shows the results of such a risk of bias assessment from 3ie’s recent systematic review of the effects of education programmes. Studies are rated as having high, low or unclear risk of bias across seven categories of bias. An unclear rating is typically given when a study does not provide enough information for reviewers to make a judgement. As the figure highlights, for many categories the risk of bias remains unclear for almost 40 per cent of studies. This shows how limitations in study reporting can limit our ability to make clear statements about the certainty of the evidence.
To help this assessment researchers should report the following:
- Unit of allocation and unit of analysis (and if they differ, whether standard errors were corrected for clustering)
- The type of treatment estimate provided (e.g., is it an average treatment effect or treatment on the treated?)
- Details about treatment allocation, including how any randomisation was implemented and if it was successful (baseline balance)
- Clearly report and justify methods of analysis, especially if you use unusual approaches. (In other words, convince readers that you’re not just reporting the results of the analysis with the most interesting results.)
- Describe the conditions in the comparison group, including distance to the groups receiving the intervention and any steps to address risks of contamination.
- Report results for all primary and secondary outcomes clearly, including results that were not statistically significant or negative.
Figure 1: Risk of bias assessment in 3ie’s systematic review on education effectiveness (Snilstveit et al., 2015)
Intervention design and implementation: What is the what that works?
The phrase ‘what works’ is commonly used in debates about the use of evidence to inform decision-making. But often the description of intervention design and implementation is so vague in impact evaluations that even if the ‘what’ is found to be effective, it would be difficult to replicate the programme elsewhere. This limits the usefulness of individual studies in and of themselves, but in systematic reviews the issue is magnified.
In the worst case, this can lead to misleading conclusions. More routinely, it limits the usefulness of systematic review findings. Both problems can be avoided if researchers report details of intervention design, implementation and context.
Finally, it is not enough to know if something is effective. We also need to know what it cost and to be able to compare costs across different programme options. Information on resource use and costs is rarely provided in impact evaluation reports, and therefore few systematic reviews are able to say anything about costs. (J-PAL has some useful resources on how to do cost analysis.)
To make study results more useful, consider the following:
- Describe the intervention design in sufficient detail for replication (what was delivered, by whom, for how long). If you can’t put it all in the body of the paper, use a technical appendix.
- Describe what actually happened: Was everything delivered as planned? If not why not?
- Provide a description of the context where the programme was delivered, including demographic characteristics of the population and relevant social, cultural, political and economic characteristics of the context.
- Report details about resource use and costs to facilitate cost-effectiveness analysis (What to report? There are good resources here and here.)
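As a toy illustration of why comparable cost reporting matters (all numbers below are hypothetical, not drawn from any study discussed here), a reviewer who has total costs, beneficiary counts and effect sizes can compute a simple comparable metric:

```python
def cost_per_sd(total_cost, n_beneficiaries, effect_size_sd):
    """Cost per beneficiary per standard deviation of impact --
    one simple way to compare programme options."""
    return total_cost / n_beneficiaries / effect_size_sd

# Two hypothetical programmes: the cheaper one is less cost-effective
prog_a = cost_per_sd(50_000, 1_000, 0.20)  # ~$250 per SD of impact
prog_b = cost_per_sd(30_000, 1_000, 0.10)  # ~$300 per SD of impact
```

None of this is possible unless unit costs and coverage are reported alongside effects.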
Most existing checklists are generic and the specific details necessary to address each point above will vary between types of programmes. But the In-Service Teacher Training Survey Instrument developed by Popova et al. provides an example of a tool developed specifically to document the design and implementation details of in-service teacher training programmes.
Better reporting is a cheap way of increasing the value of research
Studies will be more useful if they are better reported. And improving study reporting is a relatively cheap way of increasing the value of research. It will require some additional effort by researchers in documenting study procedures. We may need to develop or adapt existing guidelines. But the cost of doing so is very low compared to the waste of resources on studies that cannot be used because of poor reporting.
In health, the issues of inconsistent reporting are being addressed through the development of reporting guidelines and enforcement of those guidelines by journal editors and research funders. Such guidelines have yet to be developed for the social sciences more broadly, but the CONSORT guidelines for pragmatic trials are a good place to start. Researchers from the University of Oxford are also working on a CONSORT extension for social and psychological interventions.
This blog post first appeared on the World Bank’s blog site, Development Impact. The original post can be found here.
Tags: education, impact evaluations, methods, RCTs, reporting, systematic reviews, transparency and accountability