This week we proudly launch the Impact Evaluation Repository, a comprehensive index of around 2,400 impact evaluations in international development that have met our explicit inclusion criteria. In creating these criteria we set out to establish an objective, binary (yes or no) measure of whether a study is an impact evaluation as defined by 3ie. Some criteria were simple (does the study evaluate a programme or policy?) while others were more controversial (does it use experimental or quasi-experimental methods?).
But for one particular criterion, studies did not always fit neatly into a ‘Yes’ or ‘No’ category: Does the study measure programme effectiveness?
One of the key identification strategies in impact evaluation is the randomised controlled trial (RCT). This method involves randomly assigning members of a study population to receive an intervention or not. As you can well imagine, there are an awful lot of RCTs in the biomedical sciences. A quick search of PubMed reveals that more than 360,000 studies published since 1961 have been indexed as RCTs. By our estimation, around 12,000 of these have taken place in low- and middle-income countries. We knew right away that many were medical efficacy trials that would not be directly relevant for international development policy making.
So we had to draw a line in the sand; a line we called ‘Effectiveness’. The problem with drawing lines in the sand, of course, is that sometimes they disappear.
At the outset, the difference between efficacy and effectiveness studies seemed simple enough. Efficacy trials (usually small scale) determine whether a treatment works under ideal (laboratory) conditions. Meanwhile, the more relevant effectiveness studies (normally large scale) examine whether that treatment works under ‘real world’ conditions. But after a while, that line in the sand became pretty blurry. A lot of questions started cropping up. Should all large-scale community-based trials of Vitamin A supplementation count as development impact evaluations? Does every vaccination trial in the developing world count as a development impact evaluation? What sets impact evaluations apart?
Many folks in the biomedical sciences have already pointed out that these two categories sit on a continuum rather than being mutually exclusive concepts. As Mark Borigini notes, it is rare to have a perfect clinical study. Indeed, a number of trials we considered for the repository primarily examined efficacy but also added value to the conversation about treatment effectiveness. But in that case, any biomedical RCT that measures outcomes at the household, community, or regional level (just about anything outside of a laboratory setting) could also be considered an effectiveness trial.
There were many times when we found ourselves saying: “this one feels pretty efficacy-ey,” or “that study has the distinct aroma of effectiveness,” or “it has a certain je ne sais quoi.” As it turns out, this kind of gut-feeling analysis isn’t far off. But we needed to ground our subjectivity in a more deliberate way.
To do this, we drew from early conversations around explanatory and pragmatic trials. Explanatory trials are used to test causal research hypotheses, while pragmatic trials are intended more to inform policy decisions. As Schwartz and Lellouch (1967) point out, this distinction often comes down to the ex ante attitude of the authors towards trial design. Pragmatic trials answer real-world questions about what treatment is best for the patient in the immediate moment. For 3ie, pragmatic trials produce pragmatic results, and are generally the most applicable. We are not merely concerned with the effect of a drug; we are concerned with the effectiveness of the overall intervention such that we can make recommendations that inform development programming.
To guide this sometimes-subjective decision making, we created a screening criterion (below) to help our screeners conceptualise where a study fits on the efficacy-effectiveness continuum. This tool notwithstanding, what we ultimately found is that in a small number of circumstances it is not totally clear where a study belongs.
In these cases we look to Schwartz and Lellouch’s attitude towards trial design. A more apt metaphor, though, might be found in the landmark 1964 U.S. Supreme Court case Jacobellis v. Ohio. Writing in a concurring opinion, Justice Potter Stewart famously described his threshold test for determining whether material was obscene, and therefore not protected under the First Amendment, by saying, “I know it when I see it.”
Item 6a from the 3ie Repository Screening Tool
Studies may exist anywhere on the efficacy-effectiveness continuum. Typically, efficacy studies examine treatment outcomes under highly controlled conditions. Effectiveness studies go beyond laboratory trials and examine interventions in real world settings. Note that RCTs that only address the biomedical efficacy of a drug or treatment should be excluded. The following are screening guidelines to help make this judgment:
If any of these conditions are met in addition to methodological criteria in #6 above, select ‘Yes’:
- The intervention under study promotes a social, economic or behavioral change either as one of the final measured outcomes or as a mechanism within the theory of change (beyond the self-administration of a drug). For example, the study may include health behavior messaging, training, provision of information, or screening or surveillance for specific disease conditions.
- The study measures any other outcomes in addition to or beyond purely biomedical indicators (such as returns to education, economic productivity, quality of life, disability adjusted life years (DALYs) and spillover effects).
- The study measures the cost-effectiveness or cost-benefit of the treatment(s).
- The study records any additional formative information that could guide the design or execution of future studies. For example, an RCT that also measures acceptability of a particular treatment (measuring respondent satisfaction with treatment, not merely a rate of compliance or uptake) would be included.
- The treatment is both prepared and delivered by a community health worker or trained layperson (such as a parent, teacher or community member, and not merely one of the program or study enumeration team).
- The programme or outcomes measured answer, or attempt to answer, a question relevant to the roll-out of international development policies or interventions.
If it is unclear whether the study meets any of these conditions (a-f), select ‘Unclear’.
Note that, because we err on the side of inclusion, studies marked ‘Unclear’ should likely be included.
If the study meets none of these conditions (a-f), select ‘No’.
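The decision rule in Item 6a can be sketched in a few lines of code. This is only an illustration of the logic described above, not part of the 3ie screening tool itself: the names `Answer`, `screen_item_6a` and `include_in_repository` are hypothetical, and the sketch assumes a screener has already confirmed the methodological criteria in #6 and has recorded one judgment per condition (a-f) as `True` (met), `False` (not met) or `None` (could not tell).

```python
from enum import Enum
from typing import Optional, Sequence


class Answer(Enum):
    YES = "Yes"
    UNCLEAR = "Unclear"
    NO = "No"


def screen_item_6a(conditions: Sequence[Optional[bool]]) -> Answer:
    """Apply the Item 6a rule to six screener judgments (conditions a-f).

    Each judgment is True (met), False (not met) or None (unclear).
    Assumes the methodological criteria in #6 were already satisfied.
    """
    if len(conditions) != 6:
        raise ValueError("expected six judgments, one per condition (a-f)")
    # Any single condition being met is enough for 'Yes'.
    if any(c is True for c in conditions):
        return Answer.YES
    # No condition is clearly met, but at least one could not be judged.
    if any(c is None for c in conditions):
        return Answer.UNCLEAR
    # Every condition was judged and none was met.
    return Answer.NO


def include_in_repository(answer: Answer) -> bool:
    # Erring on the side of inclusion: 'Unclear' is treated like 'Yes'.
    # The tool says such studies should "likely" be included, so a real
    # screener may still apply judgment here.
    return answer in (Answer.YES, Answer.UNCLEAR)


if __name__ == "__main__":
    # A hypothetical study: only condition (c), cost-effectiveness, is met.
    judgments = [False, False, True, False, False, False]
    answer = screen_item_6a(judgments)
    print(answer.value, include_in_repository(answer))
```

Of course, the hard part is not the rule but the six judgments that feed it, which is exactly where “I know it when I see it” comes in.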