Monthly Archives: March 2015

MDG for water: is the job done?

Water provision remains high on the global development agenda, including in political commitments such as the Millennium Development Goals (MDGs) and the associated post-2015 targets. In 2012, the United Nations declared that governments had met the MDG drinking water target of halving the proportion of people without access to safe drinking water (defined as access to an improved water source within 1 kilometre of the household). This suggests that some development efforts are working.

When your target is only 50 per cent, it is no surprise that the scale of the remaining challenge is great. Seven hundred and fifty million people, many of them in Sub-Saharan Africa, still lack access to safe drinking water. For those who do have nominal access, supply is often unreliable and water frequently takes more than 30 minutes to collect. And the world is a long way from providing the basic sanitation services needed to free communities from open defecation.

Water is central to efforts to improve quality of life for the world’s poorest people. It can support virtuous development cycles and pro-poor growth. Conversely, poor access to water and unhygienic sanitation conditions are likely to explain why some countries, such as India, have worse child malnutrition outcomes than their income levels alone would predict.

In theory, clean drinking water helps combat diseases such as diarrhoea, which kills almost 2 million children under five each year – six times more children than global conflict does. Water also enables hygienic washing practices, which reduce exposure to respiratory infections, to worm infections (which cause malnutrition) and to trachoma (which causes blindness). And water poverty is gendered: it is typically women and girls who are subject to the drudgery of carrying water over long distances to the household, which exposes them to musculoskeletal injury and physical attack and takes away time that could be more productively spent. What remains at issue, however, is the extent of the evidence supporting these claims and the likely size of the impacts.

3ie has produced an evidence gap map that addresses this issue by consolidating what we know about what works in the water, sanitation and hygiene (WASH) sector. The gap map is a tool to help policymakers access rigorous systematic review evidence on WASH, and to assist researchers and funders in setting priorities for future systematic reviews and impact evaluations. The results of the evidence gap map are available online and a report will be published shortly. The findings are summarised here.

We found, appraised and summarised 137 impact evaluations and 26 systematic reviews examining the effects of WASH provision. Evidence from systematic reviews suggests WASH interventions can make a big difference in combating infectious diseases and reducing child malnutrition. Hygiene promotion is probably the most efficacious way to reduce rates of child diarrhoeal disease. However, recent impact evaluations, published since the systematic reviews were undertaken, have questioned the scalability of community programmes for promoting hygiene. Reviews of hygiene promotion also need to account more systematically for the role of water supply as an enabling co-intervention.

In contrast, the evidence suggests that interventions to treat dirty drinking water do not lead to large, sustained reductions in diarrhoea, because uptake falls off and willingness to pay for water treatment is limited.

However, most systematic reviews do not attain the status of ‘high confidence in the review findings’. This is often due to limited searches for unpublished literature and the questionable rigour of many included studies. There are also concerns about methods of synthesis: some reviews use vote counting rather than statistical meta-analysis, and where meta-analysis has been used, more attention needs to be paid to examining heterogeneity and to grouping interventions appropriately. Many WASH systematic reviews are also simply out of date. In particular, systematic reviews of water treatment interventions need to be updated to include the most recent rigorous evidence from blinded studies, which has called into question the reliability of self- and carer-reported diarrhoea data.
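To see why vote counting is a weak synthesis method, here is a minimal sketch in Python. The effect sizes are invented for illustration, not real WASH results: five small studies can each miss statistical significance on their own while the pooled, inverse-variance-weighted estimate is clearly different from zero.

```python
# Minimal sketch: vote counting vs fixed-effect inverse-variance meta-analysis.
# All numbers are invented for illustration; they are not real WASH results.
import math

# Log effect estimates and standard errors from five hypothetical studies.
log_effects = [-0.25, -0.18, -0.30, -0.10, -0.22]
std_errors = [0.16, 0.14, 0.20, 0.15, 0.13]

# Vote counting: tally the studies whose 95% CI excludes zero.
significant = sum(1 for b, se in zip(log_effects, std_errors)
                  if abs(b) > 1.96 * se)
print(f"Vote count: {significant} of {len(log_effects)} studies significant")

# Fixed-effect inverse-variance pooling: weight each study by 1/SE^2.
weights = [1 / se ** 2 for se in std_errors]
pooled = sum(w * b for w, b in zip(weights, log_effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled log effect: {pooled:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Run on these invented numbers, the vote count is zero out of five, yet the pooled confidence interval excludes zero – exactly the signal that vote counting throws away.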

Although WASH sector researchers have shown long-standing commitment to theory-based impact evaluation, the evidence base remains small, especially for non-health outcomes, as shown in the figure.

The figure shows serious gaps in evidence for non-health outcomes. These outcomes are likely to be at least as important to programme beneficiaries’ quality of life as health outcomes, if not more so, not least because beneficiaries can observe them directly. In particular, data on productive-sector and gendered outcomes are critically under-collected and under-reported. Not a single rigorous impact evaluation has attempted to measure the impacts of water and sanitation improvements on women’s and girls’ safety, and very few have examined the drudgery involved in water collection and transportation. More studies collecting data on intermediate outcomes, such as the time needed to collect water or access sanitation, are also needed to shed light on these factors.

The quality of the impact evaluation evidence has tended to be low, due to a past lack of rigorous methods, particularly for costly interventions like water supply improvements. Rigorous evidence can help us determine how to most effectively meet these challenges and improve the lives of the most disadvantaged people around the world. Researchers have shown that it is possible to apply high-quality evaluation methodologies, including randomised assignment, to evaluate the impacts of piped water connections and sanitation. 3ie has already supported several such studies.

In partnership with the Water Supply and Sanitation Collaborative Council (WSSCC), 3ie has recently embarked on a programme of research that aims to start filling some of these important gaps by funding new impact evaluations and systematic reviews. We encourage researchers to look out for a call for rigorous systematic reviews of WASH-sector programmes, which will be announced next week.

Reversing the resource curse through impact evaluations

Countries such as Sudan, the Democratic Republic of Congo and Nigeria have large reserves of natural resources. They have also suffered extended periods of political violence, authoritarianism, corruption, inequality and poor growth.

What explains this coexistence of great natural wealth and extreme poverty? The correlation between large natural resource reserves and poor economic growth is often taken as evidence of a ‘natural resource curse’.

Are countries blessed with large reserves of minerals, natural gas, petroleum and forests necessarily doomed to fail? Is the resource curse a reality? Can it be reversed? If yes, how?

Are good policies the answer? Can policies that encourage information flow and political participation help in ensuring accountability and better development outcomes?

In recent years, global initiatives to improve the transparency of revenue collection from extractives, such as the public reporting of taxes and other payments received by governments, have proliferated. For example, the Extractive Industries Transparency Initiative seeks to improve government revenue disclosure, while Publish What You Pay asks large corporations to publish what they pay to governments.

There are also several national and sub-national initiatives that create independent watchdogs, ring-fence revenues for spending on specific sectors and prescribe how revenues should be shared between local and federal governments (see here for a couple of examples that are currently being evaluated).

However, the evidence on what works, and why, in promoting transparency and accountability and achieving better development outcomes remains sparse (Acosta, 2014; Leavy, 2014).

The evidence gap

There are several reasons for this evidence gap. Establishing a rigorous control group, and thus a counterfactual for attributing observed changes to transparency and accountability initiatives (TAIs), is often hard. These initiatives are mostly nationwide standards or soft guidelines that apply to every player in the sector; they bundle complex activities that engage multiple political actors and watchdogs; and they are targeted and implemented non-randomly.

Further, the theory of change outlining how better information disclosure (transparency) leads to restrained government discretionary spending (accountability), which in turn helps achieve better development outcomes, is often unclear. Estimating impact therefore becomes difficult.
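To make the counterfactual problem concrete, here is a minimal sketch of difference-in-differences, one standard quasi-experimental design a study of a non-randomised TAI might use. All the numbers and the ‘budget-disclosure score’ outcome are hypothetical, and the design rests on the untestable assumption that treated and comparison areas would have followed parallel trends without the initiative.

```python
# Hypothetical difference-in-differences sketch for a non-randomised TAI.
# Outcome: an invented 0-100 budget-disclosure score, measured before and
# after the initiative in covered (treated) and uncovered (comparison) areas.

treated_before, treated_after = 42.0, 58.0   # areas covered by the TAI
control_before, control_after = 40.0, 47.0   # comparable uncovered areas

# A naive before-after comparison in treated areas absorbs whatever was
# changing anyway (the secular trend), overstating the initiative's effect.
naive_change = treated_after - treated_before        # 16.0

# DiD nets out the common trend, as measured in the comparison areas,
# under the parallel-trends assumption.
common_trend = control_after - control_before        # 7.0
did_estimate = naive_change - common_trend           # 9.0

print(f"Naive before-after change: {naive_change:.1f}")
print(f"Difference-in-differences estimate: {did_estimate:.1f}")
```

The sketch also shows why the bundled, nationwide nature of many TAIs bites: when no comparable uncovered group exists, there is nothing from which to measure the common trend.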

Evaluating the impact of TAIs is difficult, but not impossible. At 3ie, we aim to fill this critical evidence gap with rigorous impact evaluations. We recently invited impact evaluation proposals under the DFID-supported Transparency and Accountability Thematic Window. These impact evaluations will use innovative methods to evaluate TAIs in the extractives sector of a selected set of low- and middle-income countries.

What did we do?

Many research teams have expressed interest in undertaking impact evaluations. Additionally, agencies working in the extractives sector in Ecuador, India, Kenya, Mongolia, Mozambique, Myanmar, Peru and Uganda have come forward to have their initiatives evaluated. At a recently held workshop for 3ie’s Transparency and Accountability Thematic Window, we asked all of them one question: how can we learn more about what works, and why, in TAIs?

So what did we learn from our dialogue?

The dialogue between research teams and implementing agencies continues, but important early insights have already emerged.

Broaden the focus: Transparency initiatives shouldn’t be confined to disclosing information and actions related to revenue collection, such as taxes and other payments. We should also be evaluating policies that relate to contracting, licensing and procurement, the distribution of income between central and local governments, and independent oversight of public expenditure and of the sectoral allocation of revenues derived from natural resources.

Engage with people: Focus on engagement mechanisms, such as providing customised information and promoting diverse platforms for debate, and assess whether these are necessary – and sufficient – to improve transparency. See here and here for examples of citizen engagement.

Emphasise cross-learning: While there are few impact evaluations of TAIs in the extractives sector, there are several impact evaluations of interventions to improve transparency and accountability in service delivery and budgetary processes. We should use what we have learned from these fields to understand, for example, how information can be presented in different and more effective ways to inform debate and strengthen accountability.

Be aware of the short time span: We need to be realistic about what can be achieved. It may be more practical to examine intermediate outcomes, such as service delivery, rather than long-term and difficult-to-measure outcomes, such as livelihood improvement and poverty reduction.

It is clear that we need innovative approaches to impact evaluations of TAIs. This will require researchers and practitioners to come together and jointly answer the question of how rigorous evidence can be generated.

Not all ‘systematic’ reviews are created equal

In a recent World Bank blog based on a paper, David Evans and Anna Popova argue that systematic reviews may not be a reliable approach to synthesising empirical literature. They reach this conclusion after analysing six reviews assessing the effects of a range of education interventions on learning outcomes. The main finding of their analysis: while all six reviews examine effects on learning outcomes based on evidence from impact evaluations, there is a large degree of divergence in the studies each review includes and, consequently, in the conclusions they reach.

We agree with many of the points Evans and Popova make, but not with their conclusion that systematic reviews should be taken with a grain of salt. Instead, we believe their exercise actually strengthens the case for doing more high-quality systematic reviews.

Systematic reviews aim to address a number of well-known problems with academic research and with less systematic approaches to evidence synthesis, including publication bias, unconscious reviewer bias and the variable quality of research output. They also overcome the limitations of single studies, which may be sample-, time- and context-specific, or may illuminate only one aspect of a policy issue.

Systematic review methodology and guidelines are based on empirical research and have been developed to address these issues. While there are different definitions and methodological guidelines, the key features of systematic reviews are the use of systematic and transparent methods to identify and appraise relevant studies, and to extract and analyse data from the studies included in the review. Not all systematic reviews include a meta-analysis, but when one is feasible, meta-analysis is considered the most rigorous method of synthesis.

By gathering the totality of evidence on a question, systematic reviews are an invaluable tool for establishing the overall balance of evidence, separating higher-quality from lower-quality studies and the generalisable from the context-specific. High-quality systematic reviews that identify the best available rigorous evidence on a particular issue can be a goldmine for policymakers.

So, the question ‘how definitive are these reviews?’ could usefully be prefaced with ‘how systematic are these reviews?’ Most of the reviews included in Evans and Popova’s analysis are not actually systematic reviews under the commonly used definitions, including those of major review-producing organisations like the Cochrane Collaboration, the Campbell Collaboration, the Collaboration for Environmental Evidence and 3ie.

Rather, the reviews analysed by Evans and Popova are a mix of literature reviews, meta-analyses and systematic reviews. Most of them do not undertake systematic and comprehensive selection, appraisal and synthesis of the evidence. For instance, while some of these reviews document their search, Conn (2014) appears to be the only one that comprehensively searches the published and unpublished literature for studies of the effects of education interventions on learning outcomes in Sub-Saharan Africa.

This is not a critique of these reviews – they offer valid and useful contributions to the evidence base. But most of them are not systematic reviews and should not be judged as such. Nor should systematic reviews in general be judged by the limitations of these particular reviews.

Having said this, even if these reviews had taken a systematic approach, we would still expect them to differ in the studies they include. They have different purposes and geographic coverage, and they focus on different levels of education and different outcomes. Authors make decisions about inclusion criteria based on a review’s scope and purpose, and the included studies will reflect this. The important point is to be explicit about the choices made, so that readers can interpret a review’s findings accordingly.

Mapping and assessing the quality of existing systematic reviews in education

So what is the quality of systematic reviews in education? As part of a large-scale systematic review of the effects of primary and secondary education interventions in low- and middle-income countries (L&MICs), we mapped existing systematic reviews in the sector to answer this question. Our recently completed education evidence gap map identified 20 completed or ongoing systematic reviews assessing different education interventions in L&MICs.

The 3ie education evidence gap map shows that there are a number of well-conducted systematic reviews in the sector. But it also highlights that many of the existing systematic reviews have weaknesses that reduce our confidence in their findings. Based on a careful appraisal of the methods applied in the reviews (using a standardised checklist), eight of these reviews have been given an overall rating of ‘low confidence in conclusions about effects’.

The main weakness identified in these reviews is a lack of critical appraisal of the included studies: they do not assess the included studies’ risk of bias, so the reliability of the findings is not clear to readers. Several of the reviews also have limitations in their searches. Finally, some use vote counting when meta-analysis is feasible, and have problems with the interpretation of findings. What we take away from all this, again, is that we need more high-quality systematic reviews.

Improving the practice of evidence synthesis

Evans and Popova raise some important points about ways that systematic review practice can be improved. Specifically, they call for more comprehensive searching, the combination of meta-analysis with narrative review and clearer thinking about intervention categorisation. We agree with these suggestions and, as it so happens, are working on addressing them in our ongoing review of the effects of a range of education programmes on enrolment, attendance and learning outcomes in L&MICs.

A related issue highlighted by Evans and Popova is the combination of highly divergent studies in a meta-analysis. In a high-quality systematic review, this would be addressed by conducting thorough heterogeneity and sensitivity analyses. These analyses can explain some of the variation in outcomes across studies and point to the factors behind the observed divergence. They can also show whether pooling or splitting interventions within a category differently brings out more granular findings about intervention effects.
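For readers unfamiliar with these diagnostics, here is a minimal sketch of what such analyses involve: Cochran’s Q and the I² statistic to quantify between-study heterogeneity, and a leave-one-out check as a simple sensitivity analysis. The effect sizes are invented for illustration, not drawn from any education review.

```python
# Sketch of heterogeneity and sensitivity checks behind a pooled estimate.
# Effect sizes and standard errors are invented for illustration.
import math

effects = [-0.30, -0.28, -0.05, -0.32, -0.08]   # hypothetical study effects
ses = [0.10, 0.12, 0.09, 0.11, 0.10]            # their standard errors

def pool(effs, errs):
    """Fixed-effect inverse-variance pooled estimate, its SE and weights."""
    w = [1 / se ** 2 for se in errs]
    est = sum(wi * b for wi, b in zip(w, effs)) / sum(w)
    return est, math.sqrt(1 / sum(w)), w

pooled, pooled_se, w = pool(effects, ses)

# Cochran's Q: weighted squared deviations of each study from the pooled mean.
Q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, effects))
df = len(effects) - 1
# I^2: share of total variation attributable to between-study heterogeneity.
I2 = max(0.0, (Q - df) / Q) * 100

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.0f}%")

# Leave-one-out sensitivity check: does any single study drive the result?
for i in range(len(effects)):
    est, _, _ = pool(effects[:i] + effects[i + 1:], ses[:i] + ses[i + 1:])
    print(f"Dropping study {i + 1}: pooled effect = {est:.3f}")
```

A high I² is a cue to split the interventions into more homogeneous groups rather than report a single averaged effect – the regrouping exercise described above.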

In our own education systematic review, we are also combining meta-analysis with narrative synthesis and causal chain analysis. To inform this analysis, we are drawing on evidence from a wide range of programme documents, process evaluations and qualitative studies associated with the interventions covered by the included impact evaluations. We think this is crucial for unpacking sources of heterogeneity and identifying factors associated with improved outcomes. We are trying to answer not just the ‘what works’ question but also the important ‘when’ and ‘why’ questions. We expect to publish the full systematic review in late 2015.

It is clear that not all ‘systematic’ reviews are created equal. But if done well, systematic reviews provide the most reliable and comprehensive statement of what works. More effort needs to go into improving their quality so that they become useful and reliable sources of evidence for more effective policies and programmes.

Understanding what’s what: the importance of sector knowledge in causal chain analysis

My recent blog, How big is big enough?, argued that you need sector expertise to judge whether the effect of a programme is meaningful, rather than just statistically significant. But the need for sector expertise goes far deeper than that.

I have recently been reading impact evaluations of water supply and sanitation programmes. The studies by non-sector researchers (mostly economists) collect data on the outcome of interest, usually child diarrhoea, but they do little more than that.

The exclusive focus on outcomes contrasts sharply with, for example, a 3ie-funded impact evaluation by Tom Clasen and others of the Total Sanitation Campaign (TSC) in the state of Odisha in India. The TSC combines social mobilisation with a government subsidy for toilet construction. The study collected data on several indicators of latrine use, such as the smell of faeces, stains from faeces or urine, the presence of soap, the presence of a broom or brush for cleaning, and the presence of slippers. The researchers also tested for faecal indicator bacteria in water sources and household drinking water, as well as on children’s and mothers’ hands and on children’s toys. They tested household members’ hands for contamination using hand-rinse samples. And they set fly traps to measure the density of flies.

This is an excellent example of the approach 3ie advocates: theory-based impact evaluation, measuring indicators along the causal chain. So when there was no improvement in child diarrhoea despite a substantial increase in latrine coverage, the researchers could identify the likely reason: not all family members were using the latrine.

Another example of the causal-chain approach comes from an impact evaluation of a handwashing and hygiene education intervention in Pakistan. In addition to child diarrhoea, data were collected on water treatment and handling, typical hand-cleansing materials, direct observation of handwashing, and reported occasions for handwashing. The careful analysis of variables along the causal chain is again notable: who was washing their hands, and were they doing it regularly and properly? By and large they were, which explains the observed reduction in diarrhoea.

3ie is a strong advocate of embedding impact evaluations in a broader analysis of the causal chain, and many 3ie staff have blogged about it (see here, here and here). But it is not something many impact evaluation teams apparently get. They generally fail to include sector experts in their study design teams, or even to consult them at that stage. They also often engage insufficiently with the relevant sector literature.

So, impact evaluations such as those in India and Pakistan deserve to be more widely read. Beyond the causal chain analysis, both studies have striking findings. In India, a very substantial increase in latrine coverage and use did not reduce diarrhoea. And in Pakistan, the improvements in hygiene practices were sustained three years after the intervention ended.

I talk a lot about the difficulty of achieving behaviour change, and how it can take decades rather than months. So, the Pakistan study intrigues me. Why it worked is the subject of another blog.