Chapter 4: Data and Outcome Measurement1
How does a government know if its economy is growing? If its citizens are healthy? If its
education system is working? If outcomes for marginalized groups are improving? Answering
these questions, all of which are fundamental to government functioning, requires data.
Credible and actionable data on key outcomes is an essential requirement for a government
(or any organization) to be successful. The centrality of good data to running a modern state is
best seen in the fact that the word statistics itself derives from the word for state!2
Despite the importance of good data for effective governance, India’s data systems are
outdated and function far below their full potential. It is not that India does not collect data.
Indeed, the Indian statistical system put in place in the 1950s was considered one of the best
in the world in a low-income country at that time. However, India’s measurement infrastructure
has not kept pace with the possibilities enabled by the dramatic advances in technology for
collecting rapid, real-time, high-quality data. Just like in other aspects of state capacity, we
have also underinvested in our statistical infrastructure.
Over time, this lack of investment has led to the data we have often being inaccurate,
unrepresentative, or too delayed to be useful. Even where the data is credible, it is used more
to describe how we are doing, and not used to improve the functioning of the government.
These lacunae in our measurement infrastructure have different causes but the same effect:
weakened state capacity, hurting the ability of the government to deliver public services, and to
obtain timely feedback on its policies and programs.
Establishing systems to regularly generate accurate and disaggregated data on key
parameters will allow governments to manage their personnel better, spend money more
effectively, obtain rapid and systematic citizen feedback, track if public funds and benefits are
reaching the most marginalized populations, and design and fine-tune policies to work better
with rapid course corrections based on data on both processes and outcomes. This makes
improving our measurement systems a foundational investment in state capacity.
This chapter is organized into two sections. The first summarizes the key challenges under the
status quo, and the reasons for why things are the way they are. The second outlines a set of
implementable and cost-effective reform ideas that can be taken up by any state government.
1 Draft chapter from the book “Making Government Work Better” by Karthik Muralidharan ([email protected]).
This draft is made available for comments only.
PLEASE DO NOT CITE OR CIRCULATE WITHOUT PERMISSION OF THE AUTHOR.
2 According to the Oxford English Dictionary, the German statistisch comes from the Latin statisticus, meaning “of
or relating to the state, statists, or statecraft.” OED.com, accessed October 7, 2020.
1 Understanding the status quo and its limitations
1.1 A snapshot of the Indian statistical system
India has a well-designed and comprehensive structure for collecting, analysing, and
disseminating key statistics on the Indian economy, which is overseen by the National
Statistical Organization (NSO). The NSO comprises two major divisions: the Central
Statistics Office (CSO) and the National Sample Survey Office (NSSO). The CSO is
responsible for several key statistics on the Indian economy including national income/GDP,
consumer prices and inflation, and the economic census. The NSSO is responsible for carrying
out the National Sample Survey (NSS), which is India’s flagship household survey to measure
several key development indicators including consumption, poverty, and employment.
These systems were considered to be of a world-class standard when they were set up in the
1950s under the leadership of P. C. Mahalanobis. In particular, the NSS provided the template
for the World Bank’s Living Standards Measurement Study (LSMS), and thereby influenced
the standardized measurement of consumption and poverty across the world. However, over
time, a combination of technical and political challenges has weakened
the NSS to the point that the data for the 2016-17 round has not even been released. Thus,
the last large-sample NSS data we have is from 2011-12, which has dealt a severe blow to our
ability to understand trends in consumption and poverty in the past decade.3
However, even when NSS data is released, it has limitations. In particular, while the NSS
provides a useful barometer of national progress on key development outcomes, it was not
designed for, and hence is not very useful for, improving day-to-day governance. There are
several reasons for this. First, the large-sample NSS rounds are conducted only every five
years, which limits their use for decision making since the feedback cycle from government
actions to data on changes is too slow. Second, results are usually not spatially disaggregated
enough to support localized government actions. Finally, these household surveys also
typically lack credible measurements of service delivery and outcomes: they ask if children go
to school but do not check if teachers are in classrooms or if children are learning.
An effective measurement and data infrastructure for governance should allow governments to
track — and act on — growth and development outcomes (like employment, education and
health); understand beneficiary experience of important programs (e.g., have citizens received
and been able to access their cash transfers), and ensure that administrative data recorded by
government staff are reliable. India struggles on all three fronts.
3 See the essay on “How India’s statistical system was crippled” by Bhattacharya (2019) for a detailed narrative of
the technical and political reasons behind the progressive weakening of India’s statistical system.
1.2 There is little actionable data on fundamental development outcomes
In 2000, India set out to enroll all 6- to 14-year-old children in school. Today, that goal is nearly
met: over 97% of these children are enrolled in schools as of 2019.
This is an impressive achievement. But while enrollment is a necessary first step for learning, it
is far from sufficient. After all, the number of “enrolled” children includes millions who are
enrolled on paper but do not attend school regularly, and many more millions who attend
school but don’t learn very much. In 2018, only half of rural government school students
enrolled in Standard 5 could read even a Standard 2 level text.4 Thus, despite substantial
increases in expenditure and school inputs (like infrastructure, teachers, and school feeding
programs), the conversion of these inputs into the ultimate goal of learning has been weak.
One reason is that the government primarily focuses its efforts on measuring inputs and not
outcomes. This focus on visible inputs over less visible and more difficult to measure (but
far more important) outcomes is seen across sectors, and deeply hinders the effectiveness
of government policies and programs. The problem is not lack of data per se. Every
department collects data but, like in education, they focus much more on inputs instead of
outcomes. Indeed, for the past fifteen years, the most widely used data on learning outcomes
in India has come not from the government, but from the nongovernmental organization
Pratham, through the Annual Status of Education Report (ASER).
1.3 The government lacks visibility on citizen experiences with public services
Not only do governments in India lack data on outcomes, they often have little visibility on how
public programs are performing on the ground because they are unaware of beneficiary
experiences. Are the poor getting their food-security benefits under the public distribution
system (PDS)? Are they able to get work under the National Rural Employment Guarantee
Scheme (NREGS)? Are they getting paid on time? Can they access the funds in their bank
accounts? This is an age-old problem. Historically, emperors from Ashoka to Akbar were
known to roam around their kingdoms in disguise to observe the true state of affairs. Yet,
despite dramatic advances in technological possibilities, systematic measurement of the
quality of last-mile service delivery continues to be almost non-existent.
For instance, in 2015, the Government of India began using direct benefit transfers (DBT) to
beneficiary bank accounts instead of providing subsidized food through the public distribution
system (PDS) in three union territories (Chandigarh, Puducherry, and Dadra and Nagar Haveli). As
4 Data on enrolment and learning outcomes are both from ASER (2018).
part of a project with NITI Aayog, my colleagues and I were requested to monitor these pilots.
While government officials proudly pointed to bank records showing that 99% of funds had
been transferred successfully, our field-based surveys of nearly 5,000 beneficiaries revealed
that over 30% reported either not receiving funds or not knowing if they had!5
In many cases, beneficiaries did not know about fund transfers because they had not gone to
the bank to update their passbooks, and the government had not implemented any mechanism
to notify them when funds were transferred. In some cases, funds were transferred to inactive
bank accounts. Because it had no way to collect data on beneficiary experiences, the
government was unaware of these problems. While government departments do have
grievance redressal mechanisms, such as phone helplines, the data obtained here does not
provide systematic information about program performance because it only reflects the inputs
of proactive citizens who actually know how to call and complain. Across sectors, the lack of
systematic data on beneficiary experiences hinders effective program design and delivery.
1.4 The government’s own administrative data is often unreliable
Given the lack of data on outcomes and beneficiary experience, governments in India mostly
rely on official administrative data on outcomes to check, for example, whether farm productivity is
increasing or whether children’s nutrition is improving in a district. Indeed, the majority of data within
the government comes from such internal sources.
The problem is that there is growing evidence that there are substantial inaccuracies in such
data. For instance, a recent study in Madhya Pradesh found using random audits that official
data on student learning levels in government schools were severely inflated, especially for the
academically weakest students. The same pattern was found in Andhra Pradesh in both public
and private schools, suggesting that the problem of administrative data quality is widespread.6
These types of discrepancies in administrative data are rife across sectors and states. Even in
a high-capacity state like Tamil Nadu, my colleagues and I found that child malnutrition rates
were severely underreported: administrative data showed a rate of severe malnutrition below
1%, compared to about 8% in both our data and in the National Family Health Survey data.
Similarly, the rate of reported moderate malnutrition in administrative data was under 8%
compared to 25-35% in different independent data sources.7 Thus, the true rates of
moderate/severe malnutrition were 4 to 8 times higher than reported in official data.
7 These results were obtained as part of the fieldwork conducted for Ganimian et al (2021).
6 Singh (2020)
5 See Muralidharan, Niehaus, & Sukhtankar (2017).
Frontline department staff who have to collect and report data are prone to understating
problems and overstating success. There are several reasons for this including: (a) avoiding
looking bad to (or getting pulled up by) their supervisors, (b) avoiding the risk of being asked to do
extra work to address the issue,8 and (c) pressure from senior officials to not present data that
would paint the department in an unfavorable light.
Taken together, the lack of data on outcomes and beneficiary experience combined with poor
administrative data quality can cripple the state’s ability to function effectively. This problem is
well-known within the government. Several senior officials I’ve interacted with over the years
have bemoaned these data issues. In the words of one senior government advisor, “We are
usually flying blind and operate on intuition and anecdote, but rarely have good actionable data
to work with.” Also memorable are the words of a Chief Secretary who reacted to the problem
of administrative data quality by noting to me that “our systems are built on a house of cards.”
1.5 Why is there no systematic, actionable measurement system in India?
If officials understand the importance of actionable, credible data in improving government
performance, then why are we not collecting it? There are at least four reasons for this.
First, data collection is perceived to be expensive even though in practice it is not. The
problem is that it usually does not have a dedicated budget. Data collection is therefore often
considered discretionary spending which has to compete with other political spending
priorities. In practice, most governments prefer to spend on programs, which are believed to
lead to immediately visible and politically more rewarding outputs, rather than spending on
long-term investments in better measurement systems.
Second, data collection typically does not yield timely returns even for those who understand
its importance. A Principal Secretary or department head looking to leave a legacy of concrete
achievements at the end of their two-year (and often shorter) tenure will often see investing in
measurement as a low-return activity because the data will often come in only after they leave
their role. Further, if their successor is not interested in continuing the measurement (or has
different views on what to measure), their efforts will be wasted.
Third, with few exceptions, government departments lack the technical capacity to conduct or
even procure high-quality measurement on an ongoing basis. There are non-trivial issues
regarding sampling and representativeness, designing good survey questions, managing field
operations, analyzing and presenting the data, and interpreting the data and analysis with
8 One teacher in Madhya Pradesh actually told a colleague of mine that if he marked a child as achieving a D or F
grade that he would then have to conduct extra remedial classes, which he did not have time for!
suitable caveats.9 As a result, measurement exercises often rely on external donor-funded
initiatives to provide technical support. This results in measurement being undertaken on a
sporadic basis depending on donor funds and interest as opposed to on an institutionalized
and consistent basis within the government to inform policy making.
Finally, and perhaps most importantly, for politicians and officials, ignorance can often be bliss.
In a culture of soundbite- and headline-driven media, there is little nuance in public discourse
about the complexity of India’s problems. Thus, data on the reality of India’s poor development
outcomes often puts officials and political leaders on the defensive. Even though good data
and measurement are the foundation for better governance, the short-term consequences for
officials can be negative. As a Secretary to a Chief Minister in a northern state once told me
when I recommended high-frequency measurement of the quality of service delivery: “it is a
very good idea; but if the data makes us look bad, we will waste a lot of time answering
questions and dealing with negative press coverage.”
The attitude that “ignorance is bliss” is powerfully illustrated by an anecdote from Rukmini
Banerji, the CEO of Pratham, one of India’s largest education NGOs. In one of her field visits,
she asked a sarpanch (village leader) why his village had such poor learning levels among
children. The sarpanch and an education department official who was with them at the time
refused to believe her: they insisted that the students in the village were doing fine and pointed
to their administrative records which said just that. Rukmini personally took them to ask basic
questions to children in the village to demonstrate that contrary to the administrative records,
many students struggled with basic literacy and numeracy. When confronted with this
inconvenient fact, the education department official got upset and asked Rukmini, “Madam,
aapko asliyat se itna lagaav kyun hai? (why are you so attached to reality?)”!
2 Building a measurement architecture for better governance in India
Taken together, the reasons above highlight how a combination of limited incentives and
capacity explains the weaknesses in our measurement systems. Fixing this situation will require
politicians and officials to understand why they should invest in better measurement systems,
what exactly they should measure, who should do measurement, and a practical roadmap on
how to do so. I now discuss each of these points.
The good news is that several top government officials understand the importance of
high-frequency, high-quality data for governance. However, to be effective and deliver to full
9 For example, a common mistake policy makers make is to over-react to changes in rankings, which can often be
driven by sampling and measurement error as opposed to actual underlying changes. More generally, using data
effectively for decision-making requires considerable statistical sophistication, which officials often do not have.
potential, measurement should not be a sporadic effort driven by the initiative of individual
officers. Rather, improved measurement should be seen as the first step in a broader set of
systematic investments in state capacity. The key to effectively using data for governance is to
institutionalize the collection and analysis of important data so that decision makers and
managers within the government can rely on having the data available to guide their actions,
and it becomes standard operating procedure to do so. I now turn to a practical discussion of
how a state government may institutionalize and implement such a measurement architecture.
2.1 Who should lead measurement?
It may be tempting, especially from a cost perspective, to rely on departments themselves to
collect the relevant data for their work. After all, the education or health department is
best-placed to collect data on education or health outcomes. But several studies suggest that
this approach is unlikely to work well.
Consider the experience of the Government of Madhya Pradesh in education. To take on the
challenge of low learning among students, the state’s education department began testing
every government school student each year and measuring and recording child-level learning
levels. This impressive annual exercise, called Pratibha Parv, involved the entire education
department and was hailed as a “best practice” in education reform by the NITI Aayog.10
But when a research team carried out an independent retest a few weeks after Pratibha
Parv (using a sample of the same questions and retesting a sample of the same
students), they found that official data on learning levels were highly inflated. In the official
data, all students were reported as scoring over 60% in both maths and Hindi. In the
independent retest only 40% of the students scored even above 50% in Hindi and none of
them scored over 50% in maths.11 In other words, there was no learning crisis at all as per the
official data, in sharp contrast with multiple independent studies! Similar patterns have been
seen in other states and sectors, suggesting that this is a widespread problem across India.
These examples may suggest that governments should outsource the data collection process,
but this can also be problematic because officials tend to be wary of numbers put out by
non-government actors. For instance, for the last 15 years, the NGO Pratham has carried out
an impressive annual survey of households as part of the Annual Status of Education Report
11 See Figure 2 in Singh (2021). Interestingly, the ranking of students was unchanged between the administrative
data and the independent tests, suggesting that teachers are conscious of the importance of the rank in student
outcomes. But they inflated the levels of reported learning across the board.
10 Pratibha Parv was modeled after the Gujarat government’s Gunotsav program that had been championed by
Prime Minister Narendra Modi when he was Chief Minister of Gujarat.
(ASER) to yield precise estimates on learning outcomes across India at the district level. Data
from this survey (which covered over 300,000 households in 2019) has become the reference
point for everyone in the education sector, except the government!
The official reaction of the Ministry of Human Resource Development (recently renamed as the
Ministry of Education) for many years has been to ignore the learning crisis that ASER data
highlight and question the quality of ASER data. Instead, they rely on their official data source,
the National Achievement Survey (NAS), whose limitations are so severe as
to make the data almost useless for meaningful state-level comparisons.12 An appropriate
policy response would have recognized these limitations, engaged with the ASER data, and
used it to focus policy attention on foundational learning. Instead, the official response can
almost be characterized as denial, with the ASER data brushed aside as an inconvenience.
The point of this example is not to dwell on the specific technical aspects of the NAS versus
ASER debate.13 Rather, it is to highlight that even thoughtfully collected independent data may
not lead to follow up action if the government does not take ownership of the data and findings.
If data collection has to be independent of line departments to be credible but still be owned by
the government, who should do it?
The ideal solution would be the planning department. In every state government, planning is a
nodal department with the mandate to collect, analyze, and report data. Unfortunately, in
practice, state planning departments often simply collate and report the data provided by
individual departments themselves. But what they should be doing is to take up measurement
efforts to generate independent high-quality data, and become the primary source of credible
data, which can support governance across multiple departments.
I therefore recommend that state planning departments should take on the responsibility for
implementing a comprehensive multi-sector outcome measurement system, and that finance
departments should fund this as a strategic priority. Even under conservative assumptions, it
would cost less than 0.1% of a state government’s budget (and likely much less). Crucially,
managing this effort through the planning department separates measurement from the line
13 The ASER data also have challenges and limitations, though not as severe as those with the NAS. See
Johnson and Parrado (2021) as well as Rukmini (2014) for illustrative discussions.
12 NAS only tests children present in government schools, and excludes those who are out of school, in private
schools, or absent on the day of testing. It only asks grade-appropriate questions, which means it cannot capture
how far below grade level students might be. And it may suffer from the same data integrity problems that have
been documented by independent researchers, because it is perceived as a high-stakes accountability exercise
for the public school system. These limitations are so severe that a recent study (Johnson and Parrado 2021)
concluded that: “NAS state averages are likely artificially high and contain little information about states’ relative
performance”; it also noted that: “the presence of severe bias in the NAS data suggests that this data should be
used carefully or not at all for comparisons between states.” (emphasis added)
departments. This structure will ensure both independence and data quality, while
also ensuring that the data is considered “official” and therefore taken seriously and acted upon.
While planning departments should lead on improving measurement, they need not
necessarily conduct the data collection themselves. Given capacity and staff constraints in
most planning departments, they can empanel a set of high-quality external agencies to
actually collect the data, while department staff oversee and quality-control the data
collection process. For instance, department officials can randomly recheck some of the data
collected by third-party agencies to ensure that the data is being collected properly.14 This way,
external agencies can provide speed, technical support, and implementation, whereas the
planning department can conduct the validation and quality control.
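
The random recheck described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 10% recheck fraction, and the sample records are hypothetical and do not describe any agency's actual procedure.

```python
import random

def select_backchecks(survey_ids, fraction=0.1, seed=0):
    """Pick a reproducible random subset of completed surveys to re-visit.

    The 10% default fraction is an assumption made for this example.
    """
    ids = list(survey_ids)
    rng = random.Random(seed)
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))

def mismatch_rate(original, recheck):
    """Share of re-checked answers that differ from the originally recorded answer."""
    diffs = sum(1 for sid, answer in recheck.items() if original[sid] != answer)
    return diffs / len(recheck)

# Hypothetical records: survey id -> answer recorded by the external agency.
original = {101: "yes", 102: "no", 103: "yes", 104: "yes", 105: "no"}
# Answers obtained by department staff on an independent re-visit.
recheck = {102: "no", 104: "no"}

print(len(select_backchecks(range(100))))   # 10 surveys chosen for re-visit
print(mismatch_rate(original, recheck))     # 0.5
```

A high mismatch rate for a particular agency or enumerator would be a signal for the planning department to look more closely at that part of the data.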
I now discuss three specific ideas for improving measurement and data collection, and also
describe how exactly these data can improve governance and service delivery.
2.2 Conduct regular field-based surveys to measure key development outcomes
The first and most important gap we need to fill in our measurement systems is to be able to
generate annual district-level outcomes on key development indicators. This can be done
through regular household surveys (every 6 to 12 months) at the district level. Since it is
impossible to measure all development outcomes, states should focus on a few essential
ones, such as the main outcomes in the United Nations’ Sustainable Development Goals
(SDGs) framework which covers both development issues and overall economic conditions.
Further, conducting a single multi-sector survey is much more cost effective than each
department trying to carry out its own (sporadic) surveys for department-specific outcomes.
Such an exercise will provide states with regular data on district-level outcomes on child health
and nutrition; education and learning outcomes; crime and perceptions of safety; household
receipts of public welfare schemes; their access to and use of public and private services; and
key economic outcomes like employment, livelihoods, volatility in income, financial inclusion,
and credit access. Importantly, it will cover outcomes for all citizens regardless of whether they
use public or private services (or neither) and thereby provide an accurate picture of
population-level well-being as well as how these indicators are changing annually.
In practice, a survey that covers around 2,000 households per district will provide enough
precision on district-level estimates to be usable for management purposes.15 Just a decade
15 A key technical issue in ensuring that such surveys are representative of the population is to have a sampling
frame of the entire universe of households. One practical and cost effective option is to use the voter registration
14 This is common in all high-quality survey operations and is referred to as a “backcheck”.
ago, the prospect of conducting 2,000 surveys in every district (which would work out to
around 60,000 surveys in a 30-district state) every six months would have been considered
wildly unrealistic and prohibitively expensive. But thanks to advances in technology,
large-scale annual or biannual district surveys are not only feasible but also highly affordable.
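
As a rough check on why around 2,000 households per district can be precise enough, the sketch below computes the approximate 95% margin of error for an estimated proportion. It assumes simple random sampling and a worst-case proportion of 0.5; the design effect of 2 in the last line is a hypothetical allowance for clustered sampling, not a figure from the chapter.

```python
import math

def margin_of_error(n, p=0.5, z=1.96, design_effect=1.0):
    """Approximate 95% margin of error for an estimated proportion.

    n: households surveyed in the district
    p: assumed true proportion (0.5 is the worst case)
    design_effect: assumed variance inflation from clustered sampling
    """
    return z * math.sqrt(design_effect * p * (1 - p) / n)

households_per_district = 2000   # figure from the text
districts = 30                   # illustrative state size from the text

print(households_per_district * districts)                  # 60000 surveys per round
print(round(margin_of_error(households_per_district), 3))   # 0.022
print(round(margin_of_error(households_per_district, design_effect=2.0), 3))  # 0.031
```

Under these assumptions, a district-level indicator would be estimated to within roughly two to three percentage points, which is the sense in which the estimates are usable for management purposes.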
Before these technological advances, data collection was an arduous, cumbersome, and
error-prone process. Surveyors would input responses onto paper forms; responses would
then be physically inputted into statistical software; and there would be large delays in cleaning
the data to correct various errors in field data entry. The emergence of tablet- and
smartphone-based data collection has transformed this process. Data collection is now much quicker, more
accurate, and more sophisticated. For instance, with pre-coded questions on tablets that
automatically skip non-applicable questions (instead of a surveyor frantically skipping pages to
find the right question), there is less chance of surveyor error.16 Similarly, geo-tagged data
collection and analysis of keystroke patterns make it easy to verify that enumerators actually
visited the household and administered the survey as opposed to just making up the data!
Overall, technology has dramatically reduced the cost and increased the speed and reliability
of high-quality data collection.
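
The two software features mentioned above, automatic skipping of non-applicable questions and flagging of implausible entries, can be illustrated with a minimal sketch. The question identifiers, skip rules, and plausibility range are hypothetical and are not taken from any actual survey instrument.

```python
# Hypothetical question ids and skip rules, for illustration only.
DEFAULT_NEXT = {
    "received_service": "service_quality",
    "service_quality": "next_section",
}
SKIP_RULES = {
    # A "No" answer makes the follow-up questions non-applicable.
    ("received_service", "No"): "next_section",
}

def next_question(current_id, answer):
    """Return the id of the question to show after the given answer."""
    return SKIP_RULES.get((current_id, answer), DEFAULT_NEXT[current_id])

def needs_recheck(value, low, high):
    """Flag a numeric entry that falls outside a plausible range."""
    return not (low <= value <= high)

print(next_question("received_service", "Yes"))  # service_quality
print(next_question("received_service", "No"))   # next_section
print(needs_recheck(350, 0, 120))                # True: e.g. an age typed with an extra digit
```

Because the routing and the checks are applied at the moment of entry, the enumerator can correct a mistake while still with the respondent rather than weeks later during data cleaning.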
The vision above is not a pipe dream. A similar type of survey has already been carried out in
several districts by IDinsight, an advisory and research organization, as part of NITI Aayog’s
Aspirational Districts Program (ADP). While the ADP has been shown to have positive impacts
in program districts, the main limitation of the ADP in its current form is that it only covers a few
districts in each state. As a result, the rich data generated under the ADP has not yet been
institutionalized for planning and decision making within state governments. Thus, the
opportunity is ripe for state governments to take this template and roll it out to all districts and
make such data an integral part of state-level planning, resource allocation, and management.
Conducting such a measurement exercise (with 2,000 households per district surveyed twice a
year for two hours in each round) will cost at most INR 1 crore per district per year. Even for a
large lower-income state like Uttar Pradesh, which has 75 districts, the annual cost would just
be INR 75-100 crores (including one-time set up costs), or around 0.02% of the state’s total
expenditure in the 2021-22 budget, which was over INR 500,000 crores. Once the viability of
16 For instance, if a respondent says “Yes” to having received a service, there will usually be follow up questions. If
they say “No”, then the surveyor will be required to jump to the next section. In a paper-based survey, this would
typically entail turning pages manually to reach the right sections, substantially increasing error rates and the
likelihood of forgetting to cover some sections. Similarly, software can be pre-coded to flag entries that look
incorrect to prompt real-time checking by enumerators and reduce common errors in field data entry such as
adding or forgetting a digit while recording data.
lists (which are public information) as the sampling frame for such an exercise. The Election Commission updates
this list faster than even the census, and the voter rolls are therefore likely to be the most current sampling frame.
this approach has been demonstrated, it may even be possible to obtain estimates that are
precise enough for block-level decision-making by increasing the sample size by a factor of 5.
Such an expansion would still cost only around 0.1% of the state budget.17
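
The arithmetic behind these shares can be reproduced directly from the figures quoted in the text. The sketch below treats the state budget as exactly INR 500,000 crores; since the text says the budget was over that amount, the resulting shares are, if anything, slight overstatements.

```python
def budget_share_pct(cost_crore, budget_crore):
    """Cost as a percentage of the total state budget."""
    return 100 * cost_crore / budget_crore

STATE_BUDGET_CRORE = 500_000   # lower bound quoted in the text for Uttar Pradesh, 2021-22
BLOCK_LEVEL_FACTOR = 5         # sample size multiplier for block-level estimates

for annual_cost_crore in (75, 100):   # range quoted in the text
    district = budget_share_pct(annual_cost_crore, STATE_BUDGET_CRORE)
    block = budget_share_pct(BLOCK_LEVEL_FACTOR * annual_cost_crore, STATE_BUDGET_CRORE)
    print(f"INR {annual_cost_crore} crores: {district:.3f}% district-level, {block:.3f}% block-level")
# INR 75 crores: 0.015% district-level, 0.075% block-level
# INR 100 crores: 0.020% district-level, 0.100% block-level
```

The block-level figure assumes costs scale in proportion to sample size, which is a simplification: fixed set-up costs would make the true multiple somewhat smaller.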
Having annual district-level data on key development outcomes can be a game changer for
governance in India on several dimensions. It will enable better goal setting, progress
monitoring, and performance management of staff and departments (see Chs. 5, 10, and all
sector chapters). It will enable improved allocation and use of public funds to better reflect the
principles of equality, equity, and effectiveness (see Ch. 6). It will allow better monitoring of
equity goals by allowing policy makers to pay special attention to outcomes of women and
marginalized groups and the extent to which these outcomes are improving over time (see Ch.
10). By measuring outcomes for all citizens regardless of whether they use public or private
services, such data will help to focus policy attention on the importance of not just improving
public service delivery, but also improving the functioning of private markets to deliver better
price-adjusted quality to all citizens (see Ch. 9).
Most importantly, it will enable a change in the culture of government from focusing on inputs
and schemes to outcomes. Consider the case of education, where three key policy goals are
access, equity, and quality. In practice, education departments routinely convert each of these
goals into input targets. The access goal gets measured mainly in terms of school
construction. The equity goal gets measured by schemes and programs for girls and
disadvantaged groups (though outcome gaps may continue to grow). Finally, the quality goal
gets converted into programs for upgrading infrastructure and teacher training.18
These are all important inputs in an education system, but at the same time, years of
high-quality evidence shows that most of these inputs are poorly correlated with improved
learning outcomes – or are not very cost effective relative to other interventions (see Ch. 11).
Thus, despite good intentions, large budgets, and considerable effort, these inputs have not
translated into better outcomes. More generally, bureaucracies often convert outcome goals
into programs and schemes, and make the latter an end in themselves, and thereby lose sight
of whether the ultimate objectives are being achieved.
17 The cost estimates are based on inputs from IDInsight. While generating precise district-level estimates
requires a sample size of 2,000 households per district, it is also possible to use these surveys to generate
several other insights at the state level (which can be done with much smaller samples in each district). For
instance, it would be possible to add 10 different modules in different 10% samples, to have estimates that are
representative at the state-level and provide a rich real-time understanding of the evolution of quality of life of the
average citizen. These 10% samples can provide a much richer picture of state-level development and quality of
life outcomes including a detailed understanding of parameters such as employment and job search behaviour,
healthcare usage, agricultural production, time use, credit and savings, etc.
18 This is exactly what happened under the “Results Framework Document” exercise conducted by the
Performance Management Division (PMD) set up in the Cabinet Secretariat in 2009. Departments were asked to
identify their key performance indicators, and the school education department mostly came up with input-based
indicators with a minuscule weight on learning outcomes.
With regular outcome data based on a random and representative sample of households, this
changes. It will enable senior officials and political leaders to focus their review meetings with
district-level staff on monitoring improvements in a few key (independently-measured)
outcomes as opposed to just “scheme implementation”. In turn, this will encourage
district-level officials to push the entire administration below them to focus on the best and
most effective way to improve outcomes.
Data on outcomes will allow us to achieve the cultural shift in the bureaucracy of increasing
“autonomy on processes” and “accountability for outcomes” described in Chapter 3. Since
districts can now be assessed on their effectiveness in improving outcomes, the energy and
creativity of local staff can be unleashed by providing them with more untied funds to spend
flexibly in their area of responsibility. They would still need to satisfy reporting and accounting
requirements for what they spend, but could be given more freedom over what to
spend on so that they can best respond to local needs and conditions, with annual discretionary funds
tied in part to their performance in improving district-level outcomes. Such a structure can be
highly effective at improving ground-level incentives to improve outcomes (see Ch. 6).
2.3 Use phone-based surveys to capture real-time beneficiary experience
While field surveys conducted every 6 to 12 months will generate outcome data to serve as a
barometer of progress, they may still not be timely enough to inform real-time performance
measurement and management of government service delivery. For this, there is a simple
alternative. The government could call beneficiaries directly – an exercise made possible by the
proliferation of mobile phones across India. Regular phone calls that capture beneficiary
experiences can provide real-time feedback on program design and implementation that can
almost immediately improve programs to better reflect beneficiary preferences.
For instance, in 2018, we worked with the Government of Telangana (GoTS) to measure and
improve implementation quality of the state’s flagship Rythu Bandhu program (which provided
income transfers to farmers via check). To do this, our team helped GoTS contract a
call-centre and made 25,000 calls to farmers over a period of two weeks. Our very short phone
surveys asked basic questions on last-mile delivery and beneficiary experience with the
program: Have you received your money? When did you receive it? Have you been able to
cash your check?
Importantly, we found (using a randomized controlled trial) that simply announcing to lower-level
officials that this measurement would happen led to a significant improvement in the quality of
program implementation (measured by total funds delivered, and funds delivered on time). The
intervention was highly cost-effective, and increased the on-time delivery of benefits by over
a hundred rupees for every rupee spent on the call center, a benefit-cost ratio of over
100.19 Thus, tracking last-mile service delivery and beneficiary experience can sharply improve
government employees’ productivity and quality of service delivery.
The biggest attraction of this approach is that it is easy and inexpensive to scale. India is
already a world leader in call-center services, and this expertise can be leveraged to improve
governance at home. On average, a phone survey costs only 5% as much as a field survey. Of
course, the limitation is that phone surveys are much shorter. Still, it is a powerful tool for
senior officials and middle-level managers to get real-time feedback on beneficiary experience
with a wide variety of programs including the PDS, NREGS, and other welfare programs.
I therefore recommend that states invest in phone-based measurement alongside field-based
measurement. States such as Andhra Pradesh and Odisha have already invested in such
outbound call centres, which may provide a template to other states to follow. The planning
department can coordinate the procurement of the call centers as well as ensure adequate
technical support for issues like sampling, analysis, and creation of dashboards, while line
departments can use this infrastructure to improve their day-to-day functioning.
Data from such high-frequency phone surveys can provide immediate and actionable feedback
to departments on how programs are working on the ground. It can provide a more direct basis
for performance measurement and management of government staff: while outcomes can take
time to improve, staff can be held immediately accountable for program implementation quality.
Short rapid phone surveys can also be used to get citizen inputs into the design of programs
based on their experiences in accessing them. Governments rarely listen to citizens between
elections, and institutionalizing regular phone-based surveys can make citizens active
participants in governance and not just passive recipients of government initiatives.
Such data can also help in protecting the vulnerable from being excluded from public programs
and benefits. For example, in my work on evaluating the impact of using Aadhaar-based
biometric authentication (ABBA) in the delivery of PDS benefits in Jharkhand, my co-authors
and I found that there were significant reductions in leakage. But we also found an increase in
exclusion errors – especially during the transition period to the new system – with nearly two
million beneficiaries losing access to PDS benefits at some point during the transition.20
19 See Muralidharan, Niehaus, Sukhtankar, and Weaver (2019).
20 See Muralidharan, Niehaus, and Sukhtankar (2021a) and (2021b) for details.
The policy debate on the desirability of ABBA has often operated at an “all or nothing” level
with proponents pointing to reduced leakage to justify it, and opponents pointing to exclusion to
argue for scrapping ABBA altogether. This is counterproductive because at some level both
sides are right – there was a reduction in leakage but also an increase in exclusion. While
reducing exclusion is a moral imperative (especially given that the excluded are often the most
vulnerable), it is also morally important to reduce leakage because the funds thus saved from
corrupt intermediaries expand the fiscal capacity of the state to provide other public goods and
services (that disproportionately benefit the poor).21
Thus, a more effective way forward would be to improve ABBA implementation to get the
benefits of reduced leakage, while also minimizing the risk of exclusion. The problem is that
although officials and courts have been sensitive to the costs of exclusion, they have been
limited in their ability to measure and minimize exclusion in the absence of systematic real-time
data. Having a system of high-frequency phone-based monitoring can directly address this
challenge. Concerns about not being able to reach those without phones (who are likely to be
most vulnerable) can be mitigated by asking respondents if they are aware of cases of people
in their village (or ward) not receiving their benefits.
More generally, collecting and publicly sharing (anonymized) data from such independent,
random, and representative phone-based surveys can also improve public trust in the
government. One reason for the polarized debate on ABBA is that officials did not trust reports
of exclusion from field-based activists because they believed that the criticism was based on a
few selected cases and “motivated” by political antipathy towards the government. Conversely,
activists felt that the government was in denial about exclusion and correspondingly pushed for
scrapping ABBA altogether as opposed to finding ways to work on improving it.
The discussion above also highlights that a lot of genuine disagreement regarding policies in
India (and around the world) comes from people arguing from different parts of the distribution
of outcomes. The same reform can have positive impacts on some and negative impacts on
others, and it is impossible to assess the overall impact without understanding the magnitudes
of these effects (and how different groups are affected). One promising way to improve public
discourse and reduce disagreement is to ask: “What data do we need to narrow down the
range of disagreement?” Having regular and transparent visibility into citizen experiences with
public programs and policies can thus play an important role, not just in policy making but in
improving public trust and “public reason” that is essential to a well-functioning democracy.
21 See the discussion in Muralidharan (2020) for more details.
2.4 Implement nested supervision to improve administrative data integrity
The final pillar we need to strengthen is the quality and reliability of administrative data
collected by departments themselves. Currently, officials who collect administrative or
programmatic data (like learning levels or nutrition outcomes) are tempted to inflate data
because it reflects on their performance and they know there is no way to verify their claims. A
simple way to fix this problem is to digitize front-line data collection and recording so that the
same kind of quality control that is implemented with survey enumerators can be implemented
for the government system. The crucial addition required here is to use the digital records to
randomly generate records for field verification both by department supervisors and by
independent authorities (such as the office of a district collector or block development officer).
Using random audits to test the veracity of data that is reported by lower-level officials, it is
possible to generate “truth scores” for all government employees, measured by the extent of
deviation between the data reported in the official system, and that captured in independent
rechecks by supervisors as well as independent officers.22 These truth scores can be a key
metric on which staff performance is assessed. The crucial point is that with digital data
collection, the person who enters the data is also attesting to its veracity and this is recorded in
the system – which makes it possible for any supervisor to conduct random audits of data
quality. Normally, this supervision and verification will happen within the department itself – by
block or district-level officials. But, it is also possible for a collector’s office or an independent
department like planning or a department of social audit to periodically audit and verify the data
– which reduces the risk of collusion within a department between staff and supervisors.23
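One way such a truth score might be computed is sketched below. The scoring rule is an assumption made for illustration, since the text specifies only that the score reflects the deviation between reported and rechecked data: here the score is the share of an employee's audited records whose reported value falls within a relative tolerance of the independently rechecked value. The function name, the 5% tolerance, and the employee identifiers are all hypothetical.

```python
from collections import defaultdict


def truth_scores(audits, tolerance=0.05):
    """Return {employee_id: share of audited records that matched the recheck}.

    audits: iterable of (employee_id, reported_value, rechecked_value) tuples,
            one per randomly audited record.
    tolerance: allowed relative deviation from the rechecked value
               (hypothetical choice; a real system would set this per indicator).
    """
    matched = defaultdict(int)
    total = defaultdict(int)
    for employee_id, reported, rechecked in audits:
        total[employee_id] += 1
        # Tiny floor so a rechecked value of zero does not allow any deviation by accident.
        allowed = tolerance * max(abs(rechecked), 1e-9)
        if abs(reported - rechecked) <= allowed:
            matched[employee_id] += 1
    return {emp: matched[emp] / total[emp] for emp in total}


# Illustrative audit records: (employee, reported score, score found on recheck).
sample_audits = [
    ("HT-01", 62, 60),   # within 5% of the recheck
    ("HT-01", 75, 50),   # far above the recheck
    ("HT-02", 41, 40),   # within 5% of the recheck
    ("HT-02", 55, 55),   # exact match
]
scores = truth_scores(sample_audits)
print(scores)
```

A share-based score is easy to explain to staff, but it ignores the size of each deviation; an average absolute deviation would be an equally plausible alternative.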
I refer to this approach as “nested supervision” because it effectively accounts for supervisory
capacity constraints. Consider school education, where a cluster resource coordinator (CRC)
is usually in charge of 20-40 schools, a block education officer (BEO) is in charge of 200-300
schools, and a district education officer (DEO) is in charge of 1500-2500 schools. While it is
not possible for a DEO to visit every cluster (let alone school) in a year, it is possible for a CRC
to visit every school in her area at least once in 3 months. Thus, every school can expect at
least 3 random visits a year by the CRC where the integrity of recorded student learning will be
checked; some schools in every CRC can be visited by a BEO; and some schools in every
block can be visited by a DEO. This allows a DEO to generate truth scores for BEOs, a BEO
to do so for CRCs, and CRCs to do so for headteachers, and thereby to incentivize the entire
system to tell the truth because any case of a school found to be fudging data will reflect
negatively not only on the headteacher, but also on the concerned CRC, BEO, and DEO.
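The nested structure described above can be sketched as a simple audit-planning routine. This is a minimal illustration under assumptions, not a description of any deployed system: the district is represented as a nested dictionary of blocks, clusters, and schools, the CRC is taken to recheck every school in the cluster, and the BEO and DEO each recheck a small random draw. The function name, the number of schools drawn at each level, and the fixed random seed are hypothetical.

```python
import random


def nested_audit_plan(district, per_cluster=1, per_block=1, seed=0):
    """Build a list of schools to recheck at each supervisory level.

    district: {block_id: {cluster_id: [school_id, ...]}}
    per_cluster: schools the BEO rechecks in each cluster (hypothetical default)
    per_block: schools the DEO rechecks in each block (hypothetical default)
    seed: fixed so that the same plan can be reproduced for review
    """
    rng = random.Random(seed)
    plan = {"CRC": [], "BEO": [], "DEO": []}
    for block_id, clusters in district.items():
        block_schools = []
        for cluster_id, schools in clusters.items():
            # The CRC is expected to visit every school in the cluster.
            plan["CRC"].extend(schools)
            # The BEO rechecks a random draw from each cluster in the block.
            plan["BEO"].extend(rng.sample(schools, min(per_cluster, len(schools))))
            block_schools.extend(schools)
        # The DEO rechecks a random draw from each block in the district.
        plan["DEO"].extend(rng.sample(block_schools, min(per_block, len(block_schools))))
    return plan


toy_district = {
    "block-1": {"cluster-1": ["s1", "s2", "s3"], "cluster-2": ["s4", "s5"]},
    "block-2": {"cluster-3": ["s6", "s7"]},
}
plan = nested_audit_plan(toy_district)
for level, schools in plan.items():
    print(level, schools)
```

Because the higher-level draws are random, no headteacher or CRC can predict which schools a BEO or DEO will recheck, which is what gives every level a reason to report accurately.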
22 I thank Dr. Santhosh Mathew for coining the term “truth score” in our conversations on this topic.
23 The importance of sample-based field surveys to validate administrative data has also been noted by Dr P C
Mohanan, former acting Chairman of India’s National Statistical Commission (see Mohanan 2019).
This approach can be deployed across departments including education, ICDS, health,
agriculture, and rural/urban development. Importantly, the approach should be limited to the
few most critical indicators for each department such as learning outcomes (education), child
malnutrition (ICDS), vaccinations (health), crop yields (agriculture), and key measures of local
service delivery such as clean streets and garbage collection, working streetlights, and running
water (for rural and urban development departments). The planning department can provide
technical support in designing software systems for nested supervision including data capture,
reporting, dashboards, and calculation of truth scores. The field staff of the planning
department (supervised directly by the Collector’s office) can also take charge of random
independent audits of the data reported by various departments to provide an additional layer
of accountability for reporting the truth for district-level officials.
Improving administrative data integrity through a system of nested supervision and calculation
of “truth scores” can bring about a profound change in the culture of government by making it
harder to deny reality. While ignorance may be bliss for those who wish to avoid accountability,
achieving systematic improvements in development outcomes at scale has to be built on a
foundation of “reality”. Forcing the system to report and acknowledge reality will be a key first
step in focusing the energies of government employees on solving problems as opposed to
hiding them by fudging data. It will also make it possible to recognize and reward
high-performing employees on the basis of actual improvements in outcomes (see Ch. 5).
2.5 Institutional safeguards for data quality, privacy, and transparency
Building a new technology-enabled measurement architecture for India presents exciting
possibilities for improving governance and service delivery. However, even as governments
must act urgently to build measurement systems, they must proceed with care. As with any
data collection process, there is a tension between transparency and privacy. In India,
government actions over the years have often prioritized transparency. Information regarding
beneficiaries of welfare programs such as NREGS and PDS is shared publicly online, often
including personal details. With the Supreme Court declaring privacy as a fundamental right,
India will have to shift the balance towards privacy and confidentiality.
Thus, it will be critical that data collected from individuals is not traceable back to them but is
only available at a level of aggregation relevant for better governance and management (e.g.,
the district or block level). To ensure this happens, all protocols to de-identify data, which are
now commonplace in research, should be followed.24 Such anonymization will also facilitate
whistleblower protection. A citizen complaining about a government service should not fear
retribution from local officials.
24 One way to ensure this is to design data capture systems so that the database with citizen responses does not
include any identifying information from whom the information was obtained – including their phone number! In
This is a point that government officials often do not appreciate enough. In my work supporting
state governments with phone-based monitoring of citizen experiences with receiving benefits,
officials in multiple states have often wanted to know the identity of people who say they are
having problems so that they can address these issues immediately. However, there is a key
difference between grievance redressal where citizens themselves call a helpline to ask for
help (and hence consent to their identity being known), and a measurement system for
obtaining citizen-level feedback on government programs and policies at a systemic level.
Thus, the right way to respond in cases where respondents to an outbound call report
problems is to provide them with a number they can call for help (if they want to do so).
However, their identities should remain concealed.
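The separation between individual responses and what officials get to see can be sketched as follows. This is an illustration under assumptions rather than a prescribed protocol: identifying fields are dropped before a response is stored for analysis, and results are reported only as district-level rates. The field names, the outcome variable, and the rule of suppressing districts with fewer than 10 responses are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical names of fields that could identify a respondent.
IDENTIFYING_FIELDS = {"name", "phone", "address"}


def deidentify(record):
    """Return a copy of a survey response with identifying fields removed."""
    return {key: value for key, value in record.items() if key not in IDENTIFYING_FIELDS}


def district_rates(records, outcome="received_benefit", min_cell=10):
    """Aggregate de-identified responses into district-level rates.

    Districts with fewer than min_cell responses are reported as None, so that
    a very small group cannot be traced back to individuals (assumed threshold).
    """
    counts = defaultdict(lambda: [0, 0])  # district -> [responses, positive responses]
    for record in records:
        tally = counts[record["district"]]
        tally[0] += 1
        tally[1] += 1 if record[outcome] else 0
    return {
        district: (positive / n if n >= min_cell else None)
        for district, (n, positive) in counts.items()
    }


raw = [
    {"name": "A", "phone": "0000000000", "district": "D1", "received_benefit": i % 4 != 0}
    for i in range(12)
] + [
    {"name": "B", "phone": "1111111111", "district": "D2", "received_benefit": True}
    for _ in range(3)
]
clean = [deidentify(r) for r in raw]
rates = district_rates(clean)
print(rates)
```

In this toy data, district D1 has enough responses to be reported, while D2 is suppressed because it has only three. A respondent who reports a problem would still be given a helpline number during the call, as described above, rather than being identified from the stored data.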
While it is essential to protect individual-level information, aggregated summary statistics of
public interest, including district- and block-level averages and distributions, should be
released to the public as soon as feasible. Indeed, a central channel through which better
measurement will improve outcomes is through increased accountability. This includes both
“top down” administrative accountability and “bottom up” public accountability.
Releasing summary statistics can improve public accountability and also strengthen the
effectiveness of democracy by providing political incentives to focus on the things that actually
matter.25 As discussed in Chapters 2 and 3, the Indian state will work better for citizens if the
private incentives of politicians and officials are better aligned with the public interest. A
transparent, frequent, and disaggregated public sharing of key data on outcomes and
government program functioning can contribute to this goal by making the “unseen” aspects of
governance (outcomes and citizen experiences) become more “seen” or visible, and thereby
increase their political salience.26 One way to institutionalize such data sharing may be to
include key statistics from such surveys in the annual state economic surveys that many state
governments present in the legislative assembly along with the annual budget.
cases where there is a need to survey the same household repeatedly, it is essential to only use identifiable
information for surveys and have strict internal norms for protecting identifiable information. These are well-known
challenges for data collection everywhere, and the accepted protocols for protecting respondent identity in repeat
surveys must now be applied in India. Similar best practices need to be followed with regard to data security and
access to ensure data is separated from identifying information before being used for analysis. The Panel Study
of Income Dynamics (PSID) and National Longitudinal Survey of Youth (NLSY) in the US, the Young Lives study
across multiple countries (including India), and the Indian Human Development Survey (IHDS) are all examples of
successful longitudinal data collection efforts that have effectively protected identifiable information.
25 Public disclosure of data need not prevent government departments from querying data or pressure testing it
before it is finalized. A good balance will be for preliminary reports to first be shared internally within the
government for review, followed by a public release of summary statistics at the district or block level.
26 I thank Amit Varma for his podcast titled “The Seen and the Unseen” for this connection, which I made while
appearing on his show to discuss Indian education (in Episode 185).
Publicly disclosing results signals that the government is acknowledging reality. It sets the
stage for focusing on improvements rather than assigning blame for poor initial outcomes. It
can help justify resource allocation on the basis of both need and performance (Ch. 6).
Releasing data on district (and block) level improvements can also stimulate a broader public
engagement with the effectiveness of policies. It can improve political rewards for leaders who
can deliver meaningful improvements and accountability for those who do not.
Of course, data is inherently political and there will likely be political pressure from incumbent
governments to highlight favorable data and conceal unfavorable ones. One way to mitigate
this concern may be for states to constitute their own statistical commissions with technical
experts as members to review the methods of data collection, and sign off on the annual
publicly released data. The commission could also be responsible for designing and ensuring
adherence to data and privacy safeguards. It may also make sense for the Chief Minister and
Leader of Opposition to both be ex officio members of the commission to depoliticize the data
itself. Political discourse can and should focus on the extent of improvement (or lack thereof) in
key development indicators (both on average, and for different groups), but the data itself
should, to the extent possible, not be politicized.
Institutional innovations like setting up a state-level statistical commission with both technical
experts and bipartisan political representation can not only improve the quality and use of data,
but also make our democracy itself work better. As I discuss further in Chapter 18, we have
mostly relied on the union government to design and build institutions for India. But there is no
reason that states cannot lead the way in the institutional rejuvenation we need after 75 years
Over the long arc of human history, better measurement has been the foundation of better
management and improved productivity.27 The importance of data and analytics in driving
productivity is seen by the enormous resources that the most successful private-sector firms
such as Google and Amazon put into this area to improve their performance. In the Indian
context, this point is seen in a well-known quote from Infosys founder Narayana Murthy, who is
noted for saying that: “In God we trust, everyone else must bring data”!
27 See Landes (2000) for a rich discussion in the context of the history of clocks and the centrality of better
measurement of time for improvements in productivity. See Baker et al. (2004) for a more recent illustration of
how the availability of on-board GPS devices helped change contractual structure and productivity in the
long-distance trucking industry.
Better data is similarly the foundation for improving the functioning of the Indian state. The
“themes” section of this book correspondingly starts with a focus on improving state capacity to
measure outcomes and processes as the first step towards fixing the Indian state. Over the
years, I have discussed many of the ideas in the later chapters of this book with very senior
government officials and received the response that: “these are great ideas, but we do not
have the data to act on them.” This chapter is therefore a foundational one because many of
the ideas in later chapters depend on the implementation of the ideas in this chapter and the
existence of the right data to inform improvements in all other aspects of state functioning.
This chapter demonstrates that it is feasible to set up a revamped measurement architecture
whereby states can collect high-quality data on both outcomes and processes at large scale,
rapid speed, and high levels of spatial disaggregation. Crucially, the costs of doing so are low.
The entire vision outlined in this chapter can be implemented at a cost of under 0.1% of state
budgets, and will dramatically improve the effectiveness of the remaining 99.9% of the budget.
Politicians already understand the importance of using data in planning election campaigns. It
is time to bring the same approach to governance itself. Importantly, doing so is likely to have
political payoffs as well. A state that embarks on this vision can have a first round of data
collection completed within twelve months. Done at the beginning of a five-year term, such an
exercise can drive a step-function improvement in state performance during an elected Chief
Minister’s term, and deliver visible improvements in governance in time to take back to voters
at the end of a five-year term. Beyond re-election, Chief Ministers who successfully embark on
such a journey are likely to leave a lasting legacy for their states and people.
ASER. (2018). ASER 2018 (Rural) findings. Retrieved from
Chaudhury, N., Hammer, J., Kremer, M., Muralidharan, K., & Rogers, F. H. (2006). Missing in
Action: Teacher and Health Worker Absence in Developing Countries. Journal of
Economic Perspectives, 20(1), 91-116.
Kremer, M., Muralidharan, K., Chaudhury, N., Rogers, F. H., & Hammer, J. (2005). Teacher
Absence in India: A Snapshot. Journal of the European Economic Association, 3(2-3),
Mohanan, P. C. (2019, May 25). Sample surveys are important to validate administrative
databases. The Indian Express.
Muralidharan, K. (2011, December 12). India’s States Can be Laboratories for Policy
Innovation. Business Standard.
Muralidharan, K. (2013). Priorities for Primary Education Policy in India’s 12th Five-year Plan.
India Policy Forum, 9, 1-46.
Muralidharan, K., Das, J., Holla, A., & Mohpal, A. (2017). The fiscal cost of weak governance:
Evidence from teacher absence in India. Journal of Public Economics, 145, 116-135.
Muralidharan, K., Niehaus, P., & Sukhtankar, S. (2017). Direct Benefits Transfer in Food:
Results from One Year of Process Monitoring in Union Territories. UC San Diego.
Muralidharan, K., Niehaus, P., Sukhtankar, S., & Weaver, J. (2019). Improving Last-Mile
Service Delivery using Phone-Based Monitoring. National Bureau of Economic
Research Working Paper Series, No. 25298. doi:10.3386/w25298
Muralidharan, K., & Singh, A. (2019). Improving Public Sector Management at Scale?
Experimental Evidence on School Governance in India. Unpublished Draft.
Shariff, A., & Saifullah, K. (2018). Electoral Exclusion of Muslims Continues to Plague Indian
Democracy. Economic and Political Weekly, 53(20).
World Bank Group. Living Standards Measurement Study. Retrieved from
Bhattacharya, P. (2019). How India’s statistical system was crippled. The Mint.
S, R. (2014). Question mark over data on learning. The Hindu.
Muralidharan, K., Niehaus, P., & Sukhtankar, S. (2021). Integrating Biometric Authentication in
India’s Welfare Programs: Lessons from a Decade of Reforms. India Policy Forum, NCAER.
Muralidharan, K., Niehaus, P., & Sukhtankar, S. (2021). Identity verification standards in
welfare programs: experimental evidence from India. NBER.
Muralidharan, K. (2020). ‘To an extent, both supporters and critics of Aadhaar for service
delivery are correct’. U. Misra, Indian Express.