Research Article

Evidence Points To ‘Gaming’ At Hospitals Subject To National Health Service Cleanliness Inspections

  1. Veronica Toffolutti ( [email protected] ) is a postdoctoral researcher in health economics at the University of Oxford, in the United Kingdom.
  2. Martin McKee is a professor of European public health at the London School of Hygiene and Tropical Medicine, in the United Kingdom.
  3. David Stuckler is a professor of political economy and sociology at the University of Oxford.
PUBLISHED: Free Access. https://doi.org/10.1377/hlthaff.2016.1217


Inspections are a key way to monitor and ensure quality of care and maintain high standards in the National Health Service (NHS) in England. Yet there is a perception that inspections can be gamed. This can happen, for example, when staff members know that an inspection will soon take place. Using data for 205 NHS hospitals for the period 2011–14, we tested whether patients’ perceptions of cleanliness increased during periods when inspections occurred. Our results show that during the period within two months of an inspection, there was a significant elevation (2.5–11.0 percentage points) in the share of patients who reported “excellent” cleanliness. This association was consistent even after adjustment for secular time trends. The association was concentrated in hospitals that outsourced cleaning services and was not detected in those that used NHS cleaning services.


A prerequisite for a competitive market in health care, such as that established by the English Health and Social Care Act of 2012, is the existence of valid information on the performance of providers. This is necessary for informed purchasing of services. Yet as has often been noted, this can be difficult because, other than for certain easily standardized services, many aspects of health care are difficult to specify, 1 and there are strong incentives for opportunistic behavior, or gaming. 2 This can take many forms, such as changing behavior (for example, by avoiding complex cases) or changing how things are recorded (for example, by adding diagnostic codes to make patients appear more severely ill than they are). 3

As noted on several occasions by the UK House of Commons Public Accounts Committee, 4 one area of concern relates to external inspections of providers—such as those undertaken by the Care Quality Commission, one of a number of regulators in the National Health Service (NHS) in England. These concerns are echoed in the field of education, which has also seen a marked increase in inspections and where there have been many accounts of opportunistic behavior, 5 such as schools being warned about “unplanned” inspections or the temporary exclusion of disruptive students or those of low ability from testing 6–8 —or even changing the food provided in school cafeterias with the dubious intention of boosting students’ performance 9 (with questionable impacts on their health). 10,11

Hospital cleanliness has been high on the agenda of successive governments in the United Kingdom, reflecting a combination of appropriate concern about hospital-acquired infection and the exploitation of data by some media outlets. 12 Even though the media coverage of hospital cleanliness problems has diminished in intensity, it has not stopped. 13–17

Consequently, the NHS’s ten-year plan, launched in 2000, 18 established a series of “nation-wide clean-up campaigns” to improve cleanliness in hospitals. These involved “unannounced” inspections (although staff members were always given forty-eight hours’ notice) that would take place over the course of up to one month by teams composed initially of hospital staff members and patients, chosen at the level of the trust (trusts are English public organizations that operate one or more health care providers, including hospitals). However, a lack of patient volunteers meant that the teams subsequently included mostly staff members.

From the outset, there has been concern about the potential for gaming of cleanliness inspections. It is widely believed that since staff members know when each inspection will happen, they are incentivized to make a special effort in the period leading up to it and then relax their standards after the inspection. This could be especially prominent in services that are outsourced to private contractors, given the risk of failing to obtain contract renewal should their performance receive poor scores in NHS inspections.

The true extent and consequences of gaming in the NHS are poorly understood, but there is enough evidence to raise concerns. Russell Mannion and Jeffrey Braithwaite found twenty distinct forms of dysfunctional responses to the NHS performance management regime. 19 Gwyn Bevan and Christopher Hood give examples of poor performance in areas that are not measured, hitting the target but missing the point, and ambiguities or fabrication of data. 20 Another review highlighted various abuses of health targets, including the creation of target-free zones, either physically (for example, placing patients awaiting hospital admission in temporary facilities in the hospital’s parking lot) or administratively (for example, establishing informal waiting lists to get on official waiting lists), and exploiting the opportunity to remove patients from waiting lists, if they declined an offer of admission, by making offers during holiday periods. 21 In addition, two studies found that financial incentives to physicians increased the likelihood that they would manipulate lists of patients by excluding those whose presence impeded their achievement of targets. 22,23

In these circumstances, it seems plausible that hospitals have incentives to game cleanliness inspections. Information we obtained from two acute trusts under freedom-of-information legislation revealed that they actually had between two and five months’ advance notice of such inspections.

In what we believe is the first study of its kind, we looked for evidence of possible gaming effects by taking advantage of a unique source of data that links patients’ perceptions of cleanliness with hospital inspection dates in the period 2011–14. Specifically, we tested whether patients gave hospitals’ cleanliness better ratings in the months leading up to an inspection than at other times, which would be consistent with the hypothesis that gaming does take place.

Study Data And Methods

We linked data on patients’ perceptions of cleanliness with dates of cleaning inspections for 205 English hospitals. All analyses were conducted at the hospital level. Patients’ assessments of hospital cleanliness were obtained from the Picker Institute NHS Patient Survey Programme. 24 Between June and August each year, each trust sends a questionnaire to 850 patients who have spent at least one night in a hospital operated by the trust. They are asked to report on their experiences at any time in the year, although in practice 93 percent of the reports describe experiences in this three-month period. All of the sampled patients are asked, “In your opinion, how clean was the hospital room or ward that you were in?” The possible answers are “very clean (excellent),” “fairly clean,” “not very clean,” and “not clean at all.”

We recoded the data by hospital and matched this information with the month that the hospital had a cleanliness inspection—data obtained from Patient Environment Action Teams for 2011–12 25 and Patient-Led Assessments of the Care Environment for 2013–14 26 (the name of the data source changed, but the data collection practices did not). We aggregated these data to determine the median percentage of patients rating cleanliness as “excellent” for each hospital by month and year. Additional data on hospital size and services provided were taken from the Estates Return Information Collection (a mandatory information collection from all NHS trusts) for the period 2011–14. 27

We matched data on the timing of cleanliness inspections and from the NHS Patient Surveys by calendar year. The data on hospital size were reported by fiscal year; we matched these data to calendar years. This was unlikely to confound the analysis since there is little temporal variation in numbers of hospital beds.

Our initial sample included 492 English hospitals. Seventeen (3.46 percent) were excluded because they had no inpatient services. Another 270 (54.9 percent) were excluded because patients had not been surveyed. Thus, the final sample consisted of 205 English hospitals. We observed 145 hospitals each year, on average, for 6–7 months, and we had complete records for 907 hospital-months. Of these hospitals, 125 operated in-house NHS cleaning services, 76 hospitals contracted with private providers of cleaning services, and 4 used both NHS and private providers (these hospitals integrated outsourcing into a mixed public-private partnership). This information is displayed in a flow chart in online Appendix 1, Exhibit A1. 28 Exhibit 1 provides further information about the 205 hospitals.

Exhibit 1 Selected characteristics of 205 UK hospitals, 2011–14

Characteristic | Median or mean | SD | Minimum | Maximum
Median percent of patients rating cleanliness “excellent” | 72.1 | 11.4 | 25 | 100
Number of beds | 637 | 493 | 5 | 2,257
Average length-of-stay (days) | 6.07 | 1.56 | 2.4 | 14.2
Multiservice hospitals a | 0.08 | 0.27 | 0 | 1
Specialist hospitals a | 0.20 | 0.40 | 0 | 1
Other hospitals | 0.72 | 0.45 | 0 | 1
North of England a | 0.44 | 0.50 | 0 | 1
Central England a | 0.27 | 0.44 | 0 | 1
London a | 0.11 | 0.32 | 0 | 1
South of England a | 0.18 | 0.39 | 0 | 1
Number of hospitals observed for each month of inspection | 145 | 49.9 | 1 | 194
Number of patients without missing data on hospital cleanliness survey per month | 205 | 150 | 1 | 552

SOURCE Authors’ analysis of merged data at the hospital level from the following sources: the Patient Environment Action Teams data set for 2011–12 (see Note 25 in text), the Patient-Led Assessments of the Care Environment data set for 2013–14 (see Note 26 in text), the Estates Return Information Collection data set for 2011–14 (see Note 27 in text), and the National Health Service Inpatient Survey for 2011–14 (see Note 24 in text). NOTES The sample size was 924 hospital-months for every category except “Number of beds,” where it was 913 hospital-months. Cleanliness refers to the room or ward where each patient stayed. SD is standard deviation.

aDummy variable.

Statistical Models

To investigate the association between month of inspection and perceived cleanliness, we used a regression discontinuity design. 29 (For further details, see Appendix 1, Exhibit A2.) 28

As shown in Appendix 1, Exhibit A3, 28 until 2012 the assessments tended to be concentrated between January and March, whereas after 2012 they tended to occur in the first six months of the year. The main coefficient of interest was β, which estimated the average change in the median perceived cleanliness of hospitals during inspection months. All models were estimated using Stata, version 13. Robust standard errors were clustered by hospital to reflect the nonindependence of sampling.
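As a minimal sketch of the key coefficient β, the regression of perceived cleanliness on an inspection-month dummy can be illustrated in Python (the study itself used Stata; the function and variable names here are hypothetical, and the published models additionally adjust for time trends and cluster standard errors by hospital):

```python
import numpy as np

def ols_inspection_effect(cleanliness, inspection_month):
    """Estimate beta in: cleanliness = alpha + beta * inspection_month + error.

    cleanliness: median share of patients rating cleanliness "excellent"
    (percentage points), one value per hospital-month.
    inspection_month: 0/1 indicator for whether an inspection occurred.
    Returns (alpha, beta). With a binary regressor, beta equals the
    difference in mean cleanliness between inspection and other months.
    """
    y = np.asarray(cleanliness, dtype=float)
    d = np.asarray(inspection_month, dtype=float)
    X = np.column_stack([np.ones_like(d), d])  # intercept + inspection dummy
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])
```

For example, hospital-months scoring 70 and 72 outside inspections and 74, 80, and 82 during them would yield a β equal to the 9-point gap in means.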


Limitations

As with all statistical modeling studies, our analysis had several limitations. First, we did not have the exact date when a patient was discharged, only the month. Thus, when we merged information at the hospital level, we could not investigate a possible gaming effect within a given month. This imprecision was likely to have produced conservative estimates of the magnitude of potential gaming behavior.

Second, our results suggest only modest effect sizes. However, even a modest increment in perceived cleanliness is sufficient for hospitals to avoid threats of an adverse assessment and the consequences that flow from it.

Third, a comprehensive longitudinal data set that tracks patients’ perceptions of cleanliness independently across all of the sites in the United Kingdom does not exist. Thus, in this initial assessment—to our knowledge, the first of its kind in the NHS—we took advantage of a large pooled data set to determine whether cleanliness increased in the months just before and during inspections and then reverted to its historical level after inspections. A limitation to this method is that it cannot identify individual hospitals that are gaming. However, it does point to characteristics, such as outsourcing cleaning services, that may render a hospital more likely to game.

Fourth, we could not observe a uniform distribution in terms of month of inspection or in terms of the numbers of patients responding to the questionnaire. However, we took advantage of the available data to assess gaming effects.

Study Results

Association Of Inspection Months With Cleanliness

In the months leading up to an inspection, levels of cleanliness appeared to rise, followed by a drop after the inspection period (Exhibit 2). When we compared the months before and after the inspection, we found that on average, patients’ reports of excellent cleanliness were about 10 percentage points higher (81.5 percent, versus 71.9 percent in all other months; t-test: −3.73; p < 0.001).
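The comparison above is a standard two-sample t-test. As an illustration (in Python rather than the Stata used in the study, with hypothetical names), Welch's version of the statistic can be computed as:

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic for mean(sample_a) - mean(sample_b).

    Unlike the pooled-variance t-test, Welch's version does not assume
    equal variances in the two groups.
    """
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)
```

Comparing inspection-window scores against all other months in this way yields a positive statistic when cleanliness ratings are higher around inspections.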

Exhibit 2 Median percentages of patients at 205 UK hospitals who rated cleanliness as “excellent” in 2011–14, by proximity to the month of cleanliness inspection

SOURCE Authors’ analysis of merged data at the hospital level from the following sources: Patient Environment Action Teams data set for 2011–12 (see Note 25 in text), Patient-Led Assessments of the Care Environment data set for 2013–14 (see Note 26 in text), Estates Return Information Collection data set for 2011–14 (see Note 27 in text), and National Health Service Inpatient Survey for 2011–14 (see Note 24 in text). NOTES Cleanliness refers to the room or ward where each patient stayed. In this analysis, the numbers of hospitals in the study period were as follows: −5 months, 1; −4 months, 5; −3 months, 4; −2 months, 5; −1 month, 9; 0 months, 18; 1 month, 59; 2 months, 131; 3 months, 135; 4 months, 156; and 5 months, 194. The numbers of patients who responded to the questionnaire were as follows: −5 months, 14; −4 months, 112; −3 months, 29; −2 months, 30.6; −1 month, 42; 0 months, 113; 1 month, 152; 2 months, 189; 3 months, 214; 4 months, 208; and 5 months, 231. The inspection corresponds to the point 0.

For example, at the Royal National Hospital for Rheumatic Diseases, there were inspections in June 2013 and May 2014. In the months before each inspection, patients’ perceptions of cleanliness were relatively constant (Exhibit 3). Those perceptions increased in the inspection months and returned to their previous levels shortly afterward.

Exhibit 3 Median percentages of patients at the Royal National Hospital for Rheumatic Diseases who rated cleanliness as “excellent” in 2013–14, by proximity to the month of cleanliness inspection

SOURCE Authors’ analysis of merged data for 2013–14 from the Patient-Led Assessments of the Care Environment data set (see Note 26 in text) and the National Health Service Inpatient Survey (see Note 24 in text). NOTES Cleanliness refers to that of the room or ward where each patient stayed. In this analysis, the numbers of patients who responded to the questionnaire in 2013 were as follows: Jan, 14; Feb, 18; Mar, 28; Apr, 16; May, 22; Jun, 20; Jul, 17; and Aug, 29. The numbers of patients who responded in 2014 were as follows: Jan, 15; Feb, 19; Mar, 10; Apr, 11; May, 23; Jun, 12; Jul, 18; and Aug, 23.

Our data were corroborated by other evidence. Responses to our freedom-of-information requests about communication with cleaning staff around the time of inspections revealed that hospitals performed a series of detailed pre-inspection checks a few days before each inspection, which identified long-standing problems that were then addressed. (For an example, see Appendix 1, Exhibit A4.) 28

In the month when an inspection took place, the share of patients who rated their hospital’s cleanliness as “excellent” jumped by 7.78 percentage points (95% confidence interval: 2.75, 12.8; see Appendix 1, Exhibit A5). 28 (For further corroboration of our results, see the estimation coefficients of a distributed lag model in Appendix 1, Exhibit A6.) 28

Outsourced Or In-House Cleaning Services

We used a difference-in-differences model to test whether hospitals that privately contracted for cleaning services were more likely to exhibit gaming behavior than those that provided cleaning in-house. 30 As shown in Appendix 1, Exhibit A5, higher cleanliness scores in inspection months were concentrated in hospitals that outsourced their cleaning services (11.0 percentage points; 95% CI: 5.15, 19.6), whereas there was no statistically detectable association between cleanliness scores and inspection months in hospitals that used in-house NHS cleaning services (2.68 percentage points; 95% CI: −3.52, 8.88). (For further corroboration of our results, see the estimation coefficients of a distributed lag model in Appendix 1, Exhibit A7.) 28 This finding is in line with a recent study that found a greater incidence of infection and evidence of poorer cleaning where cleaning was outsourced. 31
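A difference-in-differences specification of this kind can be sketched as an OLS with an interaction term (an illustrative Python fragment, not the study's Stata code; names are hypothetical, and the published model includes additional controls and clustered standard errors):

```python
import numpy as np

def did_estimate(cleanliness, outsourced, inspection):
    """Difference-in-differences: coefficient on outsourced x inspection.

    Fits cleanliness = b0 + b1*outsourced + b2*inspection
                         + b3*(outsourced*inspection) + error
    and returns b3, the extra inspection-month jump for hospitals with
    outsourced cleaning relative to in-house hospitals.
    """
    o = np.asarray(outsourced, dtype=float)
    i = np.asarray(inspection, dtype=float)
    X = np.column_stack([np.ones_like(o), o, i, o * i])
    coef, *_ = np.linalg.lstsq(X, np.asarray(cleanliness, dtype=float),
                               rcond=None)
    return float(coef[3])
```

If in-house hospitals rise 2 points in inspection months while outsourced hospitals rise 11, the interaction coefficient recovers the 9-point difference.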

Within-Group Estimation

To test whether our results were driven by potential unobserved heterogeneity, we used a within-group estimation. Our results clearly show that switching from a non-inspection month to an inspection month led to an increase in reported cleanliness by about 2.54 percentage points (95% CI: 0.02, 5.06; see Appendix 1, Exhibit A5). 28

To further examine the temporal pattern of our results, we included a cubic term for time to inspection. The results were consistent with our main study findings (β: 2.86 percentage points; 95% CI: 0.06, 5.67).
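The within-group (hospital fixed effects) estimator used above can be sketched by demeaning within each hospital, which is equivalent to including a dummy variable per hospital (illustrative Python with hypothetical names; the study's Stata models also include controls and clustered standard errors):

```python
import numpy as np

def within_estimate(cleanliness, inspection, hospital):
    """Hospital fixed-effects estimate of the inspection-month effect.

    Demeans the outcome and the inspection dummy within each hospital,
    then regresses one on the other, so that time-invariant differences
    between hospitals (size, case-mix, baseline cleanliness) drop out.
    """
    y = np.asarray(cleanliness, dtype=float)
    d = np.asarray(inspection, dtype=float)
    hospital = np.asarray(hospital)
    y_dm, d_dm = y.copy(), d.copy()
    for h in np.unique(hospital):
        mask = hospital == h
        y_dm[mask] -= y[mask].mean()   # subtract each hospital's own mean
        d_dm[mask] -= d[mask].mean()
    return float(d_dm @ y_dm / (d_dm @ d_dm))
```

Two hospitals with very different baselines (say, 60 and 80) but the same 3-point inspection-month bump both contribute the same within-hospital effect of 3.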

Robustness Checks

We performed a series of robustness checks in order to better understand the effects of various factors on our results—that is, whether something other than an impending inspection could be driving the changes we observed. First, we adjusted for potential confounding factors, including hospital size, hospital complexity (that is, whether the hospital type was specialist, multiservice, or other), and time trends. (For the results of these checks, see Appendix 1, Exhibit A8.) 28

To identify whether these patterns were driven by a few outliers that exhibited extreme gaming, we removed 5 percent of our distribution (2.5 percent each from the bottom and the top of the distribution). This changed none of our results (see Appendix 1, Exhibit A9). 28

We further examined our results to see whether they were confounded by some areas’ having low numbers of respondents. We restricted our sample to areas with at least three hospitals that each had at least seventeen respondents. This removed 10 percent of the lower end of the distribution in terms of numbers of respondent patients for each month (for the results, see Appendix 1, Exhibit A10). 28 The results were consistent with our main findings, except that the within-group results were no longer significant.

To ensure that our results were not driven by structural differences between acute and specialist hospitals that may have affected the propensity to fall into the “treatment” group, we applied two different robustness tests (“treatment” here is defined as hospitals that were observed during inspection months or shortly before; hospitals observed at other times were placed in the “control” group—for the results, see Appendix 1, Exhibit A11.) 28 First, we used propensity score matching to better match treatment hospitals with control hospitals on their size and complexity. Specifically, we stratified hospitals by type (specialist, multiservice, or acute), and within each category, we matched treated hospitals with at least one control in terms of hospital size (we allowed control hospitals to be used more than once as a match). To ensure “goodness of fit,” we permitted matches for only those pairs of hospitals whose propensity scores differed by 0.01 or less. Second, we restricted our sample to specialist hospitals. In both cases, none of our results changed qualitatively.
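The caliper-matching step described above can be sketched as follows (illustrative Python; it assumes propensity scores have already been estimated, for example from a logit of treatment on hospital size and type, and all names are hypothetical):

```python
def caliper_match(treated_scores, control_scores, caliper=0.01):
    """Nearest-neighbor propensity score matching with replacement.

    treated_scores, control_scores: dicts {unit_id: propensity_score}.
    Each treated unit is matched to the control with the closest score,
    but only if the gap is within the caliper (0.01, as in the text);
    controls may be reused. Returns {treated_id: matched_control_id},
    omitting treated units with no admissible match.
    """
    matches = {}
    for t_id, t_score in treated_scores.items():
        best_id, best_gap = None, caliper
        for c_id, c_score in control_scores.items():
            gap = abs(t_score - c_score)
            if gap <= best_gap:
                best_id, best_gap = c_id, gap
        if best_id is not None:
            matches[t_id] = best_id
    return matches
```

A treated hospital with score 0.30 would match a control at 0.295 (gap 0.005), while one at 0.90 with no control within 0.01 would be dropped from the matched sample.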

Finally, as a so-called falsification test, we analyzed the pattern of responses about hospitals’ food and hydration quality instead of cleanliness. Conceptually, cleaning and providing food are different services, and when these services are outsourced, different companies provide them. This allowed us to test whether the problem relates to cleaning in particular rather than to a general disposition to outsource services. (For the results, see Appendix 1, Exhibit A12.) 28 It is worth noting that the sample size dropped because the assessment of food quality is available only at the trust level. However, this was a good test on conceptual and empirical grounds. Empirically, we observed no significant correlation at the trust level between scores of cleanliness and of food and hydration quality (ρ = 0.11).


Discussion

NHS inspections are a core element of the performance management regime designed to ensure that hospitals maintain high standards of quality. This is especially important when services, including cleaning, are outsourced to private contractors to save money. Yet there is a perception that NHS inspections can be gamed. This can happen, for example, when staff members know that an inspection will soon take place.

By taking advantage of a unique data source, we were able to compare patients’ perceptions of cleanliness around the time of inspections. We found evidence consistent with gaming: In inspection months and for a short period before them, cleanliness appeared to improve, declining in subsequent months. This pattern was most prominent for hospitals that outsourced cleaning services to private contractors. This finding appears particularly relevant in light of a recent study that found that sites that outsourced cleaning services had significantly higher rates of methicillin-resistant Staphylococcus aureus (MRSA). 31

Our findings suggest that gaming may increase a hospital’s cleanliness score by 2.5–11.0 percentage points. This would often be sufficient to avoid the severe consequences of an adverse inspection report, which range from warnings to enforcement action by the Care Quality Commission or even restrictions on activity, and which have implications for the tenure of senior executives.

Our findings have obvious implications for policy, given the importance of hospital cleanliness in the fight against antimicrobial resistance. However, they also have implications for systems of regulation and inspection. One obvious question is whether inspections should be announced or unannounced. For example, our findings suggest that hospitals invested considerable resources in preparing for an inspection. Arguably, they should be investing those resources at all times. A recent systematic review asking whether announced and unannounced inspections led to different approaches to risk assessment found only three studies. 32 The authors concluded that unannounced inspections reduce the regulatory burden compared to announced ones, but there was no significant difference between the two in terms of outcomes.

Another question is the extent to which a system based on inspections is the best way of ensuring quality. A history of regulation in the English NHS described a series of shifts from trust-based professional regulation to detailed external inspection, followed by some rolling back. 33 Changes were often driven by events that revealed malfunctions in the system in place at the time, instead of making reforms based on evidence of the clear superiority of one among a series of alternative approaches.

While the characteristics of an ideal system are easy to specify, combining high standards with transparency, they seem more difficult to achieve in practice. However, one lesson is clear. In any regulatory system, it should be assumed that gaming will take place. These systems should be designed in ways that minimize gaming.


David Stuckler and Veronica Toffolutti are funded by the European Research Council (Grant No. 313590-HRES). Stuckler is also funded by the Wellcome Trust.

