The CMS Star Ratings may be of limited value for patients choosing hospitals for specific care needs.
Objectives: To examine characteristics of the CMS Overall Hospital Quality Star Ratings related to their use by consumers for choosing hospitals.
Study Design: Observational study using secondary data analyses.
Methods: Hospital Star Rating data reported in February 2019 and additional quality data from California and New York were used, with a mix of analytical approaches including descriptive statistics, correlational analysis, and Poisson regression models.
Results: The distribution of hospitals’ Star Rating summary scores was tightly compressed, with no hospitals at or near the scores that would be obtained if a hospital were either best or worst across all quality domains. Hospitals did not consistently perform well or poorly across the range of measures and quality groups included in the Star Ratings. On average, for a given quality measure included in the Star Rating program, 12% of 1-star hospitals received top-quartile scores and 16% of 5-star hospitals received bottom-quartile scores. No significant associations were found between hospitals’ overall Star Ratings and their performance on a set of condition-specific quality measures for hospitals in California and New York State.
Conclusions: Hospitals’ overall scores clustered in the middle of the potential distribution of scores; no hospitals were either best at everything or worst at everything. The Star Ratings did not predict hospital quality scores for separate quality measures related to specific medical conditions or health care needs. These 2 observations suggest that the Star Ratings are of limited value to consumers choosing hospitals for specific care needs.
Am J Manag Care. 2021;27(5):203-210. https://doi.org/10.37765/ajmc.2021.88634
The CMS Star Ratings were created to provide a simple guide to patients or consumers for choices of hospitals. For patients making a choice for a specific medical condition or health care need:
The CMS Overall Hospital Quality Star Ratings1 (referred to subsequently as the “Star Ratings”) were created to provide consumers with a simple, easy-to-understand summary of the measures available in the Hospital Compare measure set.2 Under the current method, a set of 57 individual measures of quality are categorized into 7 quality groups or domains. Scores are calculated using a statistical model for each of the 7 groups and are weighted and combined to yield an overall hospital rating of 1 to 5 stars. Although concerns have been raised about technical aspects of the Star Ratings3 and the accuracy and fairness of the ratings,4 a variety of consumer advocacy and large purchaser groups have reported that they find the Star Ratings useful.5
An overall hospital rating or ranking system could be used in either of 2 ways. It can be used by purchasers or health plans that are designing limited provider networks and whose interest in hospital quality spans the entire range of clinical specialties and programs. An overall rating that brings together scores on multiple individual measures of quality simplifies the decision about inclusion in the network, and it allows a plan or purchaser or patient advocacy group to make recommendations to its members or clients about “good” vs “bad” hospitals for purposes of their choices.6
An overall rating system could also be used by individual patients or consumers for choices of hospitals for specific needs, with elective surgical procedures being a common example. In this case, the value of any overall rating system can be questioned because it inevitably includes information on aspects of care that are not relevant to the patient’s specific condition. Available data showing weak correlations among specific Hospital Compare measures7 suggest that an overall rating may not be informative about relative hospital performance for specific conditions or procedures.
Existing literature examining associations between Star Ratings and outcomes of cancer surgery,8-10 advanced laparoscopic abdominal surgery,11 and percutaneous coronary intervention (PCI)12 demonstrates mixed results. For example, Star Ratings seemed to be significantly associated with cancer surgery outcomes,8-10 although the efficiency, effectiveness, and feasibility of patients’ use of Star Ratings to identify high-quality hospitals for complex cancer surgeries was questioned.10 On the other hand, no significant associations were found between Star Ratings and mortality in laparoscopic abdominal surgeries, serious morbidity for bariatric or hiatal hernia surgeries, or 30-day risk-standardized mortality after PCI. However, serious morbidity for colorectal surgery and 30-day post-PCI readmission rate were lower at high-star hospitals compared with low-star hospitals.11,12
We attempted to address 3 questions:
We used data from the CMS Overall Hospital Quality Star Rating Program and additional quality data from the states of California and New York. We ran the publicly available SAS Pack (SAS Institute) for the February 2019 Star Rating report period to obtain hospitals’ overall Star Ratings, scores on 7 quality groups (Mortality, Readmission, Safety of Care, Patient Experience, Effectiveness of Care, Timeliness of Care, and Efficient Use of Medical Imaging), and scores on 57 individual quality measures.13 (Quality groups, quality measure identifiers, and quality measure names can be found in eAppendix A [eAppendices available at ajmc.com].)
Additional data in the quality domains of mortality and surgical site infection (SSI) were obtained from California and New York. We selected quality measures that were not included in the February 2019 Star Ratings. For hospitals in California, mortality rates for 13 medical conditions and procedures were obtained from California’s Office of Statewide Health Planning and Development14; SSIs for 17 individual operative procedures as well as SSIs for all types of procedures were obtained from the California Health and Human Services Open Data Portal.15 For the state of New York, mortality rates for 7 medical conditions and procedures, SSIs for 3 types of operative procedures, and 4 carbapenem-resistant Enterobacteriaceae (CRE)–related measures were obtained from Health Data NY, a State of New York Open Data Portal.16 All mortality and SSI rates were risk adjusted. (See eAppendix B for details of the state-level quality measures.)
To answer the initial research question, we first created a distribution of hospitals’ summary scores, which were the sum of the weighted group scores for each hospital. We included hospitals with group scores for all 7 quality groups. We also calculated 2 “benchmark” summary scores using the lowest and the highest group scores for each quality group, respectively, to represent a “potential lowest” summary score and a “potential highest” summary score that a hypothetical hospital could have obtained, had it performed the worst or the best on all quality groups. For example, if a hospital had the best score in all 7 groups, it would receive the high benchmark score. On the other hand, if a hospital had the worst score in all 7 groups, it would receive the low benchmark score. The best or worst measure group score was the best or worst group score actually obtained by a real hospital—not a hypothetical best or worst score based on performance on individual measures within groups.
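The benchmark calculation described above can be sketched as follows. This is a minimal illustration in Python, not the SAS code used in the study. The group weights and all hospital scores are hypothetical placeholders chosen only to show the arithmetic, and the function names are invented for this example.

```python
# Toy illustration of the summary score and the two "benchmark" scores.
# All weights and group scores are hypothetical placeholders, not CMS data.

GROUPS = [
    "Mortality", "Readmission", "Safety of Care", "Patient Experience",
    "Effectiveness of Care", "Timeliness of Care",
    "Efficient Use of Medical Imaging",
]

# Assumed weights: 4 highly weighted groups and 3 lightly weighted groups.
WEIGHTS = dict(zip(GROUPS, [0.22, 0.22, 0.22, 0.22, 0.04, 0.04, 0.04]))

HOSPITALS = [
    dict(zip(GROUPS, [1.0, -0.5, 0.2, 0.8, -1.0, 0.3, 0.0])),
    dict(zip(GROUPS, [-0.4, 0.9, -1.2, 0.1, 0.5, -0.6, 1.1])),
    dict(zip(GROUPS, [0.2, 0.1, 0.6, -0.9, 0.0, 0.4, -0.7])),
    {"Mortality": 2.0},  # missing groups, so excluded from the analysis
]


def complete_hospitals(hospitals):
    """Keep only hospitals that have a score for all 7 quality groups."""
    return [h for h in hospitals if all(g in h for g in GROUPS)]


def summary_score(hospital):
    """Sum of weighted group scores for one hospital."""
    return sum(WEIGHTS[g] * hospital[g] for g in GROUPS)


def benchmark_scores(hospitals):
    """Return (potential lowest, potential highest) summary scores.

    Each benchmark combines the worst (or best) group score actually
    observed among complete hospitals, taken group by group.
    """
    complete = complete_hospitals(hospitals)
    lowest = sum(WEIGHTS[g] * min(h[g] for h in complete) for g in GROUPS)
    highest = sum(WEIGHTS[g] * max(h[g] for h in complete) for g in GROUPS)
    return lowest, highest


actual = [summary_score(h) for h in complete_hospitals(HOSPITALS)]
low, high = benchmark_scores(HOSPITALS)
print("actual range:", min(actual), max(actual))
print("potential range:", low, high)
```

Because each benchmark takes the extreme value group by group, the potential range always contains the actual range, and the two coincide only if a single hospital is best (or worst) in every group.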
To address the second question, we first calculated percentages of hospitals receiving top-quartile and bottom-quartile scores on 7 quality groups, by their Star Rating. We then calculated percentages of hospitals receiving top-quartile and bottom-quartile scores on 50 quality measures, by Star Rating. Seven measures that could not be categorized into quartiles due to their distributions (HAI-4, OP-4, OP-30, OP-33, OP-11, PC-01, and VTE-6, defined in eAppendix A) were excluded. Differences between the 5 Star Rating groups were compared using the χ² test.
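The quartile comparison for a single measure can be illustrated with a short sketch. The records below are made-up (star rating, measure score) pairs, the function name is hypothetical, and the code assumes that a higher score means better performance; for measures where lower is better (eg, mortality rates), the direction would need to be reversed. The χ² test is not reproduced here.

```python
from statistics import quantiles

# Hypothetical (star_rating, measure_score) pairs; higher score = better.
RECORDS = [
    (1, 1), (1, 2), (1, 11), (1, 5),
    (3, 4), (3, 6), (3, 7),
    (5, 12), (5, 10), (5, 3), (5, 9), (5, 8),
]


def crossover_rates(records):
    """Percent of 1-star hospitals in the top quartile of a measure and
    percent of 5-star hospitals in the bottom quartile of that measure."""
    scores = [score for _, score in records]
    q1, _, q3 = quantiles(scores, n=4)  # quartile cut points, all hospitals
    one_star = [score for star, score in records if star == 1]
    five_star = [score for star, score in records if star == 5]
    pct_one_star_top = 100 * sum(s >= q3 for s in one_star) / len(one_star)
    pct_five_star_bottom = 100 * sum(s <= q1 for s in five_star) / len(five_star)
    return pct_one_star_top, pct_five_star_bottom


top_pct, bottom_pct = crossover_rates(RECORDS)
print(f"1-star hospitals in top quartile: {top_pct:.0f}%")
print(f"5-star hospitals in bottom quartile: {bottom_pct:.0f}%")
```

Repeating this calculation for each measure and averaging the two percentages across measures would give summary figures of the kind reported in the Results.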
To address the third question, we examined the relationship between hospitals’ Star Ratings and their performance on individual quality measures, not in Hospital Compare and not included in the Star Rating system, from New York and California. Any hospitals without a Star Rating or a quality measure value were excluded.
We took 3 analytical approaches: First, a correlational analysis was conducted using the Spearman rank correlations between overall Star Ratings and individual quality measures. Second, we compared mean quality measure rates for 3 groups of hospitals, categorized by their Star Ratings: hospitals receiving 1 or 2 stars, hospitals receiving 3 stars, and hospitals receiving 4 or 5 stars. Differences between the 3 resulting groups were tested using the Kruskal-Wallis test. We reported the raw P values and P values adjusted for multiple comparisons using the Sidak-Holm method. Third, to take into consideration the different case volumes for each quality measure at different hospitals, we ran Poisson regression models with hospitals’ Star Ratings (1-5) as the dependent variable and hospitals’ quality measure rates and case volumes as independent variables.
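The multiple-comparison adjustment can be sketched as below, assuming that the method named above is the step-down procedure often written as Holm-Šidák. The function name and the example P values are hypothetical, and this is not the code used in the study, which was run in SAS and Stata.

```python
def holm_sidak(p_values):
    """Step-down Sidak adjustment of raw P values.

    The smallest P value is corrected as if all m tests were in play,
    the next smallest as if m - 1 were, and so on. A running maximum
    keeps the adjusted values in the same order as the raw values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank  # hypotheses not yet stepped past
        candidate = 1.0 - (1.0 - p_values[idx]) ** remaining
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw P values for 3 measures
print(holm_sidak(raw))
```

With many measures tested at once, a raw P value just under .05 will usually not survive this adjustment, which matches the pattern of unadjusted associations losing significance in the adjusted analyses.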
All data management and statistical analyses were performed using SAS version 9.4 (SAS Institute) and Stata/SE version 13 (StataCorp).
Numbers of included hospitals in different parts of the analyses are presented in eAppendix C, with a range of 467 to 4408 hospitals in national-level analyses and a range of 20 to 313 hospitals in state-level analyses. We included both acute care and critical access care hospitals in all analyses, although additional analyses including only acute care hospitals yielded almost identical results.
Hospitals’ summary score distribution is presented in Figure 1. The score that would be obtained by a hospital with the actual observed worst score on every quality group is identified as “potential lowest” at –3.50. The score that would be obtained by a hospital with the actual observed best score on every quality group is identified as “potential highest” at 3.01.
The range of actual scores was much narrower, with the vast majority of hospitals clustered at or near the middle of the score distribution (mean actual score, –0.05; lowest actual score, –2.04; highest actual score, 1.78). Only a handful of high-scoring hospitals scored at or beyond half of the potential highest score above the midpoint of 0, and there were also only a handful of low-scoring hospitals that were at or beyond half of the potential lowest score.
There were no obvious break points or clusters in the distribution, which was smooth and continuous throughout the range.
The percentage of hospitals with top- and bottom-quartile scores at the quality group level, for hospitals in each of the 5 overall Star Rating groups, is shown in Table 1. As expected because of the design of the Star Rating system, hospitals with high overall Star Ratings were more likely to have group scores in the top quartile, but the strength of that association varied by quality group. The relationship was clear in 3 of the 4 highly weighted measure groups (Readmission, Safety of Care, and Patient Experience), but less strong in the other 3 measure groups and in the Mortality group. In the Mortality group, the largest difference was between 5-star and 4-star hospitals; there were much smaller differences among the 3 other Star Rating groups. In the Effectiveness of Care group, only 43% of 5-star hospitals had scores in the top quartile, whereas 23% of 1-star hospitals had scores in the top quartile. For Efficient Use of Medical Imaging, there was no relationship between overall star rating and scores in the top quartile for the measure group.
The results of a similar analysis at the individual measure level are shown in Figure 2, which presents the percentage of 1-star hospitals with a top-quartile measure score (left side of figure) and the percentage of 5-star hospitals with a bottom-quartile measure score (right side of figure). There is clear variability from measure to measure, with some measures having essentially no “crossover” ratings (high Star Rating and scores in lowest quartile or vice versa), and others with more than 30% of hospitals with crossover ratings. On average across the 50 quality measures, 12% of 1-star hospitals had ratings for a given measure in the top quartile and 16% of 5-star hospitals had ratings for a given measure in the bottom quartile. (eAppendix D reports all data on all 5 hospital star groups.)
Table 2 and Table 3 present results of the relationships between overall Star Ratings and individual quality measures using California and New York quality data. Table 2 reports the results of the correlational analysis. Of the 31 measures available from California, there were significant associations (P < .05) in the unadjusted values for 6 measures, but only 1 of those associations (craniotomy mortality) remained significant in the adjusted analysis. Of the 14 measures available for New York, there were 5 significant associations in the unadjusted values; 3 were still significant in the adjusted analyses. All 3 of those measures were for care related to CRE.
The same relationships are shown in Table 3, but in this case the table entries are the mean values on specific measures for 3 sets of hospitals: those with 1- or 2-star CMS ratings, those with 3-star ratings, and those with 4- or 5-star ratings. As in the correlational analysis, there were essentially no significant relationships found in the analyses adjusted for multiple comparisons. The only relationship that was found in the adjusted analysis (craniotomy mortality in California) showed an interesting pattern in which the lowest mortality was found in 3-star hospitals rather than in 4- or 5-star hospitals.
Results from the Poisson regressions controlling for case volume are presented in eAppendix E. Similar to the above state-level analysis, relationships between star ratings and individual quality measures were all nonsignificant, the only exception being the “CRE Hospital Onset Infection Rate—all body sites” measure (incidence rate ratio, 0.79; P = .017).
Given the range of possible total summary scores in the Star Ratings, the range of actual scores was narrow. The shape of the curve is to be expected, as the Star Ratings methodology ensures a normal distribution of hospital ratings. However, the extent to which the curve is compressed within the potential range of scores, rather than spread across that whole range, would not seem to be an intentional design feature, as a wider distribution across the full possible scale range would be more useful for consumer choice. CMS has, however, changed components of the methodology over the years in an attempt to “flatten” the bell curve and assign more hospitals to the 1-star and 5-star categories.17
The large gaps between the actual highest/lowest scores and the potential highest/lowest scores seemed noteworthy. The actual best hospital total score was 1.78; a hospital with the best score on all 7 quality groups would have scored 3.01. On the other side of the distribution, the actual worst hospital score was –2.04; a hospital with the worst score on all 7 quality groups would have scored –3.50. The median 5-star hospital score was 0.73, which is much closer to the overall hospital mean of –0.05 than to either the best actual hospital score (1.78) or the “best possible” score for a hypothetical hospital with the best scores in all quality groups at 3.01. Similarly, the median 1-star hospital score was –0.95, which is closer to the overall mean than to either the score of the actual worst-scoring hospital (–2.04) or the “worst possible” hospital score of –3.50. Choosing a “high-performing” hospital or avoiding a “poor-performing” hospital on the basis of the Star Ratings alone, then, is likely to yield a choice that is not much different from average in terms of total score.
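The comparisons in this paragraph can be checked with simple arithmetic. The sketch below uses only the scores reported above; the variable and function names are introduced here for illustration.

```python
# Scores reported in the text.
POTENTIAL_LOW, POTENTIAL_HIGH = -3.50, 3.01
ACTUAL_LOW, ACTUAL_HIGH = -2.04, 1.78
MEAN = -0.05
MEDIAN_5_STAR, MEDIAN_1_STAR = 0.73, -0.95


def share_of_range_used():
    """Width of the actual score range as a fraction of the potential range."""
    return (ACTUAL_HIGH - ACTUAL_LOW) / (POTENTIAL_HIGH - POTENTIAL_LOW)


def closer_to_mean(score, extreme):
    """True if the score lies nearer the overall mean than the given extreme."""
    return abs(score - MEAN) < abs(score - extreme)


print(f"share of potential range used: {share_of_range_used():.1%}")
print("median 5-star closer to mean than to actual best:",
      closer_to_mean(MEDIAN_5_STAR, ACTUAL_HIGH))
print("median 1-star closer to mean than to actual worst:",
      closer_to_mean(MEDIAN_1_STAR, ACTUAL_LOW))
```

On these figures, actual scores span roughly 59% of the potential range, and both median scores sit nearer to the overall mean than to either the actual or the potential extreme.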
Many hospitals at the top of the summary score distribution had low scores in some measure groups, and many hospitals at the bottom of the score distribution had high scores in some measure groups. A consumer choosing a 4-star hospital on the basis of that rating, for example, will not find consistent “4-star quality” in all measure groups or in all specific individual quality measures.
There were also no obvious break points in the distribution of overall scores. Although CMS uses a method to divide hospitals into 5 groups and assign Star Ratings to those groups (“k-means clustering”),18 hospitals do not naturally cluster into groups in terms of the overall scores.
There clearly are hospitals with relatively good performance on multiple quality groups, which thereby earn a 5-star rating. One characteristic of those hospitals is that they report on relatively fewer measures and do not have scores in all quality groups, including the important quality groups of Safety of Care and Mortality.19
We find, however, that many 1-star hospitals have top scores in important areas such as Safety of Care, Mortality, or Patient Experience. Likewise, some 5-star hospitals have low scores in 1 or more of the 7 quality groups. When we examine correlations between the overall Star Ratings and external, independent measures of quality of care for specific clinical conditions (the California and New York measures), we find essentially no significant correlations after adjustment for multiple comparisons, so the Star Ratings cannot be used to predict quality on those specific clinical conditions either.
This pattern of findings would seem to raise concerns not only about the CMS Star Rating system, but about the value of any global hospital rating system that attempts to combine scores on a large number of disparate measures of performance into a single overall score or categorical rating.20 The lack of correlation among individual measures of hospital quality by domain (eg, readmission, infection) or clinical program (eg, cardiology, orthopedics) means that any global rating system will have little or no ability to predict quality of care outside the set of measures used to construct the global rating. If a global rating system does not include, for example, measures of quality for elective knee replacement surgery, there is no basis to presume that the global rating will be useful in choosing a hospital for elective knee replacement surgery.
Health plans, purchaser groups, patient advocacy groups, and others seeking to make broad judgments on overall performance or quality may still find it useful to use an overall 5-category rating system for the choices or recommendations they make. Any process that involves some yes/no decision about a hospital (eg, inclusion in a limited provider network) has to have some basis for the decision, and the Star Ratings may be as good as any other available. It has to be noted, however, that major global hospital rating programs including the CMS Star Ratings, U.S. News & World Report, and Healthgrades do not generally agree with each other on their hospital ratings20; they do not even agree on hospitals’ performance on specific quality domains for a particular condition.21
For individual consumers, the problem is perhaps more difficult, in that the overall Star Ratings do not seem to predict a patient’s likely experience in specific clinical domains, or with specific elective surgical procedures. This problem has been noted previously.12 Patients who are interested in finding the best hospital for heart surgery or for elective hip replacement will not be able to use the Star Ratings to do that.
Our study had limitations. First, state-level quality measure reporting periods were not perfectly aligned with those of the CMS Star Ratings. We used 2015 mortality data and 2018 SSI data from both California and New York (the most recent years available for risk-adjusted rates), and Star Ratings for the February 2019 reporting period. Given that hospitals’ performance on quality indicators usually does not change substantially over time, we did not expect that the different reporting periods would have affected our key findings. Second, in the state-level analysis, some member hospitals in multihospital health systems reported their quality data for Star Ratings in a consolidated way under the same CMS Certification Number. In this case, these member hospitals would have the same “consolidated” stars (ie, they were not rated separately in Star Ratings), even though they reported separately, to their states, the quality measure data we included in the study. Although we cannot fix the underlying reporting issue, it was not a common issue in our data set, and we did not expect it to affect our key findings in any substantial manner. Third, our study did not explore the possible impact of the minimum thresholds for a hospital to receive a Star Rating (at least 3 quality groups, one of which must be the outcome group [Mortality, Safety of Care, or Readmission], and at least 3 measures in each group).18 Our overall score distribution analysis excluded hospitals with missing scores in any of the 7 quality groups. We do not know how this exclusion could have affected the relationship between Star Ratings and individual quality measures or changed the summary score distribution. Lastly, hospital sample sizes in some parts of the analyses were small, especially in the state-level analysis, limiting our ability to detect significant relationships that we might have otherwise found.
No hospitals were either best in all quality groups or worst in all quality groups. Because the overall Star Ratings are not correlated with other quality measures for specific clinical conditions or procedures, the global Star Ratings may not provide useful guidance to consumers choosing hospitals for their specific medical conditions or needs.
Author Affiliations: Center for Health Policy & Health Services Research, Henry Ford Health System (JH, DRN), Detroit, MI.
Source of Funding: None.
Author Disclosures: Drs Hu and Nerenz are employed by Henry Ford Health System; the 5 acute care hospitals in the Henry Ford Health System are scored every year in the Star Rating System.
Authorship Information: Concept and design (JH, DRN); acquisition of data (JH); analysis and interpretation of data (JH, DRN); drafting of the manuscript (JH, DRN); critical revision of the manuscript for important intellectual content (JH, DRN); statistical analysis (JH); obtaining funding (DRN); and supervision (DRN).
Address Correspondence to: Jianhui Hu, PhD, Center for Health Policy & Health Services Research, Henry Ford Health System, 1 Ford Pl, Ste 3A, Detroit, MI 48202. Email: firstname.lastname@example.org.
1. Overall hospital quality star rating. CMS. Accessed April 5, 2021. https://data.cms.gov/provider-data/topics/hospitals/overall-hospital-quality-star-rating/
2. Jha AK. The stars of hospital care: useful or a distraction? JAMA. 2016;315(21):2265-2266. doi:10.1001/jama.2016.5638
3. Bilimoria KY, Barnard C. The new CMS hospital quality star ratings: the stars are not aligned. JAMA. 2016;316(17):1761-1762. doi:10.1001/jama.2016.13679
4. Chatterjee P, Joynt Maddox K. Patterns of performance and improvement in US Medicare’s Hospital Star Ratings, 2016-2017. BMJ Qual Saf. 2018;28(6):486-494. doi:10.1136/bmjqs-2018-008384
5. National Quality Forum releases multistakeholder recommendations for strengthening the overall hospital star rating system. News release. National Quality Forum. November 6, 2019. Accessed April 26, 2020.
6. Haeder SF, Weimer DL, Mukamel DB. California hospital networks are narrower in Marketplace than in commercial plans, but access and quality are similar. Health Aff (Millwood). 2015;34(5):741-748. doi:10.1377/hlthaff.2014.1406
7. Hu J, Jordan J, Rubinfeld I, Schreiber M, Waterman B, Nerenz D. Correlations among hospital quality measures: what “Hospital Compare” data tell us. Am J Med Qual. 2017;32(6):605-610. doi:10.1177/1062860616684012
8. Kaye D, Norton E, Ellimoottil C, et al. Understanding the relationship between the Centers for Medicare and Medicaid Services’ Hospital Compare star rating, surgical case volume, and short-term outcomes after major cancer surgery. Cancer. 2017;123(21):4259-4267. doi:10.1002/cncr.30866
9. Mehta R, Paredes A, Tsilimigras D, et al. CMS Hospital Compare system of star ratings and surgical outcomes among patients undergoing surgery for cancer: do the ratings matter? Ann Surg Oncol. 2020;27(9):3138-3146. doi:10.1245/s10434-019-08088-y
10. Papageorge MV, Resio BJ, Monsalve AF, et al. Navigating by stars: using CMS Star Ratings to choose hospitals for complex cancer surgery. JNCI Cancer Spectr. 2020;4(5):pkaa059. doi:10.1093/jncics/pkaa059
11. Koh CY, Inaba CS, Sujatha-Bhaskar S, Nguyen NT. Association of Centers for Medicare & Medicaid Services Overall Hospital Quality Star Rating with outcomes in advanced laparoscopic abdominal surgery. JAMA Surg. 2017;152(12):1113-1117. doi:10.1001/jamasurg.2017.2212
12. Khatana SA, Groeneveld P, Giri JS. Association between New York State hospital post-percutaneous coronary intervention mortality and readmissions and CMS Hospital Star Ratings. J Am Coll Cardiol. 2018;71(suppl 11):A95.
13. Statistical Analysis System (SAS) package. QualityNet. 2020. Accessed May 4, 2020. https://www.qualitynet.org/inpatient/public-reporting/overall-ratings/sas
14. California hospital inpatient mortality rates and quality ratings. California Health and Human Services Open Data Portal. Accessed May 4, 2020. https://data.chhs.ca.gov/dataset/california-hospital-inpatient-mortality-rates-and-quality-ratings
15. Surgical site infections (SSIs) for operative procedures in California hospitals. California Health and Human Services Open Data Portal. Accessed April 5, 2021. https://data.chhs.ca.gov/dataset/surgical-site-infections-ssis-for-28-operative-procedures-in-california-hospitals
16. Hospital-acquired infections: beginning 2008. Health Data NY. Updated November 14, 2019. Accessed May 4, 2020. https://health.data.ny.gov/Health/Hospital-Acquired-Infections-Beginning-2008/utrt-zdsi
17. Castellucci M. CMS unveils updated hospital star ratings formula. Modern Healthcare. December 21, 2017. Accessed August 11, 2020. https://www.modernhealthcare.com/article/20171221/NEWS/171229968/cms-unveils-updated-hospital-star-ratings-formula
18. Yale New Haven Health Services Corporation/Center for Outcomes Research & Evaluation (YNHHSC/CORE). Overall hospital quality star rating on Hospital Compare: public input request. CMS. February 2019. Accessed April 5, 2021. https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/MMS/Downloads/Overall-Hospital-Quality-Star-Rating-on-Hospital-Compare-Public-Input-Period.pdf
19. Chung JW, Dahlke AR, Barnard C, DeLancey JO, Merkow RP, Bilimoria KY. The Centers for Medicare and Medicaid Services hospital ratings: pitfalls of grading on a single curve. Health Aff (Millwood). 2019;38(9):1523-1529. doi:10.1377/hlthaff.2018.05345
20. Bilimoria KY, Birkmeyer JD, Burstin H, et al. Rating the raters: an evaluation of publicly reported hospital quality rating systems. NEJM Catalyst. August 14, 2019. Accessed April 5, 2021. https://catalyst.nejm.org/doi/full/10.1056/CAT.19.0629
21. Austin JM, Derk JM, Kachalia A, Pronovost PJ. Assessing the agreement of hospital performance on 3 national mortality ratings for 2 common inpatient conditions. JAMA Intern Med. 2020;180(6):904-905. doi:10.1001/jamainternmed.2020.0450