This study finds no evidence of a deleterious impact of pay-for-performance on minority patients in the Premier Hospital Quality Incentive Demonstration.
To determine whether racial disparities in process quality and outcomes of care change under hospital pay-for-performance.
Retrospective cohort study comparing the change in racial disparities in process quality and outcomes of care between 2004 and 2008 in hospitals participating in the Premier Hospital Quality Incentive Demonstration versus control hospitals.
Using patient-level Hospital Quality Alliance (HQA) data, we identified 226,096 patients at Premier hospitals, which were subject to pay-for-performance (P4P) contracts, and 1,607,575 patients at control hospitals who had process of care measured during hospitalization for acute myocardial infarction (AMI), congestive heart failure (CHF), or pneumonia. We additionally identified 123,241 Medicare patients in Premier hospitals and 995,107 in controls who were hospitalized for AMI, CHF, pneumonia, or coronary artery bypass graft (CABG) surgery. We then compared HQA process quality indicators for AMI, CHF, and pneumonia between P4P and control hospitals, as well as risk-adjusted mortality rates for AMI, CHF, pneumonia, and CABG.
Black patients initially had lower performance on process quality indicators in both Premier and non-Premier hospitals. The racial gap decreased over time in both groups; the reduction in the gap in Premier hospitals was greater than the gap reduction in non-Premier hospitals for AMI patients. During the study period, mortality generally decreased for blacks relative to whites for AMI, CHF, and pneumonia in both Premier and non-Premier hospitals, with the relative reduction for blacks greatest in Premier hospitals for CHF.
Our results show no evidence of a deleterious impact of P4P in the Premier HQID on racial disparities in process quality or outcomes.
Am J Manag Care. 2014;20(10):e479-e486
Our results, from the largest pay-for-performance (P4P) demonstration performed to date, suggest that disparities in process and outcomes for hospitalized minority patients are not aggravated by P4P, and may in fact be improved.
Provision of financial incentives for high-quality care, commonly known as pay-for-performance (P4P), has become a common strategy to improve quality of care. By 2006 more than 80% of privately insured persons were covered by health plans using P4P.1 In October 2012, CMS adopted P4P for the Medicare program nationwide in most hospitals, except for Critical Access Hospitals.
Prior to adopting P4P, CMS conducted the largest demonstration program of the strategy to date among hospitals. Between the fourth quarter of 2003 and the first quarter of 2009, the Premier Hospital Quality Incentive Demonstration (HQID) rewarded high quality of care delivered by participating hospitals for 6 conditions: acute myocardial infarction (AMI), congestive heart failure (CHF), pneumonia, coronary artery bypass graft surgery (CABG), total hip replacement, and total knee replacement. Premier is a national organization of not-for-profit hospitals, which partnered with the federal government in the Premier HQID. While studies of the Premier HQID2-5 have shown it to have only modest benefits in improving quality of care, it is nonetheless the model for the new federal P4P program.
Programs using financial incentives to improve quality of care have enormous face validity. However, there are persistent concerns that rather than reduce disparities, this approach may exacerbate racial and ethnic disparities because of between-hospital differences in where minorities and nonminorities receive their care and/or within-hospital differences in how care is administered at a given hospital.6,7 Critics worry that racial minorities may receive care in institutions that are undercapitalized and less able to promote high-quality care. At the same time, minorities may have more disability or lower health literacy that result in poorer health outcomes and greater challenges in delivering high-quality care. Faced with financial incentives, hospitals might engage in quality improvement efforts that could widen disparities if the cultural, linguistic, and educational needs of minority patients prove difficult to address.
Despite the risk of unintended consequences, there is surprisingly little information on quality of care for racial minorities under P4P programs. To our knowledge there have been no studies to date in the United States documenting the impact of P4P on disparities in quality of care for minority patients. Therefore, in this study, we sought to answer 3 questions: First, how did black patients initially fare compared with white patients on receipt of evidence-based processes of care under Premier P4P? Second, how did these patients fare in terms of initial outcomes under the Premier P4P program? Finally, how did any racial disparities in the processes and outcomes of care change under P4P compared with patterns for patients treated in a group of hospitals that participated in public reporting but did not receive financial incentives? In carrying out these analyses, we were concerned with disparities arising from both between-hospital and within-hospital sources.
METHODS
Premier HQID Hospital Participants and Controls
In 2003, CMS invited 421 hospitals that were part of the Premier Healthcare Informatics Program to participate in the HQID pay-for-performance program, and more than 260 hospitals agreed to do so. To participate, hospitals were required to provide data on 33 quality indicators for 3 medical conditions (AMI, CHF, and pneumonia) and 3 surgical procedures (CABG, total hip replacement, and total knee replacement). The 33 indicators included process measures for all 6 conditions and risk-adjusted mortality for AMI and CABG surgery. Hospitals performing in the top decile for any of the conditions received a bonus payment of 2% of Medicare payments for that condition. Hospitals scoring in the second decile received a 1% bonus. Starting in the third year of the demonstration, hospitals in the second-lowest decile were liable for a 1% financial penalty, while hospitals in the lowest decile received a 2% financial penalty. However, penalties were ultimately not initiated until the fourth year. After program initiation, the Premier HQID made changes to its incentive structures to reward improvement as well as performance.
We identified the national sample of non-Premier hospitals participating in public reporting through the Hospital Quality Alliance (HQA) as a control group, and adjusted for differences in hospital characteristics and the patient population.
Process Quality Indicators
To examine process indicators, we obtained from CMS patient-level data on patients treated at hospitals participating in the HQA program. To ensure confidentiality, these data included a hospital-encrypted identifier and information on the hospitals’ bed size, teaching status, and participation in Premier, but no other hospital-based information.
The patient-level, all-payer data on HQA process indicators that we obtained from CMS included information on all patients discharged with AMI, CHF, or pneumonia submitted by hospitals participating in the HQA program between the fourth quarter of 2003 and the fourth quarter of 2008. Because many hospitals did not start reporting until the first quarter of 2004, we excluded data from the fourth quarter of 2003. The de-identified data we received included performance information on 18 process indicators for the 3 conditions (see eAppendix Table 1, available at www.ajmc.com). In addition to performance on the relevant quality indicators, hospitals reported patient gender, age, race/ethnicity, and primary payer. For each medical condition, we identified the relevant process measures and then tabulated the number of patients in the denominator for that condition and the percentage of them who received the specified service. We compared performance at 250 Premier hospitals and 3507 control hospitals in the patient-level data set we received from CMS.
Outcome Quality Indicators
To examine outcome indicators, we identified all hospitals providing hospital-based HQA data between 2004 and 2008 to the Medicare Compare database publicly available on the Internet, and linked them to Medicare cost reports and the American Hospital Association (AHA) annual survey. Using data from Medicare cost reports and the AHA survey, we were able to characterize these hospitals according to bed size, regional location, profit status, teaching status, eligibility for large bonuses (based on the proportion of the hospital’s patients with Medicare coverage), margin, and location in a competitive market (as measured by the Herfindahl-Hirschman Index). The latter 3 characteristics have been shown to be associated with a greater response to P4P.3
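For readers unfamiliar with the measure, the Herfindahl-Hirschman Index is the sum of squared market shares, so higher values indicate a more concentrated (less competitive) market. A minimal sketch in Python, using hypothetical hospital patient volumes rather than the study's data:

```python
# Illustrative only: HHI for one hospital market, computed from
# per-hospital patient volumes (hypothetical numbers, not study data).
def hhi(volumes):
    """Return the Herfindahl-Hirschman Index on a 0-1 scale.

    Values near 1 indicate a concentrated market dominated by one
    hospital; values near 0 indicate a highly competitive market.
    """
    total = sum(volumes)
    shares = [v / total for v in volumes]
    return sum(s * s for s in shares)

# Four equally sized hospitals: HHI = 4 * (0.25)^2 = 0.25
print(hhi([500, 500, 500, 500]))  # 0.25
# A near-monopoly market is far more concentrated (~0.815)
print(hhi([900, 50, 50]))
```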
To examine clinical outcomes, we used Part A Medicare data in 2004 and 2008 on all patients discharged with a principal diagnosis of AMI, CHF, or pneumonia, or a procedure code indicating that they had received a CABG procedure. We included 251 Premier hospitals and 3257 control hospitals with linked data. We employed separate logistic regression models examining 30-day mortality for each of the conditions, accounting for clustering of patients within hospital in each model.
The study was determined to be exempt from human subjects review by the Harvard Office of Human Research Administration.
We first compared characteristics of Premier and non-Premier hospitals and the characteristics of white and black patients who received care at those hospitals. We initially examined process indicators separately for AMI, CHF, and pneumonia, displaying the data by quarter. Then we limited the database to process indicators collected during 2004 and 2008 and examined whether disparities changed between 2004 and 2008, and whether these changes were of similar magnitude in Premier and non-Premier hospitals. The process measure analysis used a repeated measures linear regression, with the patient as the unit of analysis so that each individual patient’s health quality score was used as the dependent variable in the model. Each patient had a score between 0 and 100, indicating the proportion of quality indicators that were relevant and whose standards were met.
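The per-patient composite described above (the proportion of applicable indicators whose standards were met, on a 0-100 scale) can be sketched as follows. This is illustrative Python, not the study's code, and the indicator names are hypothetical:

```python
# Illustrative sketch of the per-patient HQA composite score.
# `indicator_results` maps each process indicator that applied to the
# patient to True (standard met) or False; inapplicable indicators are
# simply absent from the dictionary. Indicator names are hypothetical.
def hqa_score(indicator_results):
    if not indicator_results:
        return None  # no applicable indicators: patient is not scored
    met = sum(indicator_results.values())
    return 100.0 * met / len(indicator_results)

# An AMI patient eligible for 4 indicators who received 3 of them
# scores 75.0 on the 0-100 scale.
print(hqa_score({"aspirin_arrival": True, "beta_blocker": True,
                 "aspirin_discharge": True, "ace_inhibitor": False}))  # 75.0
```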
By analyzing at the level of the patient, it was necessary for the model to assess and adjust for correlation between patients seen within the same hospital. A marginal generalized estimating equation model, as implemented through the GENMOD procedure in the SAS language, was used to control for this clustering of patients within hospitals.8 An independence working correlation structure was used to provide an estimate of the overall effect of race (both within and between hospitals) on disparities, although empirical standard errors were used to derive correlation-adjusted test statistics and confidence intervals. The primary predictors were: time (2008 vs 2004); race; Premier status (Premier vs non-Premier); 2-way interactions between race and 2008, race and Premier, and 2008 and Premier; and a 3-way interaction between race, 2008, and Premier.
The 2-way interactions allowed us to compare racial disparities between Premier and non-Premier hospitals, as well as to compare racial disparities between the baseline and terminal time periods. The 3-way interaction allowed us to compare the change over time in racial disparities in Premier hospitals to the change over time in non-Premier hospitals. The main effects of race, 2008, and Premier, and the 2- and 3-way interactions were included in all models, regardless of significance. In this way, our models compared actual observed differences in HQA scores rather than assuming that certain differences were zero simply because their P values were greater than .05 (which could have happened because of limited power). Hospital characteristics such as size and teaching status were intentionally excluded from the model in order to preserve the effect of between-hospital differences. Results are displayed as adjusted means with P values for black versus white differences over time and Premier versus non-Premier differences determined from the appropriate interaction term. To assess patient mortality outcomes and determine whether disparities changed over time, we similarly used a marginal repeated measures logistic regression for Premier and non-Premier hospitals. Again, each patient was analyzed individually so that their binary mortality status was the dependent variable in the regression model. By using a patient-based model we were able to include patient-based variables (age, sex, and 28 comorbidities), as well as time (2008 vs 2004); race (white vs black); Premier status; 2-way interactions between race and 2008, race and Premier, and 2008 and Premier; and a 3-way interaction between race, 2008, and Premier.
The 28 patient-based comorbidities we included were those included in the Elixhauser risk adjustment scheme, a validated, widely used9-11 approach developed by the Agency for Healthcare Research and Quality.12,13 The 29th comorbidity in the scheme, AIDS, was dropped because its prevalence was too low in our population. As above, the P values from these interaction terms are presented to determine the statistical significance of changes in black-white differences over time and whether those changes were different in Premier versus non-Premier hospitals. Adjusted estimates of mortality rates for each year, for each race, and for each group of hospitals were calculated from the model. Differences between rates were calculated by subtraction, and confidence levels were based on the model P values.
RESULTS
We analyzed HQA process quality indicators for 425,551 AMI, 680,257 CHF, and 727,863 pneumonia patients (2008 sample) discharged from the 250 Premier hospitals and 3507 control hospitals (eAppendix Table 2A). Within both Premier and non-Premier hospitals, white patients were older and less likely to be female. Compared with non-Premier hospitals that contributed patient-level HQA data, Premier hospitals were more often medium and large hospitals and more often teaching hospitals (eAppendix Table 2A).
Our analyses of risk-adjusted outcomes included 191,868 AMI, 480,894 CHF, 382,586 pneumonia, and 75,932 CABG patients (2008 sample) discharged from 251 Premier hospitals and 3257 non-Premier hospitals (eAppendix Table 2B). Within both Premier and non-Premier hospitals, white patients were less likely to have diabetes, hypertension, and chronic kidney disease, and more likely to have chronic pulmonary disease. Compared with non-Premier hospitals, Premier hospitals were more likely to be of medium or large size, teaching, private non-profit, and located in the South. They had a lower percentage of Medicare patients (eAppendix Table 2B).
Changes in disparities in HQA quality process indicator performance in Premier hospitals versus non-Premier hospitals. Within Premier hospitals, black patients initially had lower scores on HQA performance than white patients for all 3 conditions. For example, for AMI, white patients, on average, received 91.0% of the care for which they were eligible compared with 88.0% for blacks. Whites also had better scores than blacks at non-Premier hospitals. For all 3 conditions, racial disparities in performance diminished between 2004 and 2008 at Premier hospitals; the reductions were statistically significant for AMI and CHF. For example, for AMI patients, the racial disparity in performance decreased from 3.0% to 0.5%, a reduction of 2.5% (95% CI, 1.2%-3.7%). At non-Premier hospitals the racial disparities were also reduced (Table 2), and the reductions were statistically significant for pneumonia. The reduction in racial disparities at Premier hospitals was greater than at non-Premier hospitals for all 3 conditions, but the differences were statistically significant only for AMI and CHF (Table 2, last column presents P values).
Changes in risk-adjusted mortality in Premier hospitals versus controls. Within Premier hospitals, black patients initially had lower rates of risk-adjusted mortality than white patients for AMI and CHF. For example, AMI mortality for blacks was 14.1% in the initial period versus 14.2% for whites. The patterns at non-Premier hospitals were qualitatively similar (14.7% vs 14.8% for blacks and whites, respectively). At Premier hospitals, mortality for blacks decreased relative to whites over the 4 years for all 4 conditions, although the changes were statistically significant only for CHF and pneumonia. At non-Premier hospitals, mortality for blacks relative to whites also decreased for AMI, CHF, and pneumonia, with the findings statistically significant for AMI and CHF. In comparing changes in racial disparities over time between Premier and non-Premier hospitals, the only statistically significant difference was the decrease in CHF mortality for blacks relative to whites, which was greater in Premier hospitals.
DISCUSSION
While many have been concerned that pay-for-performance would result in a lower quality of care for minority patients, our results show no evidence of that outcome in the Premier HQID. While black patients initially had lower performance on process quality indicators in both Premier and non-Premier hospitals, the gap closed under pay-for-performance, and the reduction in the gap in Premier hospitals was actually greater than in non-Premier hospitals for patients with AMI. During the 4-year study period, mortality generally decreased for blacks relative to whites for AMI, CHF, and pneumonia in both Premier and non-Premier hospitals, but there was no evidence that improvement in disparities was greater in non-Premier hospitals. In fact, the relative improvement for blacks was actually greater in Premier hospitals for CHF. Taken together, these are reassuring findings for policy makers who plan to use incentive programs like those in Premier to drive improvement in quality of care.
Our findings have implications for the recently established federal P4P program known as value-based payment (VBP). VBP is similar to Premier. Both programs contain incentives for attainment and improvement. There is overlap in the incentivized medical conditions and surgical conditions, and in the process and outcome quality indicators used to gauge performance. The size of the financial incentives in both programs is also relatively modest (for VBP, 1% in 2014, rising to 2% in 2017). Thus, our results suggest that exacerbation of disparities in quality of care is unlikely to be an important problem for the new federal program.
In previous work14 we examined Premier hospital-based data and found that hospitals with high proportions of poor patients (as measured by a high Disproportionate Share Index) initially started out with worse performance on process measures, but caught up over time under pay-for-performance. Control hospitals without incentives, however, failed to do so. In complementary work, Werner et al15 examined hospital quality under public reporting and found that hospitals with high proportions of Medicaid patients performed more poorly at baseline and improved more slowly over time. Taken together, these studies suggest that pay-for-performance and public reporting may have differing impacts, although neither study focused on minority patients or used patient-level data. In fact, we know of no prior US studies that used patient-based data to directly measure the impact of P4P on the care and outcomes of minorities.
Our study has limitations. We studied a limited number of conditions in a single pay-for-performance program; however, the program we studied is the largest demonstration to date and has been adopted as the model for national extension. We focused on 3 clinical conditions and 1 procedure that have been well studied in the literature. The process measures indicating quality of care are thus on firmer ground than indicators for alternative conditions that lack a strong evidence base, and our findings are all the more salient since the processes are clinically important.
Hospitals participating in the HQID were self-selected and may have been more committed to quality improvement, although baseline performance and disparities were comparable in the 2 groups. We used data from claims to risk-adjust patient outcomes. While these data have well-known limitations, they have improved over time and are now commonly used to assess patient outcomes and to guide performance-based payment. Performance on the process measures was generally high; reduction of disparities could be related to a “ceiling effect,” particularly for AMI. We studied racial differences for only 2 groups, blacks and whites; inclusion of more groups would have made the analyses unwieldy. We believe the coding of race for these groups in administrative data is generally reliable, and for historical reasons the comparison is particularly salient. Finally, we examined quality of care for patients treated in-hospital. We were unable to ascertain whether financial incentives were associated with diminished access for minorities and disadvantaged patients.
In summary, creation of a national program of pay-for-performance will further the use of P4P as part of our armamentarium for improving the quality and efficiency of our care along with other approaches such as expanded public reporting, use of accountable care organizations, patient-centered medical homes, and bundling. Critics worry about the deleterious consequences of P4P for the care and health outcomes of minority and indigent populations. Our results from the largest demonstration done to date and from an important model for the recently established federal program suggest that the process of care for hospitalized minority patients is, if anything, improved under this formulation of pay-for-performance, and there is no evidence of a deleterious impact on mortality outcomes.
We are grateful to Jie Zheng, PhD, for programming assistance.
Author Affiliations: Department of Health Policy and Management, Harvard School of Public Health, Boston, MA (AME, AKJ, EJO); Division of General Medicine, Brigham and Women’s Hospital, Boston, MA (AME, AKJ, EJO); VA Boston Healthcare System, Boston, MA (AKJ).
Funding Source: This study was funded by the Robert Wood Johnson Foundation, Princeton, NJ. The funding organization did not have any role in the collection, analysis, or interpretation of data, and did not have authority to approve or disapprove publication of the final manuscript.
Author Disclosures: Dr Epstein currently serves full time in the Office of the Assistant Secretary for Planning and Evaluation at HHS. This article does not represent the views of HHS. Drs Jha and Orav report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.
Authorship Information: Concept and design (AKJ, AME); acquisition of data (AKJ, AME); analysis and interpretation of data (AKJ, AME, EJO); drafting of the manuscript (AKJ, AME); critical revision of the manuscript for important intellectual content (EJO); statistical analysis (EJO); and obtaining funding (AME).
Address correspondence to: Arnold M. Epstein, Department of Health Policy and Management, Harvard School of Public Health, 677 Huntington Ave, Boston, MA 02115. Email: email@example.com.
REFERENCES
1. Rosenthal MB, Landon BE, Normand SL, Frank RG, Epstein AM. Pay for performance in commercial HMOs. N Engl J Med. 2006;355(18):1895-1902.
2. Lindenauer PK, Remus D, Roman S, et al. Public reporting and pay for performance in hospital quality improvement. N Engl J Med. 2007;356(5):486-496.
3. Werner RM, Kolstad JT, Stuart EA, Polsky D. The effect of pay-for-performance in hospitals: lessons for quality improvement. Health Aff (Millwood). 2011;30(4):690-698.
4. Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of Premier pay for performance on patient outcomes. N Engl J Med. 2012;366(17):1606-1615.
5. Glickman SW, Ou FS, DeLong ER, et al. Pay for performance, quality of care, and outcomes in acute myocardial infarction. JAMA. 2007;297(21):2373-2380.
6. Casalino LP, Elster A, Eisenberg A, Lewis E, Montgomery J, Ramos D. Will pay-for-performance and quality reporting affect health disparities? Health Aff (Millwood). 2007;26(3):W405-W414.
7. Chien AT, Chin MH, Davis AM, Casalino LP. Pay for performance, public reporting, and racial disparities in health care: how are programs being designed? Med Care Res Rev. 2007;64(5 suppl):283S-304S.
8. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis (Wiley Series in Probability and Statistics). Hoboken, NJ: Wiley; 2004:291-321.
9. Volpp KG, Rosen AK, Rosenbaum PR, et al. Mortality among patients in VA hospitals in the first 2 years following ACGME resident duty hours reform. JAMA. 2007;298(9):984-992.
10. Jha AK, Orav EJ, Li Z, Epstein AM. The inverse relationship between mortality rates and performance in the Hospital Quality Alliance measures. Health Aff (Millwood). 2007;26(4):1104-1110.
11. Weller WE, Rosati C, Hannan EL. Relationship between surgeon and hospital volume and readmission after bariatric operation. J Am Coll Surg. 2007;204(3):383-391.
12. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care. 1998;36(1):8-27.
13. Southern DA, Quan H, Ghali WA. Comparison of the Elixhauser and Charlson/Deyo methods of comorbidity measurement in administrative data. Med Care. 2004;42(4):355-360.
14. Jha AK, Orav EJ, Epstein AM. The effect of financial incentives on hospitals that serve poor patients. Ann Intern Med. 2010;153(5):299-306.
15. Werner RM, Goldman LE, Dudley RA. Comparison of change in quality of care between safety-net and non-safety-net hospitals. JAMA. 2008;299(18):2180-2187.