Dori A. Cross, BSPH; Genna R. Cohen, PhD; Christy Harris Lemak, PhD; and Julia Adler-Milstein, PhD
Comprehensive primary care has long been recognized as the cornerstone of a high-performance health system.1,2 In response to rising healthcare costs and inconsistent quality performance, strengthening primary care is a critical part of the US health policy agenda. A specific target is to improve care for patients with the greatest healthcare needs: those with complex conditions, multiple chronic illnesses, and mental health disorders. Such high-need patients use a disproportionate share of health services and the nature of their care needs provides opportunities for increased efficiency, quality improvement, and associated cost savings.3
To promote new approaches to primary care that improve outcomes for high-need patients, an array of quality improvement initiatives have proliferated in recent years.4-6 Growing evidence indicates that these efforts can reduce medical expenditures and increase quality of care.7-10 However, the evidence is still emerging about what is required for these efforts to actually result in improved performance.6,11-16 The answer likely involves myriad factors, as substantial, multifaceted organizational changes are required to improve care for high-need patients.17,18 These changes—such as aligning intrinsic motivation with external performance incentives,19-21 creating an organizational culture of deliberate learning,22 and acquiring and deploying specific organizational resources required for targeted improvements—likely take time to become accepted and embedded. Thus, whether practices sustain their commitment to improved performance for high-need patients may be a critical piece to understanding variation in performance improvement under pay-for-value initiatives.
This paper builds on existing research and attempts to fill key knowledge gaps about the impact of primary care practices’ continued participation in a pay-for-value program. Prior work has had limited access to robust longitudinal data and/or significant sample sizes to assess practice performance over time,23 and the majority focus specifically on participation in patient-centered medical home (PCMH) demonstrations, rather than broader pay-for-value programs. Among the studies that do examine the effects of sustained program participation, findings are inconsistent. Friedberg et al examined a broad range of outcome metrics over a 3-year period in the context of a PCMH demonstration and found minimal change in quality with no significant effects on cost or utilization.11 Lemak et al analyzed a broader pay-for-value program, also over a 3-year period, and found positive effects on quality and on a subset of cost categories.24 However, neither paper assessed the impact on outcomes for complex, high-need patients. High-need patients represent an understudied group that is particularly critical to study, given that they are likely to disproportionately benefit from improved care delivery, but may not benefit equally under performance improvement programs.25,26
To help better understand the impact of sustained participation in care delivery transformation efforts for high-need patients, we sought to answer the following specific research question: Is continuous participation in a fee-for-value physician incentive program associated with improved primary care practice cost and quality outcomes for high-need patients? The passage of the Medicare Access and Children’s Health Insurance Program Reauthorization Act (MACRA) of 2015—which aims to increasingly tie provider compensation to value of services delivered—creates particular urgency to better understand the specific context(s) under which existing pay-for-value programs positively impact patient care. We answered our research question in the context of a statewide, multi-pronged performance improvement program, which has been studied previously.24,27
We examined a range of cost, use, and quality outcomes for a panel of 1582 primary care practices that did and did not continuously participate in this pay-for-value program in order to assess various dimensions of performance. Our results inform ongoing efforts to use incentive programs to promote the evolution of primary care practices in ways that better meet the needs of high-need patients, and thereby improve overall health system performance.
Setting and Data
In 2005, Blue Cross Blue Shield of Michigan (BCBSM) created the Physician Group Incentive Program (PGIP), a pay-for-performance program developed in collaboration with Michigan physicians and physician organizations (eAppendix Figure [available at www.ajmc.com]). Multiple programs fall within the PGIP umbrella, the largest of which is the PCMH program. The other programs (care management resources and billing codes, as well as quality-based reimbursement) provide additional resources and incentives to improve care while reinforcing practices’ PCMH transformation. Of all practices participating in PGIP, the majority (75%) are designated as PCMHs. BCBSM issues yearly designations to practices with significant progress and strong performance on PCMH capability measures. Since 2009, the number of physicians in PCMH-designated practices (4000 physicians in nearly 1500 practices) has tripled; BCBSM also supports non-PGIP practices interested in adopting PCMH capabilities.
We focused on the most recent 4 years for which program data were available (2010-2013). This window balanced 3 needs: a period long enough to reflect sustained participation, a period in which a large number of practices met the sustained participation cutoff, and a relatively recent period in which key national health policy efforts (ie, the Health Information Technology for Economic and Clinical Health Act and the Affordable Care Act) were underway.
Our target patient population was BCBSM members who: a) had 2 or more chronic medical conditions, including those in the Charlson Comorbidity Index and 6 additional mental/behavioral health conditions shown to be significant drivers of cost and complexity (Table 1); and b) were continuously assigned to the same primary care provider (PCP) in the same practice location for the duration of the study period. Annual patient-level data were made available by BCBSM for analysis. Patient data included annual claims-derived outcome measures of interest (described in the following section); patient demographics (age, gender, and primary health conditions); and each patient’s assigned PCP. BCBSM provided supplementary data that included PCP demographics, practice identifiers (that allowed us to group PCPs and their associated patients within practices), and the duration of practices’ participation in PGIP. The final analytic data set contained 69,772 patient-year observations (4 years for 17,443 unique patients) nested within 1582 practices in the state of Michigan.
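The two inclusion criteria can be illustrated with a minimal sketch. The record layout, the function names, and the condition labels below are hypothetical and are not taken from the BCBSM data; the sketch only mirrors the stated rules (2 or more qualifying conditions, and the same PCP in the same practice for all 4 study years).

```python
# Hypothetical record layout: (patient_id, year, pcp_id, practice_id).
STUDY_YEARS = frozenset({2010, 2011, 2012, 2013})


def eligible_patients(assignments, conditions):
    """Return the ids of patients who meet both inclusion criteria.

    assignments: iterable of (patient_id, year, pcp_id, practice_id) tuples.
    conditions:  dict mapping patient_id to a set of qualifying conditions.
    """
    seen = {}
    for patient_id, year, pcp_id, practice_id in assignments:
        if year not in STUDY_YEARS:
            continue  # ignore anything outside 2010-2013
        years, places = seen.setdefault(patient_id, (set(), set()))
        years.add(year)
        places.add((pcp_id, practice_id))

    return {
        patient_id
        for patient_id, (years, places) in seen.items()
        if years == STUDY_YEARS                       # present in all 4 years
        and len(places) == 1                          # same PCP, same practice
        and len(conditions.get(patient_id, ())) >= 2  # 2 or more conditions
    }


def patient_years(n_patients):
    """Balanced panel: one observation per patient per study year."""
    return n_patients * len(STUDY_YEARS)
```

Because every retained patient contributes exactly 1 observation per year, the panel is balanced: `patient_years(17443)` returns 69772, matching the reported number of patient-year observations.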
Practice performance was evaluated using cost, use, and quality measures. We examined total allowed medical–surgical cost per member per year in addition to the 3 subcomponents of medical–surgical spending: inpatient, outpatient, and emergency department (ED) costs. We also examined total allowed drug cost per member per year. Six measures of use were included: numbers of inpatient admissions, ED visits, 30- and 90-day readmissions, PCP visits, and specialist visits.
We measured quality using an overall composite score composed of 21 individual measures that captured adherence to evidence-based practices. A list of these measures is included in eAppendix Table 1. We also examined a 6-measure medication management subcomposite to specifically examine the effect of program participation on appropriate use of medications for patient care. The individual measures used to construct these composites were selected from the Healthcare Effectiveness Data and Information Set, as well as from internal BCBSM-defined metrics, described in detail elsewhere.24,28,29
Consistent with past work, we used composite measures rather than individual measures because of concerns about sufficient numbers of patients for any individual measure and heterogeneity in performance across individual measures.30,31
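This section does not state how the individual measures were combined into a composite. One common construction, shown here purely as an assumption, is the share of applicable measures that a patient met; pooling across measures in this way is one reason a composite is more stable than any single measure with few eligible patients. The function name and input layout are hypothetical.

```python
def composite_score(measure_results):
    """Share of applicable measures met, or None if none apply.

    measure_results: dict mapping a measure name to True (met),
    False (not met), or None (patient not eligible for the measure).
    This pooling rule is an illustrative assumption, not the
    documented BCBSM calculation.
    """
    applicable = [met for met in measure_results.values() if met is not None]
    if not applicable:
        return None
    # True counts as 1 and False as 0, so the sum is the number met.
    return sum(applicable) / len(applicable)
```

For example, a patient eligible for 2 measures who met 1 of them would receive a score of 0.5 under this rule.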
We used BCBSM PGIP program data to identify practices that continuously participated in PGIP during our study period (n = 1401 practices) and those that did not (n = 181 practices).
Practice- and Patient-Level Characteristics
We created a set of practice- and patient-level demographic measures to control for other factors likely to influence both PGIP participation and patient outcomes across different types of practices. Practice-level characteristics included average PCP age, average panel size (of BCBSM patients) among PCPs in the practice, and 2 measures of organizational size: number of PCPs in the practice and the proportion of high-need patients in the practice’s panel (based on BCBSM-assigned patients).32-34
At the patient level, we included age and gender.
We used a generalized linear mixed model to assess the relationship between continuous PGIP participation and practice performance. Our dependent variables were cost, use, and quality outcomes. Our independent variables were whether or not the practice was a continuous PGIP participant, year (1-4), and practice- and patient-level controls. In our models, we interacted time with PGIP participation to assess whether the trajectory of each outcome differed for patients treated in continuously participating versus non-continuously participating practices. Because we were concerned about the potential for regression to the mean, we ran a second set of models that assessed performance only in our first year of data (2010). This allowed us to assess whether any observed trends over the 2010 to 2013 period—favoring continuously participating PGIP practices—were likely explained by a higher or lower starting level in 2010. All models included patient-level random effects to account for variation associated with unmeasured patient factors. We also ran robustness tests using robust standard errors to at least partially address intra-practice correlation.
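The model described above can be written compactly. The notation is ours and is only a sketch: it assumes that year enters as a linear term, that the patient-level random effect is a normally distributed random intercept, and that g is the link function appropriate to each outcome. None of these details are confirmed by the text.

```latex
g\bigl(\mathbb{E}[Y_{it} \mid u_i]\bigr)
  = \beta_0
  + \beta_1\,\mathrm{PGIP}_{i}
  + \beta_2\,\mathrm{Year}_{t}
  + \beta_3\,\bigl(\mathrm{PGIP}_{i} \times \mathrm{Year}_{t}\bigr)
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_{it}
  + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Here i indexes patients and t indexes years (1-4), PGIP_i equals 1 if the patient's practice participated continuously, x_it holds the practice- and patient-level controls, and u_i is the patient-level random effect. Under this notation, the interaction coefficient β3 captures whether trajectories differ between the 2 groups, and the separate 2010-only models correspond to estimating the PGIP difference in the first year alone.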
The distribution of our outcome measures fell into 1 of 2 categories. Most of our outcomes were zero-inflated; for these outcomes, our models simultaneously estimated both a binary outcome (odds of a patient incurring any cost or use) and a continuous outcome (eg, estimated cost or number of encounters, conditional on a patient incurring at least some use in that category of service). The remaining outcomes were quality measures (for which all patients received a score) or cost and use categories in which nearly 100% of patients in the sample would be expected to (and did) incur use greater than 0: total medical–surgical costs, outpatient costs, and PCP visits. For these measures, we only considered the continuous outcome. For continuous outcomes, costs were modeled using a log-normal distribution, use indicators using a Poisson distribution, and quality indicators using a normal distribution. In robustness tests, we ran models with alternate distribution assumptions (gamma for cost measures, negative binomial for use).
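The two-part logic can be illustrated descriptively. The sketch below does not fit the study's models; it only splits a zero-heavy outcome into the same 2 pieces those models estimate: whether a patient has any cost or use, and the amount among those who do. The function name is hypothetical, and the geometric mean is used only because it is the natural summary on the log scale that a log-normal cost model works with.

```python
import math


def two_part_summary(values):
    """Describe a zero-heavy outcome in the two parts a two-part model uses.

    Part 1: share (and odds) of patients with any cost or use.
    Part 2: geometric mean among patients with a positive amount,
            ie, the mean computed on the log scale.
    """
    if not values:
        raise ValueError("values must not be empty")
    positives = [v for v in values if v > 0]
    share_any = len(positives) / len(values)
    if not positives:
        return {"share_any": 0.0, "odds_any": 0.0, "geo_mean_if_any": None}
    odds_any = math.inf if share_any == 1 else share_any / (1 - share_any)
    mean_log = sum(math.log(v) for v in positives) / len(positives)
    return {
        "share_any": share_any,
        "odds_any": odds_any,
        "geo_mean_if_any": math.exp(mean_log),
    }
```

Reporting the 2 parts separately is what allows a result such as lower odds of any drug spending alongside higher spending among those with any.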
Patient and Practice Population
Our analytic sample included 17,443 unique patients, each with 2 or more medical conditions (Table 1). Average age (51.8 years) and gender (47.9% male) in our sample did not differ significantly from the remaining population of patients with 0 or 1 condition. However, our sample population had (by definition) significantly higher incidence of disease and greater healthcare utilization across all metrics. The most common medical conditions were type 2 diabetes (52.9% of the focal population) and chronic obstructive pulmonary disease (27.2%), followed by liver disease (22.3%), asthma (20.2%), and cancer (17.6%).
The patients included in our sample were seen in 1582 unique practice locations (Table 2), 1401 of which had 4 years of continuous PGIP participation. Our control group (n = 181) was made up of 114 practices with no PGIP participation, and 67 with partial participation (average duration of 6 months over the 4 years we examined). More than half of the practices were solo physician offices (56.1%). Practices had an average attributed panel size of 825 BCBSM patients, with high-need patients comprising, on average, 4.1% of that panel. The average PCP age across practices was 51 years.
In 2010, total medical–surgical cost did not differ for patients in PGIP and control practices (P = .123) (Table 3). Over time (2010-2013), patients in PGIP practices had similar trajectories of medical–surgical cost (+0.6% for PGIP relative to control; P = .668) (Table 3).
Although patients in PGIP practices, relative to control, incurred lower average inpatient, outpatient, and ED costs in 2010, only the difference in outpatient costs was statistically significant: PGIP patients incurred 10.6% lower outpatient costs compared with control patients (P = .002). However, over the 4-year study period, patients in PGIP and control practices did not differ in their odds of incurring any inpatient, outpatient, or ED costs, nor in the amount of spending conditional on having any spending.
Patients in PGIP practices experienced lower odds of incurring any drug costs (odds ratio [OR], 0.80; P = .003 [Table 3]) than patients in control practices in 2010, whereas average total drug costs for patients with drug spending greater than $0 did not differ between PGIP and control. Over time, PGIP patients further reduced their odds of any drug spending (OR, 0.82; P <.001 [Table 3]), but, conditional on incurring any drug costs, total drug costs increased at a steeper rate for PGIP patients relative to control (+3.9%; P <.001).
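For readers less familiar with these effect measures, the sketch below shows the standard arithmetic behind them. It assumes, which the text does not state, that the reported percentage differences are the usual exp(coefficient) − 1 transformation from a log-scale model. The baseline probability used in the example is hypothetical and is not a study estimate.

```python
import math


def percent_change_from_log_coef(beta):
    """Convert a log-scale coefficient to a percentage difference."""
    return (math.exp(beta) - 1) * 100


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability implied by applying an odds ratio to a baseline probability."""
    if not 0 < baseline_prob < 1:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    odds = baseline_prob / (1 - baseline_prob) * odds_ratio
    return odds / (1 + odds)
```

For instance, a log-scale coefficient of ln(1.039) corresponds to a +3.9% difference, and an odds ratio of 0.80 applied to a hypothetical baseline probability of 0.50 implies a probability of about 0.44. An odds ratio is not a ratio of probabilities, so the implied change depends on the baseline.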
In 2010, we observed no difference between PGIP and control patients on the odds of incurring any utilization, or amount of use (conditional on having any), for inpatient admissions or ED visits. Over time, PGIP and control patients had similar odds of any hospitalization (OR, 0.93; P = .108), but, among patients who incurred at least 1 hospitalization, PGIP patients experienced a steeper increase in number of hospitalizations relative to control patients (+5.7%; P = .047 [Table 3]). For ED visits, PGIP patients had lower odds of incurring any ED visit over time compared with control patients (OR, 0.88; P = .0002 [Table 3]), but did not differ in the number of ED visits (+3.2%, P = .132).
In 2010, PGIP patients had lower 30-day readmissions (–38.2%; P = .002 [Table 3]) and 90-day readmissions (–25.7%; P = .018 [Table 3]) compared with control patients. Over the 4-year study period, PGIP patients continued to significantly outperform control patients, both in terms of odds of incurring any readmission over time (OR, 0.65 for 30-day and 0.63 for 90-day; P <.001 for both [Table 3]), as well as the number of readmissions, conditional on having any (30-day: –19.9%, P = .008; 90-day: –27.5%, P <.001 [Table 3]) (Figure).
Finally, in 2010, PGIP patients had fewer PCP visits (–4.8%; P <.001) and more specialty visits (+12.7%; P <.001). Over time, for both PCP and specialty visits, patients in PGIP and control practices did not differ in either their odds of incurring any visits, or the number of visits, conditional on having any (Table 3).
In 2010, there was no difference in overall quality or medication management quality between the 2 patient groups. Over time, PGIP patients realized significantly greater improvement relative to control patients for both overall quality (+1.6%; P ≤.009), as well as medication management quality (+3.0%; P <.001).
In models with alternate distributional assumptions (gamma distribution for cost measures, negative binomial distribution for utilization measures) and robust standard errors, our primary results largely persisted. At baseline, these models provided consistent or stronger evidence that PGIP practices outperform non-PGIP practices (eAppendix Table 2). Trend results were also consistent with our original results (eAppendix Table 3); however, in trend models with robust standard errors and our original distributional assumptions, our results related to drug costs, ED visits, and overall quality were no longer statistically significant at traditional thresholds. Because some coefficients changed as well, we suspect that these differences reflect instability in this particular specification of the model, and we incorporated them into our analysis with caution.
Our longitudinal analysis of more than 1500 primary care practices in Michigan over a 4-year period suggests that sustained participation in a pay-for-value program is associated with modest but meaningful improvements in care for high-need patients. Performance for practices participating in the PGIP pay-for-value program improved relative to nonparticipants in 3 domains. First, PGIP practices consistently and significantly outperformed control practices on 30- and 90-day readmissions. In 2013, compared with 2010, sustained PGIP participation was associated with a reduction of 25 readmissions per 1000 patients. Second, we found suggestive evidence that PGIP practices were able to reduce odds of incurring any ED utilization over time to a greater extent than control practices. Finally, we also found suggestive evidence that patients in PGIP practices saw significantly greater improvement over time in overall quality, as well as in medication management quality (which could explain the increase in drug costs over time). However, total medical–surgical cost was not reduced, likely because avoided use was for relatively rare events and was partially offset by increased drug spending. In addition, the gain in overall quality was small and did not remain statistically significant in every robustness specification. Taken together, our results suggest that sustained participation may be an important factor in improving specific dimensions of care for high-need patients under a pay-for-value program.
To see the benefits of participation in a pay-for-value program for high-need patients, practices appear to need to engage with the program in a sustained way. The changes in primary care practices that are required to improve care for high-need patients, including significant changes in organizational culture, an emphasis on teamwork, and staff-level buy-in to new care processes, likely require pursuit over multiple years.19,35
Practices also need time to understand program expectations and develop and reinforce new behaviors and processes that support redesigned care. The rapid growth in the PCMH component of PGIP over the study period is likely a key contributor to observed changes in our outcome measures; however, we believe the additional PGIP programs beyond PCMH play a critical role in providing additional resources and incentives to support and sustain practice changes that lead to higher-quality care.
We observed heterogeneous effects of sustained PGIP participation across our outcomes that are mostly consistent with these expectations. Specifically, sustained participation was associated with reductions in readmissions, better control over any ED use, and improved quality. Changes in these measures likely result from changes that take time to implement but lie within the control of primary care practices. For example, high-need patients are likely to have a high volume of healthcare encounters with many different providers, both specialists and hospital-based clinicians. Providers need time to develop and implement new systems and workflows for managing patient transitions and the volume of information flowing in and out of their practice, such as regular medication reconciliation checks and active follow-up after hospital discharge. In contrast, we found no program effect on inpatient utilization or total medical–surgical cost, which may reflect the fact that these 2 measures are less sensitive to changes that can be made by primary care practices. Significantly improving these outcomes, even among high-need patients who offer the greatest opportunity for gains, likely requires broader changes to the health system and to patient behavior—both of which are complex and require a long time frame to address.
Our study has several limitations to be considered when interpreting the results. First, because providers are not randomly assigned to PGIP (ie, providers self-select to participate), there may be unobserved differences between practices that sustained participation and those that did not, which might influence our patient outcome measures. For example, practices that already had support and resources from an umbrella provider organization prior to the start of our study period may have been more likely to sustain participation and also have better performance. We therefore focused on an associational analysis; however, we were able to use panel data with patient-level random effects to control for time-invariant patient characteristics. We compared performance in the baseline year, as well as trends over time between PGIP and control practices, to distinguish between regression to the mean and true improvements due to sustained program participation. Finally, the availability of control practices in the sample helped isolate sustained PGIP participation effects from secular time and maturation effects.
A second potential limitation of our study is that our data sample, by design, only includes patients who were continuously enrolled with BCBSM throughout the study period, and assigned to PCPs who practiced in the same location for all years included in this analysis. Only one-third of commercially insured BCBSM enrollees met these criteria, which limits the generalizability of our findings. Because patients with multiple conditions have ongoing, and often complex, healthcare needs that benefit from provider continuity, we expect this population is more likely than healthy individuals to maintain stable coverage and physician care; however, our findings may not hold for patients who regularly switch PCPs or experience lapses in insurance coverage.
Finally, our results come only from the state of Michigan, and are specific to 1 commercial insurer’s pay-for-performance program. However, PGIP is a large and inclusive program; the program was established in 2005 and has nearly 20,000 physicians participating. The program also operates within the context of fee-for-service reimbursement, and program requirements and reimbursement structures are similar to those of other regional and national pay-for-performance programs.36,37
Thus, we believe our findings are generalizable beyond the PGIP program.
As MACRA takes effect, provider payments will become increasingly tied to value through the Merit-based Incentive Payment System and participation in alternative payment models (APMs), such as accountable care organizations, shared savings, or bundled payment initiatives. In early demonstrations, as well as in currently operational payment arrangements, these programs have experienced substantial provider turnover.38
Although MACRA will compensate providers on an annual basis for APM participation, our findings about the benefits of sustained participation in these programs suggest that policy makers may want to consider conditional payments or additional incentives for providers who continuously participate in an initiative. In addition, the heterogeneous results across different outcome measures suggest that resources and support may be leveraged most effectively when targeted toward specific types of use that are more within practices’ direct control.
Given the large investment in pay-for-value programs to date, and their growing prominence, our findings offer reassurance that these initiatives appear to be effective in accelerating performance improvement among primary care practices caring for high-need patients. Our findings specifically point to the importance of sustained participation, which likely helps practices establish new care processes to improve outcomes under their control—in particular, ED use and readmissions, which are more prevalent among high-need patients. However, moving the needle on outcomes like total spending likely requires broader solutions that involve new approaches to health system organization and patient behavior change.