Joseph P. Newhouse, PhD; Mary Price, MA; John Hsu, MD, MBA; Bruce Landon, MD, MBA; and J. Michael McWilliams, MD, PhD
Value-based purchasing has emphasized moving away from pure fee-for-service reimbursement by shifting some financial risk from insurers to healthcare delivery systems and provider groups. One of the highest-profile efforts has been accountable care organizations (ACOs), which share financial risk with payers for a defined population of patients rather than being paid solely on a fee-for-service basis for an undefined population. Successful ACOs could contemplate assuming full financial risk—for example, by becoming Medicare Advantage (MA) plans or entering into capitation (percent of premium or delegated risk) contracts with existing MA or commercial insurance plans.
Evaluation of ACO performance to date has largely focused on individuals in Medicare ACOs, comparing their healthcare utilization with that of individuals in traditional Medicare (TM) who are not in ACOs.1-4 These evaluations have found modestly lower spending and unchanged or modestly higher quality at ACOs, with savings growing over time and with effects concentrated in physician- rather than hospital-based entities. Evaluation of a commercial ACO-like contract found a similar result.5
In this paper, we broaden the focus by comparing a Medicare ACO not only with a TM comparison group but also with an MA plan within the same delivery system. In addition, we compare utilization and cost in the same organization’s commercial ACO with a commercially insured comparison group. Because the organization shared risk in its MA plan over our entire period of observation, we expected that utilization and spending in the MA plan would initially be below that of the ACO group, in which accepting risk began during the observation period. After the establishment of the Medicare and commercial ACOs, we expected their utilization to decrease more rapidly than that of comparison groups.
Banner Health and Its Insurance Contracts
Our data came from 1 large delivery system, Banner Health, which is headquartered in Phoenix, Arizona (Maricopa County). Banner operates in several sites in the western United States, but we limited our sample to residents of Maricopa County, where the great majority of Banner users live. Not only was Banner one of the original participants in the Medicare Pioneer ACO program that began in 2012, but for several years before 2012, it partnered with Blue Cross Blue Shield of Arizona (BCBS Arizona) to offer an MA plan. Banner’s contractual incentives in the MA plan were complex, but risk was shared approximately equally between BCBS Arizona and Banner. In its Pioneer ACO, Banner chose a Core Option B contract, which meant it accepted 70% 2-sided risk in year 1 and 75% 2-sided risk in years 2 and 3, with both upside and downside risk capped at 10% of total spending.
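The Core Option B terms just described can be illustrated with a short calculation. This is a minimal sketch, not CMS's settlement formula: the function name `aco_settlement`, its arguments, and the choice to apply the 10% cap against the benchmark are assumptions made for illustration.

```python
def aco_settlement(benchmark, actual, share_rate, cap_rate=0.10):
    """Return the ACO's payment (positive) or repayment (negative).

    benchmark  -- target spending for the attributed population
    actual     -- observed spending for the same population
    share_rate -- ACO's share of savings or losses (0.70 in year 1,
                  0.75 in years 2 and 3 under the terms in the text)
    cap_rate   -- limit on upside and downside; applying it to the
                  benchmark is an assumption of this sketch
    """
    gross = benchmark - actual       # positive means savings
    shared = share_rate * gross      # 2-sided: losses are shared too
    cap = cap_rate * benchmark
    return max(-cap, min(cap, shared))


# Hypothetical figures, in millions of dollars.
year1 = aco_settlement(benchmark=100.0, actual=96.0, share_rate=0.70)
large_loss = aco_settlement(benchmark=100.0, actual=120.0, share_rate=0.70)
```

With these hypothetical inputs, the first call shares 70% of a 4.0 saving, and the second call hits the downside cap.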
Also starting in 2012, Banner partnered with Aetna to offer a commercial ACO product to larger self-insured employers (those with >50 employees) that had an existing preferred provider organization (PPO) contract. Similar to Medicare’s ACO attribution rules, employees of the participating firms and their dependents were prospectively attributed to a Banner primary care physician (PCP) if they used a Banner PCP for the plurality of evaluation and management (E&M) services in the prior year. Providers of those not attributed were reimbursed at negotiated fee-for-service rates. Similar to the Medicare program, Banner shared financial risk for the attributed participants for all medical services against a benchmark. Employee benefits were the same for all employees and dependents in the PPO contract. In addition to its Pioneer ACO and Medicare Advantage plan, Banner had other risk-based arrangements, such that about 30% of its revenue was risk-based.
Banner’s performance in the Pioneer ACO program depended on the method of assessment. CMS’ formal evaluation for years 1 and 2 used a difference-in-differences (DID) model with 2 TM control groups: 1 from the local (“near”) market and 1 from a nonlocal (“far”) market—with the latter group to account for potential spillovers in the local market. The question that the CMS evaluation sought to answer was whether the ACO’s spending growth was less than either comparison group’s. On this criterion, Banner did not save money in years 1 and 2.6
CMS’ method for rewarding Pioneer ACOs, however, differed from its evaluation method and was based on a benchmark, which was a function of the historical spending of attributed beneficiaries at the ACO trended forward at a national trend rate. Using this method of assessment, Banner performed well (eAppendix [available at ajmc.com]).
The data for the ACO and TM comparison groups come from the 100% Medicare files for Maricopa County for 2010 to 2014. All Part A and Part B spending is included; drug spending is omitted, except for injected or infused drugs covered under Part B. The MA data come from BCBS Arizona. The MA-covered services analyzed here are the same as the TM services. The MA dollar figures use allowed charges, which are based on contracted unit prices that are confidential. These unit prices are not identical to TM prices, so some of the difference in spending between the MA group and the other 2 groups arises from unit price differences.
An alternative to using the contracted charges is to impute Medicare unit prices based on procedure and site-of-service codes. Although this would hold unit price constant in spending comparisons, it is a laborious and potentially error-prone procedure; we thus rejected it because BCBS Arizona asserted that its prices closely approximated TM prices, consistent with findings nationally and consistent with having a competitive MA product.7
Because of the close approximation between the contracted unit prices and TM prices, the proportion of spending differences between the MA group and the other 2 groups that is attributable to unit price differences should be small.
Although we compared various measures of utilization and total spending among the MA, ACO, and TM comparison groups, we could not obtain comparable spending values for specific types of services for the MA group because of differing aggregations of services in our data. For example, we could not determine MA emergency department (ED) spending because it was included with inpatient spending if the patient was admitted. Therefore, we instead made 3-way comparisons among MA, the ACO, and a TM comparison group for total medical spending and for various utilization measures but only a 2-way comparison of ACO and TM spending on specific medical services.
We faced 2 other issues in comparing the MA plan’s performance with that of the ACO and TM plans. About a quarter of hospital admissions in the MA plan were covered by a capitated contract, and for those admissions the paid claims files show a zero dollar amount. To obtain comparable spending figures, we imputed the mean payment for the relevant diagnosis-related group among the MA hospital claims with positive dollars. Second, all home health services in the MA plan were covered by a capitated contract and, consequently, show no individual-level spending. Therefore, we imputed spending for all home health claims using the estimated equation for risk-adjusted home health spending at the patient level in the ACO contract. Because home health services account for only 3% to 6% of total spending in the ACO, depending on the year, this approximation should induce little error.
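The imputation for capitated hospital admissions can be sketched as follows. This is an illustration only: the record layout (`drg`, `paid`) and the function name are hypothetical, and leaving a claim at zero when its diagnosis-related group has no positive-dollar claims is an assumption, because the text does not say how such cases were handled.

```python
from collections import defaultdict


def impute_capitated_payments(claims):
    """Replace zero-dollar hospital claims with the mean positive
    payment observed for the same diagnosis-related group (DRG).

    claims -- list of dicts with keys 'drg' and 'paid'
    Returns a new list; the input is not modified.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for claim in claims:
        if claim["paid"] > 0:
            totals[claim["drg"]] += claim["paid"]
            counts[claim["drg"]] += 1

    imputed = []
    for claim in claims:
        paid = claim["paid"]
        n_positive = counts.get(claim["drg"], 0)
        if paid == 0 and n_positive > 0:
            paid = totals[claim["drg"]] / n_positive
        # Assumption: DRGs with no positive claims stay at zero.
        imputed.append({**claim, "paid": paid})
    return imputed


# Hypothetical claims: one capitated admission in DRG 470.
example = impute_capitated_payments([
    {"drg": "470", "paid": 12000.0},
    {"drg": "470", "paid": 14000.0},
    {"drg": "470", "paid": 0.0},
])
```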
The commercial data come from Aetna for Maricopa County residents for 2010 to 2014. All medical and physician services are included, but drugs were excluded because they are sometimes covered under a separate contract.
Our study was approved by the Harvard Medical School Institutional Review Board.
Although the Pioneer program’s actual attribution of beneficiaries to ACOs was prospective and based on use in the prior 3 years, we used retrospective assignment to assign beneficiaries to providers in each study year. We could not apply the Pioneer program’s prospective assignment method consistently because we lacked data for 3 years prior to the study period; however, as a result, we avoided the problem of regression-to-the-mean effects that prospective assignment potentially introduces when applied to an initial cohort that is fixed.5,8
To avoid assignment to a time-varying panel of physicians, we kept the list of ACO physicians constant over time using National Provider Identifiers (NPIs) to isolate within-provider effects of the program. We used NPIs rather than Tax Identification Numbers (TINs) to identify physicians because Pioneer ACOs were not required to include all providers with the same TIN in the ACO. To define the set of physicians in our main analyses, we used the physicians in the Banner ACO as of 2012, although we also tested the sensitivity of using those in the ACO in 2014 instead. In short, we evaluated the performance of the same group of physicians before and after the ACO contracts began. The TM comparison group comprised TM beneficiaries in Maricopa County who were not attributed to the Banner ACO.
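Retrospective assignment against a fixed physician list can be sketched as below. This is a simplified illustration rather than the Pioneer alignment algorithm: it counts visits instead of weighting services, and the `group_of` mapping, the string labels, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter, defaultdict


def attribute_retrospectively(visits, aco_npis, group_of):
    """Assign each beneficiary to the provider group that supplied the
    plurality of qualifying E&M visits in the study year.

    visits   -- iterable of (beneficiary_id, npi) pairs for one year
    aco_npis -- fixed set of ACO physician NPIs (e.g., the 2012 list)
    group_of -- hypothetical map from non-ACO NPI to a group label;
                unmapped NPIs are treated as their own group
    Returns a dict of beneficiary_id -> 'ACO' or 'comparison'.
    """
    counts = defaultdict(Counter)
    for bene, npi in visits:
        group = "ACO" if npi in aco_npis else group_of.get(npi, npi)
        counts[bene][group] += 1

    assigned = {}
    for bene, by_group in counts.items():
        # Hypothetical tie-break: most visits, then label order.
        best = min(by_group.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        assigned[bene] = "ACO" if best == "ACO" else "comparison"
    return assigned


# Hypothetical visits for 2 beneficiaries.
result = attribute_retrospectively(
    visits=[("a", "1"), ("a", "1"), ("a", "9"),
            ("b", "9"), ("b", "8"), ("b", "8")],
    aco_npis={"1"},
    group_of={"9": "G1", "8": "G2"},
)
```

Because `aco_npis` is passed in unchanged for every study year, the same physicians define the ACO group before and after the contract begins.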
To maintain comparability with the ACO-attributed group, we excluded those beneficiaries in both the TM group and in the MA plan with no use of qualifying E&M services in the calendar year, because that group could not be attributed. This zero-use group constituted 10.2% to 10.5% of the TM group depending on the year; we cannot know what proportion of this group would have been attributed to Banner if they had used E&M services. The MA group had 3.3% to 4.9% of nonusers, depending on the year.
Although we have unique identification numbers for individual MA providers, they are idiosyncratic, not NPIs or TINs. We therefore analyzed the MA data using a constant set of providers, namely those providing services to MA beneficiaries in 2012. We tested the sensitivity of the results to those providing services in 2014 and to those providing services in the calendar year being analyzed (a nonconstant set of providers).
To increase comparability and in the spirit of doubly robust regression, we balanced the ACO, TM, and MA groups using inverse probability weights based on cells defined by age group (65-74, 75-84, and ≥85 years) and gender. Matching only on time-invariant factors, such as age and gender, avoids bias that can arise from matching on time-varying variables, such as pre-ACO period outcome measures.9
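One common way to build cell-based inverse probability weights is shown below. It is a sketch under stated assumptions, not the weighting code used in the study: each person's weight is the reciprocal of the share of their age and gender cell that belongs to their own group, which gives every group the pooled cell distribution. The function name and tuple layout are hypothetical.

```python
from collections import Counter


def cell_ipw(people):
    """Return one weight per person equal to 1 / P(group | cell).

    people -- list of (group, age_band, gender) tuples
    P(group | cell) is estimated as the cell's count within the group
    divided by the cell's count across all groups.
    """
    cell_n = Counter((age, sex) for _, age, sex in people)
    group_cell_n = Counter(people)
    return [cell_n[(age, sex)] / group_cell_n[(grp, age, sex)]
            for grp, age, sex in people]


# Hypothetical sample in which the ACO group skews younger.
people = ([("ACO", "65-74", "F")] * 3 + [("ACO", "75-84", "F")] * 1
          + [("TM", "65-74", "F")] * 1 + [("TM", "75-84", "F")] * 3)
weights = cell_ipw(people)
```

After weighting, each group in this hypothetical sample has half of its weighted total in each age band.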
For all comparisons, we show annual risk-adjusted utilization rates, as well as total annual risk-adjusted spending per person for each year from 2010 to 2014 for the ACO, TM, and MA groups. To risk adjust, we used CMS Hierarchical Condition Categories (HCCs) version 12 and diagnoses from 2009.
We used standard linear regression methods for each group separately with the individual’s HCC risk score on the right-hand side. The predicted rates that we show set the risk score to 1.0. In equation form, we estimated the following equation for each of the 3 groups:
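$$y_{it} = \alpha_t + \beta\,\mathrm{HCC}_{it} + \epsilon_{it},$$

where $y_{it}$ is an outcome measure (spending or utilization) for individual $i$ in year $t$, $\mathrm{HCC}_{it}$ is the individual’s HCC risk score, and $\alpha_t$ and $\beta$ are constants to be estimated.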
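The regression above, with a separate intercept for each year and a common slope on the risk score, can be fit without a statistics library. The sketch below is illustrative and is not the estimation code used in the study: it obtains the slope by demeaning both variables within year, which is algebraically the same as ordinary least squares with year indicators, and then evaluates the fit at a risk score of 1.0. Function names and the row layout are hypothetical.

```python
from collections import defaultdict


def fit_risk_adjusted(rows):
    """Fit y = alpha_t + beta * HCC by ordinary least squares.

    rows -- list of (year, hcc, y) tuples for one group
    Returns (alpha_by_year, beta).
    """
    by_year = defaultdict(list)
    for year, hcc, y in rows:
        by_year[year].append((hcc, y))
    means = {
        year: (sum(h for h, _ in obs) / len(obs),
               sum(v for _, v in obs) / len(obs))
        for year, obs in by_year.items()
    }
    # Within-year demeaning removes the year intercepts.
    sxy = sum((h - means[t][0]) * (y - means[t][1]) for t, h, y in rows)
    sxx = sum((h - means[t][0]) ** 2 for t, h, _ in rows)
    beta = sxy / sxx
    alpha = {t: my - beta * mh for t, (mh, my) in means.items()}
    return alpha, beta


def predicted_at(alpha, beta, hcc=1.0):
    """Predicted outcome in each year at a fixed risk score."""
    return {t: a + beta * hcc for t, a in alpha.items()}


# Hypothetical data for one group.
rows = [(2010, 0.5, 11.0), (2010, 1.5, 13.0),
        (2011, 0.8, 13.6), (2011, 1.2, 14.4)]
alpha, beta = fit_risk_adjusted(rows)
annual = predicted_at(alpha, beta)
```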
Because the trend in the 3-year post-ACO period is informative, we present our main results in the text using figures that show predicted annual utilization rates and spending from the equation above. The absolute values shown are centered at the mean risk score. In addition to calculating annual results, we carried out a standard DID analysis that compared the 2 years of the pre-ACO period (2010-2011) with the 3 years of the post-ACO period (2012-2014) for the ACO group relative to the TM or MA groups. Regression equations from the DID analysis are available in the eAppendix Tables. Although the trend lines appear reasonably parallel in the pre-ACO period, we cannot conduct a formal test with only 2 years of data.
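The logic of the DID comparison can be shown with group-by-period means. This is a deliberately simplified sketch, assuming annual risk-adjusted rates are already available; the study's estimates come from regression equations, and the function name, dictionary layout, and numbers here are hypothetical.

```python
def did_estimate(rates, treated, control,
                 pre_years=(2010, 2011),
                 post_years=(2012, 2013, 2014)):
    """Difference-in-differences from annual rates.

    rates -- dict mapping (group, year) to a risk-adjusted rate
    Returns (treated post - treated pre) - (control post - control pre).
    """
    def mean(group, years):
        return sum(rates[(group, y)] for y in years) / len(years)

    treated_change = mean(treated, post_years) - mean(treated, pre_years)
    control_change = mean(control, post_years) - mean(control, pre_years)
    return treated_change - control_change


# Hypothetical rates per 1000 beneficiaries.
rates = {("ACO", y): 100.0 for y in (2010, 2011)}
rates.update({("ACO", y): 90.0 for y in (2012, 2013, 2014)})
rates.update({("TM", y): 100.0 for y in (2010, 2011)})
rates.update({("TM", y): 98.0 for y in (2012, 2013, 2014)})
effect = did_estimate(rates, treated="ACO", control="TM")
```

A negative value means the treated group's rate fell more, or rose less, than the control group's.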
For the commercial ACO, actual attribution was prospective and similar to Medicare’s, but we used retrospective attribution to analyze the data for the same reasons as with the Medicare sample. We risk adjusted commercial spending using HHS-HCCs, V0314.127.L1,10 and estimated equations for the ACO and comparison groups similar to the Medicare equation shown previously. The HHS-HCC model uses concurrent diagnoses with a separate model for each metal level in the exchange. We used the model for the Gold plan because its actuarial value is close to that of the actual plan and, as in the Medicare case, centered the predicted values at the mean risk score. We did not have firm identifiers, so we could not include firm fixed effects. Thus, there may be some modest bias to the degree that the penetration of Banner differs by firm.
We disaggregated total spending and use into inpatient, E&M, ED, and other outpatient spending. As in the Medicare analysis, we did not have data on drug spending other than drugs covered by the medical benefit. Because our data set included a flag from the plan for attribution, which was based on the past year’s use, we compared the stability of attribution under prospective and retrospective attribution.
After inverse probability weighting, the age–sex groups were well balanced (eAppendix Table 1). Figure 1 [A-D] shows risk-adjusted utilization rates of various medical services in the MA, ACO, and TM comparison groups among those with positive use. Although the percentage of users in MA was greater than in the 2 TM groups, as noted previously, MA hospitalization rates were below those of the ACO and the TM comparison groups in all years (Figure 1 [A]). The difference between the hospitalization rate in the MA group and the rates in the ACO and TM groups steadily narrowed over time, but the MA rate remained about 10% below the rates of the other 2 groups in 2014, the final year of observation.
In the 2-year pre-ACO period, the hospitalization rate in the ACO and TM groups had parallel trends, but after the establishment of the ACO, the rate in the ACO group fell at a more rapid rate (Figure 1 [A]). In 2010, the rate of skilled nursing facility (SNF) days in both the ACO and TM groups was about twice that of the MA plan rate, but the MA rate rose steadily, whereas rates in the other 2 groups fell (Figure 1 [B]). The ACO–TM comparison is difficult to interpret because pre-ACO period trends differ. Neither E&M office visit nor ED visit rates exhibited any notable trend (Figure 1 [C and D]).
Consistent with its lower use of acute and postacute services, the MA group had the lowest total risk-adjusted spending in all years (Figure 2). Nevertheless, its spending rose consistently through the 5-year period, whereas spending in the TM and ACO groups did not vary nearly as much. By 2014, spending in the MA group had converged toward that of the other 2 groups; however, it remained 10% below that of the 2 groups, and the difference was larger in the first 2 years of the ACO.
Spending in the Medicare ACO cohort was slightly higher in the pre-ACO period than in the TM comparison group, and in 2012, the first year of the ACO, it ticked marginally up. It then fell to the same level as the comparison group in 2013 and 2014. eAppendix Figures 1 through 4 show corresponding spending data on specific services for the ACO and TM comparison groups.
DID results that compare averages for the 2 pre-ACO years with the 3 post-ACO years are shown in eAppendix Tables 2. These results add no new insights to the results just described. Unadjusted rates are shown in eAppendix Figures 5 through 9.
The risk-adjusted data show that total cost in the ACO group rose at the same rate as in the comparison group in the pre-ACO period. However, in 2012, the first year of the ACO, costs rose in the commercial ACO relative to the comparison group but thereafter fell at a faster rate than in the comparison group, such that by 2014, the third year of the ACO, costs were approximately equal (Figure 3 [A]). This result is mainly driven by the experience with inpatient costs and, to a much lesser degree, by outpatient non-E&M costs (Figure 3 [B-E]). Differences in other types of costs are small. Unadjusted commercial rates of utilization and spending on these services are shown in eAppendix Figures 10 through 14. DID results for the commercial group are shown in eAppendix Table 4. Like the Medicare DID results, these shed no new light.
We also assessed the proportion of commercially insured individuals assigned to the ACO using retrospective attribution who would also have been assigned using prospective attribution. For Banner, these values were a little more than 40% in 2013 and 2014; for non-Banner physicians, the values were a little more than 60%. The non-Banner values are higher in part because individuals attributed to a given non-Banner physician in 2013 and a physician in another non-Banner group in 2014 both count as being attributed to a non-Banner physician, whereas an individual had to remain within Banner in both years to be attributed to Banner. Both these values are well below the 80% value for Medicare beneficiaries because of the churn among employers in commercial insurance that does not occur among the Medicare population.8
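The stability measure described above is a simple overlap proportion. The sketch below shows one way to compute it, assuming each method yields a yes or no attribution flag per person; the function name and input layout are hypothetical.

```python
def attribution_agreement(retrospective, prospective):
    """Share of people attributed retrospectively who would also have
    been attributed prospectively.

    Both arguments map person_id -> bool (attributed to the group).
    Returns NaN when nobody is retrospectively attributed.
    """
    retro = {p for p, flag in retrospective.items() if flag}
    if not retro:
        return float("nan")
    both = sum(1 for p in retro if prospective.get(p, False))
    return both / len(retro)


# Hypothetical flags for 5 people.
retro_flags = {"a": True, "b": True, "c": True, "d": True, "e": False}
prosp_flags = {"a": True, "b": True, "c": False, "e": True}
agreement = attribution_agreement(retro_flags, prosp_flags)
```

People missing from the prospective flags, such as new enrollees, count as not prospectively attributed, which is how churn lowers the proportion.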
Within the Medicare program, we expected the MA group to exhibit the lowest spending over the years we observed because Banner faced financial risk throughout the period, whereas the ACO did not begin until 2012. We also expected the ACO group to exhibit slower growth in use and cost than the TM comparison group after the ACO was established. In fact, the MA group did have the lowest spending of the 3 groups, driven by the lowest use of hospital and postacute services.
Also as expected, hospitalization rates in the Pioneer ACO group declined more rapidly than in the TM comparison group (with similar pre-ACO period trends for the 2 groups). Comparison of SNF rates was difficult because pre-ACO period trends differed. Counter to expectation, ACO spending rose relative to the comparison group in the first year of the ACO but then fell faster in the next 2 years. The subsequent entry of Medicare Shared Savings Program plans in the local market may have biased our comparison against the ACO.
Zero-users could not be attributed in the ACO and TM comparison groups, which complicates comparison with the MA results because there were 5 to 7 percentage points fewer zero-users in the MA plan (10% zero-users in the TM groups vs 3%-5% in the MA group, depending on year). If we were to arbitrarily distribute the TM zero-use group between the ACO and the TM comparison groups in the same ratio as the positive-user group to derive per-person rather than per–positive-user rates, the differences between the MA group and the other 2 groups in utilization and spending would be about 5 to 7 percentage points smaller than shown previously. Nonetheless, MA spending rates would remain below those of the other 2 groups, especially in the pre-ACO period.
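The arithmetic behind this adjustment can be made concrete. The sketch below converts a per-positive-user rate to a per-person rate by multiplying by the share of people with any use. The index values are hypothetical, chosen to resemble the roughly 10% gap and the zero-user shares reported in the text.

```python
def per_person_rate(per_user_rate, zero_user_share):
    """Convert a rate among positive users to a rate among all people,
    assuming zero-users contribute no utilization or spending."""
    return per_user_rate * (1.0 - zero_user_share)


# Hypothetical index: TM per-user rate = 1.00, MA 10% lower.
ma = per_person_rate(per_user_rate=0.90, zero_user_share=0.04)
tm = per_person_rate(per_user_rate=1.00, zero_user_share=0.10)
relative_gap = 1.0 - ma / tm  # MA shortfall relative to TM
```

Under these assumed inputs the MA shortfall narrows from 10% per positive user to about 4% per person, a change of about 6 percentage points, which falls within the 5 to 7 point range stated above.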
MA plans as a group are known to code diagnoses more intensively than is done in TM claims,11,12 raising the possibility that the MA plan was observed to spend less because, conditional on age, sex, and diagnosis, the average individual was coded as healthier in MA. In unadjusted data, however, the pre-ACO period difference between MA spending and that of the other 2 groups was even larger than in the adjusted data (eAppendix Figure 5), so the large pre-ACO period difference in the risk-adjusted data is not an artifact of risk adjustment or of more intensive coding in MA. As a sensitivity test, we examined whether the MA results changed when we used the list of 2014 MA providers rather than 2012 MA providers, or the providers in each calendar year; they did not.
Overall comparisons between the commercial ACO and its commercially insured comparison group showed little effect of the ACO. This may well be due to the greater degree of churn among the commercial ACO patients than among the Medicare patients. Whether this degree of churn is found in other commercial ACO contracts is unknown.
This study is limited to outcomes at a single hospital-based delivery system, and one must therefore be cautious about generalizing its findings to other settings, especially to non–hospital-based ACOs. Nonetheless, its finding of better performance at the Pioneer ACO than at the TM comparison group, with respect to hospitalization, is consistent with the literature cited in the introduction.2,5
Relative to the literature, what is novel in these results is that adjusted hospitalization, SNF, and spending rates in Banner’s MA plan were notably below those of its Medicare ACO, although there was partial convergence over the 3-year period of observation. Although the commercial results were more ambiguous, possibly because of greater churn, our results overall support CMS’ efforts to transition Medicare reimbursement away from traditional fee-for-service.
The authors are grateful to The Commonwealth Fund for supporting this research under grant #20150905 and to David Blumenthal, MD, MPP, and Melinda Abrams of the Fund for comments on a preliminary draft. They thank Chuck Lehn of Banner Health, Robert Groves, MD, of Banner Aetna, Dave Firdaus of Blue Cross Blue Shield of Arizona, and Brigitte Nettesheim of Aetna for assistance.