The American Journal of Managed Care June 2011

# The Structure of Risk Adjustment for Private Plans in Medicare

Joseph P. Newhouse, PhD; Jie Huang, PhD; Richard J. Brand, PhD; Vicki Fung, PhD; and John Hsu, MD, MBA, MSCE

Health plan accounting data are used to test how well the CMS-HCC risk adjustment system tracks relative costs of treating various diagnoses: not very well.

We included all Medicare beneficiaries in the MA-HMO who were enrolled throughout the year before the study year and throughout the study year (2006 or 2007), as well as those who died during a study year. We allowed up to a 2-month gap in membership in a calendar year because we conjecture that such gaps represent data processing errors rather than true changes in membership status. However, we believe that any medical claims from this gap are included in our utilization data. For the decedents, we replicated the methods by Pope et al^{10}; we annualized each decedent’s spending by multiplying by 12 and dividing by the number of months eligible, and then weighted the observation by the reciprocal of that ratio. We excluded the small number (3%-4%) who did not have coverage for both Parts A and B or who left the plan in the middle of the year. Excluding the nondecedents with less than a full year of eligibility avoided the issue of how best to annualize spending for beneficiaries who disenrolled because they left the service area or changed plans in the middle of a calendar year and had missing spending data.

We obtained internal cost accounting data for the services the plan provided to beneficiaries; these data approximate (or are proportional to) total allowed charges. We then replicated the methods that Pope et al^{10} used to derive the CMS-HCC weights, substituting the plan’s cost accounting data at the beneficiary level for the TM spending data that Pope et al used. We compared the resulting coefficients of the HCCs with the coefficients of the CMS-HCC from the model by Pope et al that CMS used to reimburse plans. To simplify comparison of the distribution of MA-HMO values with the values in the study by Pope et al, we rescaled the distribution of MA-HMO values to have the same mean as the distribution of the values estimated from TM. Rescaling (ie, normalizing the weights) corrects for any factor that affects all HCCs proportionately.

Because the MA-HMOs’ geographic distribution of beneficiaries differs from the national TM distribution that Pope et al used in their calculations, absolute spending levels (and hence absolute values of weights) could differ for reasons that are incidental to our purposes herein, such as variation in nominal wage levels. Moreover, it is well known that there are geographic differences in the treatment of various conditions,^{19} an additional reason why there could be differences in relative costs by geography. We make no effort to reweight our data to match the TM geographic distribution but note that, from the point of view of Medicare policy, any geographic differences in relative costs simply add to any distortions in the current structure because the structure of the CMS-HCC is the same across all regions.

We proceeded by following the specification by Pope et al.^{10} We regressed the annual accounting cost for each beneficiary in the MA-HMO sample on the following dummy variables: age and sex (24 categories), Medicaid-×-sex-×-disability (4 categories), HCC (70 categories), HCC-×-disability (5 categories), and HCC interactions (6 categories). This gave a total of 81 HCC-related coefficients that we compared with those in the study by Pope et al. After initial estimation, we constrained several of the MA-HMO coefficients such that categories with a higher ranking in the disease hierarchy would have at least as high predicted costs. Specifically, we constrained the coefficients of the following HCCs to be equal: hcc008 = hcc009, hcc067 = hcc068, hcc081 = hcc082, hcc107 = hcc108, hcc075 = hcc154, and hcc161 = hcc177. These coefficients should be monotonically ordered but were not in our sample, presumably because of small numbers. Our specification differed from that by Pope et al in a minor aspect: because we were unable to determine which of the older beneficiaries may have been eligible for Medicare before age 65 years because of disability, we did not estimate intercept terms corresponding to older beneficiaries who were disabled before age 65 years and those who were not. Specifically, Pope et al included interaction terms for originally disabled and sex (originally_disabled-×-female and originally_disabled-×-male). Instead, our estimates effectively are the average intercept over all older persons. Because most older beneficiaries were not eligible before age 65 years for reasons of disability (about 15% of all Medicare beneficiaries are younger than 65 years at any point in time) and because those who were eligible because of disability rather than by becoming 65 years old are distributed throughout the HCCs, any bias from this difference should not materially affect our results.

We compared our results for 2006 with the relative risk scores from the 2004 version of the CMS-HCC software, which CMS used for payment in 2006 and which are based on the values by Pope et al.^{10} The values that CMS used in 2005 and 2006 were based on a 5% sample of 1999 and 2000 data from TM. We similarly compared our 2007 results with the results from the 2007 CMS-HCC software. As a descriptive comparison of the risk adjustment structure derived from the MA-HMO data with that derived from the TM data, we computed the percentage differences between the values derived from the MA-HMO data and from the TM data for those HCCs whose values in the MA-HMO data were estimated with a sufficient degree of precision (specifically, whose standard error was <20% of the absolute difference between the weights from TM and from the MA plans, divided by the mean of these 2 values). For this purpose, we rescaled the MA-HMO values to have the same mean as the TM values over all 81 coefficients.

We adopted the one-fifth cutoff for the descriptive statistic as a compromise between having enough HCCs to obtain a meaningful distribution of differences between MA-HMO and TM values and being sufficiently precise that we did not show large deviations simply from sampling error, especially in the MA-HMO estimates; 46 HCCs in 2006 and 45 HCCs in 2007 satisfied this criterion.
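The data preparation and descriptive comparison described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, standard errors are assumed to scale with the coefficients when the MA-HMO values are rescaled, and the precision screen is read as "keep an HCC when the MA-HMO standard error is less than one-fifth of the absolute difference," with the percentage difference expressed relative to the mean of the 2 values.

```python
from statistics import mean


def annualize(spending: float, months_eligible: int) -> tuple[float, float]:
    """Return (annualized spending, observation weight) for a decedent.

    Spending is multiplied by 12 / months_eligible, and the observation is
    weighted by the reciprocal of that ratio, as described in the text.
    """
    if not 1 <= months_eligible <= 12:
        raise ValueError("months_eligible must be between 1 and 12")
    ratio = 12 / months_eligible
    return spending * ratio, 1 / ratio


def rescale_to_mean(values: list[float], target: list[float]) -> list[float]:
    """Scale `values` by one common factor so their mean equals mean(target)."""
    factor = mean(target) / mean(values)
    return [v * factor for v in values]


def descriptive_differences(ma, ma_se, tm, cutoff=0.20):
    """Percentage differences for coefficients that pass the precision screen.

    Assumed reading of the criterion (not confirmed by the source):
    keep index i when se_i < cutoff * |a_i - b_i|, and report
    100 * (a_i - b_i) / mean(a_i, b_i), where a_i is the rescaled MA-HMO
    coefficient and b_i is the TM coefficient.
    """
    factor = mean(tm) / mean(ma)  # same rescaling applied to all coefficients
    kept = {}
    for i, (a, se, b) in enumerate(zip(ma, ma_se, tm)):
        a_scaled, se_scaled = a * factor, se * factor
        diff = a_scaled - b
        if se_scaled < cutoff * abs(diff):
            kept[i] = 100 * diff / ((a_scaled + b) / 2)
    return kept


if __name__ == "__main__":
    # Toy numbers only; they are not taken from the study.
    print(annualize(3000.0, 6))
    print(descriptive_differences([1.0, 3.0], [0.1, 0.5], [2.0, 2.0]))
```

In the toy call, only the first coefficient passes the screen, because the second has a standard error that is large relative to its difference from the TM value.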

If the equiproportionate assumption held, these differences would bunch around zero. Some differences could arise from differences in relative input prices that led to a different mix of inputs in MA-HMO treatment; other differences could arise from varying distributions of the demographic variables or illness severity within HCCs between TM and the MA-HMO sample. There is also a negligible discrepancy because the means of the TM and MA-HMO distributions were adjusted to be equal across the 81 coefficients rather than across the 46 (in 2006) and 45 (in 2007) used in the descriptive analysis. Nonetheless, it seems unlikely that differences from these causes would be large. However, differences in treatment patterns owing to utilization management could be large. Some of those differences could arise from input price or demographic differences, but most of them probably arise from differences between the incentives that full capitation offers and those of an unmanaged fee-for-service reimbursement environment, as well as differences in market power in contracting with providers.
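A short worked example may help show why equiproportionate costs would leave no differences after rescaling. The notation here is introduced only for illustration: $a_i$ and $b_i$ denote the MA-HMO and TM coefficients for HCC $i$, $\bar{a}$ and $\bar{b}$ their means over the 81 coefficients, and $k$ a hypothetical constant of proportionality.

```latex
% Suppose every MA-HMO coefficient is the same multiple k of its TM counterpart.
a_i = k\, b_i \quad (i = 1, \dots, 81)
\;\Longrightarrow\;
\bar{a} = k\, \bar{b}

% Rescaling the MA-HMO coefficients to the TM mean removes the common factor k.
a_i' = a_i \cdot \frac{\bar{b}}{\bar{a}}
     = k\, b_i \cdot \frac{\bar{b}}{k\, \bar{b}}
     = b_i

% Hence every rescaled difference is zero.
a_i' - b_i = 0
```

Any nonzero difference that survives rescaling therefore reflects a departure from strict proportionality, sampling error, or both.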

As already noted, some differences between the 2 sets of weights would be expected from sampling error; even with more than 300,000 observations, some of the HCCs have only a few hundred observations in the MA-HMO data, and the values by Pope et al^{10} have a sample only about 6 times as large (Pope et al used a 5% sample of 2 years of data), so that sampling error is also relevant to the values by Pope et al. We describe herein a formal test across all the coefficient values of whether the relative price structures are similar.

**RESULTS**

**Figure 1** and **Figure 2** show the percentage differences between the weights derived from TM and from the MA-HMO sample for those HCCs in which the standard error of the MA-HMO values is less than 20% of the difference between the TM and MA-HMO coefficients (46 HCCs in 2006 and 45 HCCs in 2007). **Table 1** and **Table 2** give values for these HCCs.

Ignoring sampling error for the moment, if there were good agreement between the weights, almost all of the mass in the 2 histograms would be concentrated near zero because each distribution has approximately the same overall mean. As is readily apparent, this is not the case; there are substantial deviations between the MA-HMO cost accounting data and Medicare reimbursement in both years.

We want to formally test the hypothesis that the vector of the CMS-HCC coefficients equals the corresponding vector of values for the same HCCs estimated on MA-HMO data, and we must account for the sampling error to do so. Unfortunately, we only have the published CMS-HCC weights and their published standard errors from the study by Pope et al^{10}; we do not have the covariance terms with the demographic variables for the CMS-HCC weights. (The HCCs themselves are orthogonal.) Because we do not have raw TM claims data, to carry out a formal test, we must make the following 3 assumptions that are known not to hold exactly but may hold approximately: (1) The MA-HMO and TM coefficients are independent. (2) The estimated variances have no sampling error; that is, we use the published standard errors as if they were true. (3) The covariances between the demographic variables and the HCC dummy variables are ignorable.

Each CMS-HCC coefficient estimate (and corresponding MA-HMO coefficient estimate) is proportional to a conditional mean of the cost for a group of patients with a given diagnosis configuration and, by the central limit theorem, is approximately normally distributed. After adjusting the means of the 2 distributions of estimated coefficients to be equal, we calculate the difference between each of the corresponding HCC coefficients (eg, the coefficient for HCC1 from the study by Pope et al^{10} and the coefficient for HCC1 from the MA-HMO data). The difference between the 2 vectors of means is also normally distributed, and each element has a mean of zero under the null. Assuming independence, the variance of the difference in each element of the HCC and MA-HMO means is simply the sum of the variance of each mean. Therefore, dividing each estimated difference in the 2 means by the square root of the sum of the variances gives a standardized $N(0,1)$ variable. The sum of those variables squared over the 81 coefficients for 2006 (and similarly for 2007) is then distributed as $\chi^2_{81}$.

In symbols, let the estimated coefficient for HCC $i$ in the MA-HMO sample be $a_i$ and the corresponding estimated coefficient in TM be $b_i$ ($i = 1, \ldots, 81$), with standard errors $s_{a_i}$ and $s_{b_i}$. Rescale the distribution of the $a_i$ to have the same mean as the distribution of the $b_i$. The distribution of each $(a_i - b_i) / \sqrt{s_{a_i}^2 + s_{b_i}^2}$ is approximately $N(0,1)$, so

$$\sum_{i=1}^{81} \left[ \frac{a_i - b_i}{\sqrt{s_{a_i}^2 + s_{b_i}^2}} \right]^2 \sim \chi^2_{81}.$$
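The test statistic just described is simple to compute once the 2 coefficient vectors and their standard errors are in hand. The following is a minimal sketch under the 3 assumptions listed above; the function name and the toy inputs are hypothetical, and the MA-HMO standard errors are assumed to be rescaled by the same factor as the coefficients.

```python
from math import sqrt
from statistics import mean


def chi_square_statistic(a, se_a, b, se_b):
    """Sum of squared standardized differences between two coefficient vectors.

    a, se_a: MA-HMO coefficients and their standard errors
    b, se_b: TM coefficients and their standard errors
    The MA-HMO values are first rescaled so that mean(a) equals mean(b).
    Under the null and the independence assumption, the result is treated
    as chi-square with len(a) degrees of freedom.
    """
    if not (len(a) == len(se_a) == len(b) == len(se_b)):
        raise ValueError("all inputs must have the same length")
    factor = mean(b) / mean(a)  # common rescaling factor
    total = 0.0
    for ai, sai, bi, sbi in zip(a, se_a, b, se_b):
        ai, sai = ai * factor, sai * factor
        # Variance of the difference is the sum of the two variances.
        z = (ai - bi) / sqrt(sai ** 2 + sbi ** 2)
        total += z * z
    return total


if __name__ == "__main__":
    # Toy vectors with 2 coefficients; the study used 81.
    stat = chi_square_statistic([1.0, 3.0], [0.3, 0.3], [2.0, 2.0], [0.4, 0.4])
    print(round(stat, 6))
```

With real data, the returned value would be compared with the critical value of a chi-square distribution whose degrees of freedom equal the number of coefficients.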

The resulting $\chi^2$ test statistics for 2006 and 2007 are 2546 and 5299, respectively, both of which have *P* < .005. In short, subject to the approximations we have made in assuming that the statistics we calculated are distributed as $\chi^2$, we can overwhelmingly reject the null that the relative price structures of the MA-HMO and TM are the same. Eighty-one is the number of estimated coefficients, including the interaction terms that use HCC dummy variables. The critical value for a $\chi^2_{81}$ distribution at *P* = .005 is 126, more than an order of magnitude less than the test statistics we obtained.