A Comparison of Comorbidity Measurements to Predict Healthcare Expenditures

The American Journal of Managed Care, February 2006, Volume 12, Issue 2

Objective: To compare the performance of the Elixhauser, Charlson, and RxRisk-V comorbidity indices and several simple count measurements, including counts of prescriptions, physician visits, hospital claims, unique prescription classes, and diagnosis clusters.

Study Design: Each measurement was calculated using claims data during a 1-year period before the initial filling of an antihypertensive medication among 20 378 members of a managed care organization. The primary outcome variable was the log-transformed sum of prescription, physician, and hospital expenditures in the year following the prescription encounter.

Methods: In addition to descriptive statistics and Spearman rank correlations between measurements, the predictive performance was determined using linear regression models and corresponding adjusted R² statistics.

Results: The Charlson index and the Elixhauser index performed similarly (adjusted R² = 0.1172 and 0.1148, respectively), while the prescription claims-based RxRisk-V (adjusted R² = 0.1573) outperformed both. An age- and gender-adjusted regression model that included a count of diagnosis clusters was the best individual predictor of payments (adjusted R² = 0.1814). This outperformed age- and gender-adjusted models of the number of unique prescriptions filled (adjusted R² = 0.1669), number of prescriptions filled (adjusted R² = 0.1573), number of physician visits (adjusted R² = 0.1546), log-transformed prior healthcare payments (adjusted R² = 0.1359), and number of hospital claims (adjusted R² = 0.1115).

Conclusion: Simple count measurements appear to be better predictors of future expenditures than the comorbidity indices, with a count of diagnosis clusters being the single best predictor of future expenditures among the measurements examined.

(Am J Manag Care. 2006;12:110-117)

Comorbidity scores are a common tool used by researchers in epidemiological and health services research. Comorbidities are defined as coexisting medical conditions that are distinct from the primary condition under investigation.1 Interest in comorbidity scores can be attributed to the importance of relationships between comorbidities and the prognosis, detection, and outcomes of many illnesses.2 In studies using secondary administrative data, the absence of randomization to treatment and control groups can result in health status differences across groups. This can potentially confound the relationship between a treatment and disease under investigation.

Numerous comorbidity controls exist, including International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM)-based measures such as the Elixhauser and Charlson indexes3-7 and pharmacy claims-based measures such as the chronic disease score.8-12 Simple variables used to measure comorbidity, such as counts of medications, physician visits, or medical conditions, also have been used in research. These measurements are less complex to use, and studies13,14 have shown them to be as effective as (if not more effective than) other measurements in predicting and controlling for comorbidity. Furthermore, these measurements are not subject to biases related to misclassification. Misclassification can arise in comorbidity indexes if complications (conditions arising from the treatment or progression of a condition) are coded as comorbidities (conditions existing simultaneously with and independently of other medical conditions).

Several studies13-15 have compared the predictive validity of commonly used comorbidity scores. The focus of most of these comparisons, however, has been on the prediction of mortality and morbidity and not on healthcare expenditures. Increasingly, comorbidity scores are being used to control for comorbidity differences in studies16-19 in which expenditures and payments are the primary dependent variables. Because most comorbidity measures were developed to predict mortality or morbidity, potential differences may arise when these measures are applied to expenditure outcomes. For example, although patients not surviving the initial onset of an acute myocardial infarction would be included as deaths in mortality estimates, they may not have significant healthcare expenditures because of their deaths. In this case, measures that are good predictors of mortality may not necessarily predict expenditures well.

Comorbidity indexes have been used to argue for differences in capitation payment rates.20,21 Of particular concern in capitation arrangements is that a small fraction of insurance beneficiaries often accounts for a large portion of healthcare expenditures. These high-expenditure individuals may also be of interest to insurers wishing to target disease management programs to control health plan spending. Although comorbidity indexes may be useful in predicting high-expenditure individuals, a comparison of measurements has not been undertaken to date, to our knowledge. Therefore, the use of comorbidity measures for these objectives may be in question.

Given the disparate comorbidity measures in use and the absence of studies comparing their performance in analyses with expenditure outcomes, this study was undertaken to compare the performance of different comorbidity measures in predicting individual healthcare expenditures. Specifically, the performance of the following 3 administrative claims-based comorbidity scores was assessed: the ICD-9-CM-based Charlson and Elixhauser indexes and the pharmacy claims-based RxRisk-V score. Furthermore, the performance of these scores was compared with that of simpler comorbidity measurements, including counts of prescriptions, physician visits, hospital claims, and unique prescription classes, as well as the cumulative number of diagnosis clusters, serving as a proxy for the count of unique health conditions.

METHODS

Data Source

For this analysis, we used hospital, physician, and pharmacy claims data from a large managed care organization. The study population included 20 378 individuals 18 years and older who had claims for a diagnosis of hypertension (ICD-9-CM codes 401-404, 362.11, or 437.2) and who obtained an initial prescription fill for an antihypertensive medication between January 1, 2001, and December 31, 2002. The index date in this study was defined as the original prescription purchase date. The preperiod and the postperiod include the periods 1 year before and 1 year after the index date, respectively. Individuals with gaps in enrollment longer than 31 days during the preindex and the postindex periods were excluded.

ICD-9-CM-based Comorbidity Indexes

Charlson Index.

The Charlson index is the most common index used to control for comorbidity in health outcomes studies. The original Charlson index was developed for use with medical records and consisted of 19 different diseases weighted according to disease severity as 1, 2, 3, or 6. The index has since been adapted into several 17-item weighted indexes for use with administrative data.1,3,6,22,23 A comprehensive comparison performed by Schneeweiss et al13 examined differences in the predictive ability of several Charlson indexes of mortality, long-term care admissions, hospitalizations, physician visits, and expenditures for physician services. Results from that study showed little difference in the performance of different Charlson indexes, with the adaptation by Romano et al23 performing best. We used a modified version of the Romano-adapted Charlson index to accommodate changes in ICD-9-CM coding. Based on previous investigations that suggest that adding physician claims to hospital claims increases the performance of the Charlson index, we ran the index first using preperiod hospital claims alone and then using preperiod hospital and physician claims.24
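As a rough illustration of how a weighted index of this kind is scored, the sketch below sums condition weights for one patient. The weights shown are a small subset of the original Charlson weights (1, 2, 3, or 6). The condition labels, the `CHARLSON_WEIGHTS` table, and the `charlson_score` helper are illustrative assumptions rather than the Romano adaptation used in the study, and the step that maps diagnosis codes to conditions is assumed to have run already.

```python
# Illustrative subset of the original Charlson weights; NOT the full 17-item adaptation.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "diabetes_with_end_organ_damage": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
}

def charlson_score(conditions):
    """Sum the weight of each distinct condition flagged in the preperiod claims."""
    return sum(CHARLSON_WEIGHTS.get(c, 0) for c in set(conditions))

# A condition flagged on two separate claims contributes its weight only once.
score = charlson_score([
    "congestive_heart_failure",
    "congestive_heart_failure",
    "metastatic_solid_tumor",
])
print(score)  # 7
```

Running the same helper once on hospital claims alone and once on hospital plus physician claims would mirror the two variants described above.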

Elixhauser Index.

A newer comorbidity measurement is the index by Elixhauser et al.5 The Elixhauser index measures the effect of 30 different comorbid conditions. The index distinguishes comorbidities from complications by considering only secondary diagnoses unrelated to the principal diagnosis through the use of diagnosis related groups (DRGs). For example, a patient with a claim for congestive heart failure would have this condition coded as a comorbidity only if the medical record did not contain a DRG for cardiac disease. Current coding for the Elixhauser index was downloaded from the Agency for Healthcare Research and Quality.25 The index was run first using preperiod hospital claims alone and then using preperiod hospital and physician claims. Although DRGs are not available within physician claims, it was thought that many comorbid conditions would be missed if these data were not included. The final Elixhauser scores were calculated as the sum of comorbid conditions present. Hypertension was excluded from the final score because of the disease population studied.

Prescription Claims-based Comorbidity Index

Several indexes commonly referred to as chronic disease scores have been developed for use with pharmacy claims data.8,11,12,26 The most recent modifications to the chronic disease scores are the RxRisk score for use among general populations8 and the RxRisk-V score for use among Veterans Affairs populations.11 Although developed for Veterans Affairs populations, the RxRisk-V score was deemed more applicable to our study population based on the population's age and disease distribution. Coding of the RxRisk-V was completed using the medication classes provided in the original article and the corresponding Medi-Span codes.11 The RxRisk-V identifies 45 distinct comorbid conditions by linking them to medications used during treatment. We used nonweighted and weighted counts of RxRisk-V conditions. Weights for the RxRisk-V were taken directly from the originally published prospective cost coefficient estimates.11

Simple Count Measurements

Specific counts of healthcare utilization were performed in this study. These included a 12-month count of physician visits, hospital claims, prescriptions filled, and unique prescriptions (classes of prescriptions) filled during the 1-year preperiod.
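The difference between a count of prescriptions filled and a count of unique prescription classes can be sketched as follows. This is a minimal example under assumed data: the `Fill` record, its field names, and the class labels are hypothetical and do not reflect the layout of the study's pharmacy claims.

```python
from collections import namedtuple

# Hypothetical pharmacy claim record: one row per fill (original fill or refill).
Fill = namedtuple("Fill", ["member_id", "drug_class"])

def prescription_counts(fills):
    """Return (total fills, unique classes) for one member's preperiod pharmacy claims."""
    total = len(fills)
    unique_classes = len({f.drug_class for f in fills})
    return total, unique_classes

fills = [
    Fill("A", "beta_blocker"),
    Fill("A", "beta_blocker"),  # a refill adds to the fill count but not to the class count
    Fill("A", "statin"),
]
total, unique = prescription_counts(fills)
```

Here the member has 3 fills but only 2 unique classes, which is the distinction the two prescription measurements are meant to capture.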

Diagnosis Clustering Measurement

As a proxy for the number of unique medical conditions, we categorized conditions into diagnosis clusters and used a count of diagnosis clusters as a summary score.27 Coding of ICD-9-CM conditions into 1 of 119 unique diagnosis clusters was performed using claims data. Individuals with ICD-9-CM claims not identified in diagnosis clustering were categorized as having an "other" diagnosis cluster. Hypertension was excluded as a cluster because of the population studied. Diagnosis clusters were implemented using hospital claims, as well as both hospital and physician claims.
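The counting rule described above (map each diagnosis code to a cluster, send unmapped codes to "other", drop hypertension, count what remains) can be sketched as follows. The lookup table is a hypothetical four-entry stand-in for the published 119-cluster scheme, the cluster names are illustrative, and `999.99` is a made-up placeholder for an unmapped code.

```python
# Hypothetical code-to-cluster lookup; the study used a published set of 119 clusters.
CLUSTER_LOOKUP = {
    "250.00": "diabetes mellitus",
    "250.02": "diabetes mellitus",
    "493.90": "asthma",
    "401.9": "hypertension",
}
EXCLUDED_CLUSTERS = {"hypertension"}  # every study member is hypertensive

def diagnosis_cluster_count(icd9_codes):
    """Count distinct clusters for one member; unmapped codes fall into 'other'."""
    clusters = {CLUSTER_LOOKUP.get(code, "other") for code in icd9_codes}
    return len(clusters - EXCLUDED_CLUSTERS)

# Two diabetes codes collapse into one cluster; the placeholder code lands in "other".
count = diagnosis_cluster_count(["250.00", "250.02", "493.90", "401.9", "999.99"])
print(count)  # 3
```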

Prior Expenditure Measurements

The final measurements used as potential predictors of postperiod payments were a sum of preperiod payments and a sum of log-transformed preperiod payments. To avoid log-transformation errors, $0.01 was added to preperiod payments before transformation.
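The offset matters because the logarithm of $0 is undefined, so members with no preperiod payments would otherwise drop out of the model. A minimal sketch of the transformation follows. The natural logarithm is an assumption here, since the text does not state the base, and `log_payment` is a hypothetical helper name.

```python
import math

OFFSET = 0.01  # dollars added before the log so that $0 payments stay defined

def log_payment(total_payment):
    """Log-transform a payment total after adding the small offset (natural log assumed)."""
    return math.log(total_payment + OFFSET)

zero_spender = log_payment(0.0)    # finite, whereas math.log(0.0) would raise ValueError
mean_spender = log_payment(4615.0)
```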

Dependent Variable

The primary outcome in this study was the sum of individual healthcare expenditures, defined as the sum of hospital, physician, and prescription payments in the 1-year postperiod. Payments included the amount paid out by the health plan to the provider, any amount paid by the patient (including deductibles and copayments), ancillary payments, and any amount reserved from the health plan. Log transformation of the dependent variable was performed to account for nonnormality.

Statistical Analysis

Descriptive statistics of population characteristics were calculated, including means, standard deviations, and ranges for demographic, payment, healthcare utilization, and comorbidity variables. We assessed the correlations between comorbidity measurements using Spearman rank correlations (ρ). Spearman rank correlations were used to account for potential nonnormality bias in the independent variables. The performance of each comorbidity measurement was assessed through ordinary least squares linear regressions. Adjusted R² values were reported as an informal comparison of the prediction performance to adjust for the number of explanatory variables in each regression model. The adjusted R² value should be thought of as an index value of variance that is corrected for df. Higher adjusted R² values correspond to improved model fit and greater predictive ability. To examine the performance of measurements in predicting high expenditures, we used area under the receiver operating characteristic (ROC) curve comparisons. Area under the ROC curve comparisons assess the ability of each measurement to correctly identify true-positive cases while not misclassifying true-negative cases as positive. Area under the ROC curve outcomes are analyzed through the use of C statistics, which can range from 0.5 (representing no predictive ability) to 1.0 (representing perfect predictive ability). The area under the ROC curve outcome was dichotomized as 0 or 1, with 1 representing high-expenditure individuals who spent at or above the 90th percentile among the study population.
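The two performance measures described above can be sketched in a few lines. This is an illustration under stated assumptions, not the software used in the study: `adjusted_r2` uses the standard degrees-of-freedom correction, `high_expenditure_flags` uses a nearest-rank percentile cutoff (one of several percentile conventions), and `c_statistic` uses the pairwise-concordance definition of the area under the ROC curve. All function names and the toy data are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def high_expenditure_flags(payments, percentile=90):
    """Flag members at or above the percentile cutoff (nearest-rank rule, an assumption)."""
    ordered = sorted(payments)
    rank = (percentile * len(ordered) + 99) // 100  # integer ceiling, 1-based rank
    cutoff = ordered[rank - 1]
    return [1 if p >= cutoff else 0 for p in payments]

def c_statistic(scores, flags):
    """Share of (high, non-high) pairs where the high spender scores higher; ties count 0.5."""
    pos = [s for s, f in zip(scores, flags) if f == 1]
    neg = [s for s, f in zip(scores, flags) if f == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return concordant / (len(pos) * len(neg))

# Toy data: postperiod payments and a preperiod count (e.g. physician visits) per member.
payments = [100, 200, 300, 400, 500, 600, 700, 800, 900, 5000]
visits = [1, 2, 2, 3, 4, 3, 5, 8, 9, 6]

flags = high_expenditure_flags(payments)
c = c_statistic(visits, flags)
```

On this toy data the cutoff is 900, two members are flagged, and 15 of the 16 high/non-high pairs are ordered correctly, giving C = 0.9375. The pairwise loop is quadratic in the number of members, so a rank-based formula would be preferable at the scale of the study population.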

RESULTS

Descriptive Statistics

Full descriptive statistics for the 20 378 individuals who comprised our study population are given in Table 1. Population members were on average 49 years of age. There were slightly more men (53%) than women. In the year before filling a new prescription for an antihypertensive medication, individuals averaged 1.7 hospital claims, 10.4 physician visits, 13.2 prescriptions, and 4.7 unique prescription medications used. The mean count of diagnosis clusters identified through preperiod hospital and physician visit claims was 7.1, compared with a mean of 2.0 diagnosis clusters identified through hospital claims alone. The mean comorbidity score for each diagnosis-based index was greater when physician claims were combined with hospital claims. Mean scores were 0.55 for the Charlson index, 0.61 for the Elixhauser index, 1.98 for the nonweighted RxRisk-V score, and 4111 for the weighted RxRisk-V score. The mean payment was $4615 in the preperiod and $6301 in the postperiod. Most payments incurred in the postperiod were hospital payments ($2756), followed by physician payments ($2053) and prescription payments ($1492). The cutoff for 90th percentile spending was $12 945. The mean postperiod spending was $3320 among individuals below the 90th percentile cutoff and $33 156 among individuals above it.

Correlations

The strength of the correlations between the different indexes varied across types of measurements, as summarized in Table 2. The correlation between the 2 ICD-9-CM claims-based indexes (the Elixhauser and Charlson indexes) was fair (ρ = 0.562) in this analysis. However, the correlation between the ICD-9-CM and pharmacy claims-based indexes was small, with the correlations between the nonweighted RxRisk-V score and Elixhauser index being slightly better (ρ = 0.301) than those between the nonweighted RxRisk-V score and Charlson index (ρ = 0.242). Among the count measurements analyzed, there was a high correlation between the hospital claims-identified diagnosis clusters and number of hospital claims (ρ = 0.932) and between the hospital and physician claims-identified diagnosis clusters and number of physician visits (ρ = 0.820). The nonweighted RxRisk-V score was strongly correlated with the number of prescriptions used (ρ = 0.806) and the number of unique prescriptions used (ρ = 0.848). The correlations were fair between the counts of hospital and physician visits (ρ = 0.560) and counts of prescription fills and physician visits (ρ = 0.555). However, the correlations were small across the counts of prescriptions filled and hospital counts (ρ = 0.327).

Predictive Performance

The predictive performance of each comorbidity measurement on postperiod payments is given in Table 3. The amount of variation explained by each measurement increased without exception when age and gender were included as predictors. Similarly, the addition of physician claims to hospital claims increased the predictive performance of the Charlson and Elixhauser indexes. The full Charlson index (which used hospital and physician claims, as well as adjustments for age and gender) and the full Elixhauser index performed similarly (adjusted R² = 0.1172 and 0.1148, respectively). Among the 2 RxRisk-V scores, the nonweighted score (adjusted R² = 0.1381) was a better predictor than the weighted score (adjusted R² = 0.1261). Compared with the age- and gender-adjusted RxRisk-V (adjusted R² = 0.1573), both ICD-9-CM-based indexes appear to be slightly inferior. The regression model that included a count of diagnosis clusters, age, and gender was the best individual predictor of future payments (adjusted R² = 0.1814). This outperformed age- and gender-adjusted models of counts of unique prescriptions filled (adjusted R² = 0.1669), prescriptions filled (adjusted R² = 0.1573), physician visits (adjusted R² = 0.1546), log-transformed prior healthcare payments (adjusted R² = 0.1359), and hospital claims (adjusted R² = 0.1115).

To examine the effect of adding simple healthcare utilization counts to comorbidity indexes, we added simple count controls to the Charlson, Elixhauser, and RxRisk-V models. For both of the ICD-9-CM-based indexes, the addition of a count of unique prescriptions increased the explained variation the most. Adding a count of unique prescriptions increased the amount of explained variation 43% in the Charlson index (adjusted R² = 0.2050) and 42% in the Elixhauser index (adjusted R² = 0.1967) (Table 3). For the RxRisk-V score, the addition of counts for numbers of physician visits and hospital visits had the greatest effect on explained variation, increasing adjusted R² values 22% (adjusted R² = 0.2014) and 16% (adjusted R² = 0.1871), respectively. The addition of a count of prescriptions to diagnosis clusters was the best combination control for comorbidities in our study (adjusted R² = 0.2190).
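The text does not say how the percentage increases were calculated. As a worked check, the sketch below expresses each gain as a share of the augmented model's adjusted value, which happens to reproduce the four reported percentages. This reading is an inference from the published numbers, as is the choice of the full age- and gender-adjusted models (0.1172, 0.1148, and 0.1573) as baselines. The helper name `share_of_augmented_r2` is hypothetical.

```python
def share_of_augmented_r2(base_r2, augmented_r2):
    """Gain in adjusted R^2 as a percentage of the augmented model's adjusted R^2."""
    return 100.0 * (augmented_r2 - base_r2) / augmented_r2

# (baseline adjusted R^2, adjusted R^2 after adding the count), taken from the text above.
reported = {
    "Charlson + unique prescriptions": (0.1172, 0.2050),
    "Elixhauser + unique prescriptions": (0.1148, 0.1967),
    "RxRisk-V + physician visits": (0.1573, 0.2014),
    "RxRisk-V + hospital visits": (0.1573, 0.1871),
}

percentages = {
    name: round(share_of_augmented_r2(base, augmented))
    for name, (base, augmented) in reported.items()
}
```

Under this assumption the rounded results are 43, 42, 22, and 16, in the order listed. Dividing by the baseline value instead would give noticeably larger figures, so readers comparing these gains with other studies may want to confirm the convention against Table 3.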

Results from the area under the ROC curve analysis are shown in Table 4. C statistic values less than 0.7, between 0.7 and 0.8, and greater than 0.8 are generally considered poor, fair, and excellent, respectively. None of the C statistic values in our analysis exceeded 0.7, indicating poor performance in predicting high expenditures. The best performing scores for predicting high expenditures appear to be counts of physician visits (C = 0.6927), diagnosis clusters (C = 0.6897), and hospital claims (C = 0.6756).

DISCUSSION

Simple count measurements were the best individual predictors of future healthcare expenditures in this study. The 2 best performing individual measurements were the age- and gender-adjusted models of a count of diagnosis clusters, followed by a count of unique prescriptions. To our knowledge, this study is the first to evaluate the number of diagnosis clusters as a predictor of future expenditures. Findings from other studies support the superior performance of counts of prescription utilization in predicting future expenditures. Schneeweiss et al13 showed that the number of distinct medications used during a 1-year baseline period best predicted future expenditures in a sample of patients 65 years and older. Similarly, Perkins et al14 showed that the number of pharmacy subclasses and the number of medications used were better predictors of future expenditures than prescription and ICD-9-CM-based comorbidity indexes. Our study builds on this previous research by comparing several different count measurements, including a proxy for the number of unique medical conditions, diagnosis clusters.

One reason why simple count measurements performed best in our study is the possibility that these measurements more accurately reflect the severity of a disease state by capturing the intensity of resource utilization. As an example, although a person would be identified only as having hypertension in a comorbidity index, a count of the number of medications used to treat hypertension might better reflect the intensity of treatment and the potential severity of the condition. Some comorbidity indexes, such as the Charlson index, weight certain conditions to give them more statistical importance in the final score. However, although weights can capture the severity between disease states, they do not capture the gradation underlying the severity within a disease state.

The simple count measurements that reflected the number of unique medical conditions outperformed other simple count measurements in our study. For example, a count of unique medications predicted healthcare expenditures better than a simple count of prescription fills, which included the original fill and any refills. Similarly, a count of diagnosis clusters performed better than a simple count of physician or hospital visits. This suggests that the number of distinct medical conditions that an individual had was more likely to affect future expenditures than prior utilization.

Among the specific comorbidity indexes examined, the RxRisk-V score outperformed the Charlson and Elixhauser indexes. This confirms prior comparisons of different Charlson indexes and prescription claims scores in predicting expenditure outcomes.13,14 Furthermore, it expands on this research to include the Elixhauser index as being potentially inferior to prescription claims-based comorbidity scores in predicting expenditures. In studies13,14 comparing the performance of ICD-9-CM-based indexes with that of prescription claims-based scores in predicting mortality and morbidity, the ICD-9-CM-based scores outperformed the prescription claims-based scores. This supports the hypothesis that the effect of comorbidity on expenditure outcomes is different from the effect of comorbidity when applied to mortality and morbidity outcomes.

The enhanced performance of the RxRisk-V score compared with the Elixhauser and Charlson indexes may have resulted in part from a difference in the utilization of healthcare services. The average person filled 13.2 prescriptions in the preperiod, compared with 1.7 hospital claims and 10.4 physician visits. This could have contributed to a higher mean RxRisk-V score (a score of 1.98) compared with the mean Elixhauser (a score of 0.61) and Charlson indexes (a score of 0.55).

Compared with simple count measurements, comorbidity indexes have several disadvantages besides lower predictive ability. These measurements were generally more difficult to administer, requiring the user to match claims to specific medical conditions. This could be problematic in the RxRisk-V, which constrains the user to classify a medication to one specific disease. For example, although individuals using lactulose were coded as having liver failure in the RxRisk-V, this medication also is commonly used to treat constipation. A specific disadvantage of matching claims to comorbidities in the Elixhauser index is that it requires DRGs, which are not always present in medical claims. This could result in potential misclassification of complications as comorbidities.

The addition of simple count measurements significantly improved the predictive performance of each comorbidity index. For the Elixhauser and Charlson indexes, the addition of a count of unique prescriptions increased the predictive performance better than the addition of counts of physician or hospital claims. For the RxRisk-V score, the addition of a count of physician visits increased the predictive performance better than the addition of prescription utilization measures. These findings suggest that adding information from claims sources other than that required in coding the index has the greatest effect on increasing the predictive performance of each index. The marginal benefit of including additional claims information should be weighed against the cost and administration of including additional data sources by individual researchers.

None of the measurements used in this analysis were effective at predicting 90th percentile spending. The best prediction of high expenditures was achieved through simple count measurements. The prior use of physician services was the best predictor of significant expenditures, followed by a count of diagnosis clusters and a count of hospital claims. Prescription claims information was not as accurate in predicting high expenditures as hospital and physician claims information. This may reflect differences in the cost of services. For example, hospital admissions are generally much more expensive than prescription medications. Therefore, individuals using physician and hospital services more frequently would incur higher expenditures.

Our results should be interpreted in light of the following limitations. Because we used claims data for our analyses, information on services not billed to the insurance system was unavailable. One potential limitation related to the Elixhauser index is that DRG information was not available in the physician claims. This may have caused some misclassification of complications as comorbidities. The hospital claims used in this study had 9 diagnosis fields, and the physician claims had 4 diagnosis fields. This leaves the possibility that individual comorbidities were not identified. In coding each ICD-9-CM claims-based measurement, there exists the possibility that ruled-out diagnoses that were assigned for billing purposes were misclassified as existing comorbidities.1 In coding the RxRisk-V score, some conditions that rely on claims for durable medical equipment, such as urinary incontinence and ostomy products, were not coded because claims did not exist in the data set. Finally, caution should be used when generalizing results beyond the study population of continuously enrolled hypertensive patients 18 years and older from a managed care organization.

CONCLUSIONS

This study builds on previous comorbidity comparisons to examine the effect of 3 indexes and several markers of healthcare utilization on the prediction of future healthcare expenditures. Among the age- and gender-adjusted models that were studied, a simple count of diagnosis clusters and the number of unique prescriptions used by an individual were the best predictors of future healthcare expenditures. Among the different comorbidity indexes examined, the age- and gender-adjusted RxRisk-V score outperformed the Charlson and Elixhauser indexes. Simple count measurements appear to be better at controlling for the effect of comorbidities than more elaborate comorbidity indexes in studies with expenditure outcomes.

From the Graduate Program in Social, Administrative and Clinical Pharmacy, College of Pharmacy, University of Minnesota, Minneapolis (JFF, JWD); Health Economics and Outcomes Research, i3 Magnify, Eden Prairie, Minn (CRH); and Allied Health Programs, Air Force Institute of Technology, Wright Patterson AFB, Ohio (JWD).

This study was supported by Pfizer Inc, New York, NY, through a dissertation fellowship (JFF). The views expressed by Capt Joshua W. Devine in this article are his and those of the other authors and do not reflect the official policy or position of the US Air Force, the US Department of Defense, or the US government.

Address correspondence to: Joel F. Farley, RPh, Graduate Program in Social, Administrative and Clinical Pharmacy, College of Pharmacy, University of Minnesota, 7-174 Weaver-Densford Hall, 308 Harvard Street SE, Minneapolis, MN 55455. E-mail: farl0032@umn.edu.