Published Online: May 10, 2013
John F. McAna, PhD; Albert G. Crawford, PhD; Benjamin W. Novinger, MS; Jaan Sidorov, MD; Franklin M. Din, DMD; Vittorio Maio, PharmD; Daniel Z. Louis, MS; and Neil I. Goldfarb, BA
Objectives: To identify Medicaid patients, based on 1 year of administrative data, who were at high risk of admission to a hospital in the next year, and who were most likely to benefit from outreach and targeted interventions.
Study Design: Observational cohort study for predictive modeling.
Methods: Claims, enrollment, and eligibility data for 2007 from a state Medicaid program were used to provide the independent variables for a logistic regression model to predict inpatient stays in 2008 for fully covered, continuously enrolled, disabled members. The model was developed using a 50% random sample from the state and was validated against the other 50%. Further validation was carried out by applying the parameters from the model to data from a second state’s disabled Medicaid population.
Results: The strongest predictors in the model developed from the first 50% sample were age over 65 years, inpatient stay(s) in 2007, and higher Charlson Comorbidity Index scores. The areas under the receiver operating characteristic curve for the model based on the 50% state sample and its application to the 2 other samples ranged from 0.79 to 0.81. For models developed independently for each of the 3 samples, the areas under the curve were as high as 0.86. The results show a consistent trend of more accurate prediction of hospitalization with increasing risk score.
Conclusions: This is a fairly robust method for targeting Medicaid members with a high probability of future avoidable hospitalizations for possible case management or other interventions. Comparison with a second state’s Medicaid program provides additional evidence for the usefulness of the model.
Am J Manag Care. 2013;19(5):e166-e174
Predictive models are powerful tools that can be used to estimate future healthcare costs and opportunities for interventions for individuals.
Administrative data can be successfully used to identify individuals for care management.
This study provides a robust method for developing a predictive model to identify these individuals.
The model is based on available data; most of the derived variables are relatively easy to generate from the data; and risk scores, either developed on a proprietary basis or open source, are easily incorporated into the model.
According to a report from the Kaiser Commission on Medicaid and the Uninsured, in 2011, due to the recession, there were 2 major factors driving state needs to control costs in their Medicaid programs: reduced state budgets and increased enrollment in programs. However, despite these economic pressures, a survey of Medicaid directors conducted by the commission showed a commitment to assuring access to high-quality care delivered in the most efficient manner possible.1 States want to improve the care and well-being of participants in their Medicaid programs and at the same time bring some control to rising program costs.
One method of meeting this commitment would be to identify those segments of the Medicaid population accounting for disproportionate percentages of the costs and, within those groups, to identify the highest-risk members and intervene early in order to avoid unnecessary high-cost care. In 2003, the elderly and disabled constituted approximately 25% of the Medicaid population. However, they accounted for about 70% of Medicaid spending that year: 43% by people with disabilities and 26% by the elderly. Only 30% of the spending was accrued by the remaining 75% of the Medicaid population.2
Predictive modeling may assist Medicaid plans in identifying program participants at highest risk of future health problems. According to Knutson and Bella,3 “Predictive models are data-driven, decision-support tools that estimate an individual’s future potential healthcare costs and/or opportunities for care management.” Predictors can be derived from administrative data. This was an approach taken by Billings et al4 when developing a predictive model for the National Health Service in England to identify patients at high risk for rehospitalization. Claims and enrollment data are readily available to payers and can be used in models to target specific groups of interest and to provide risk scores for individuals. These scores could then be provided to case/care/disease managers to help them more readily identify those in need of their services.
A randomized trial conducted by Wennberg et al5 has shown that a targeted care management program can be successful in reducing medical costs and hospitalizations. Billings and Mijanovich6 showed that care management for chronic disease Medicaid patients who had been hospitalized could be cost-effective and could improve the health of this population. Of importance to Medicaid plans, they showed that existing data resources can be used to predict patients at greatest risk of future hospital readmissions within 12 months of an index admission.
These studies also stress the need for developing models and care management plans specific to Medicaid populations. Eligibility requirements, such as low income and/or disability, and other factors typically associated with this population (eg, homelessness, substance use, or low educational achievement) distinguish them from those populations typically covered by commercial plans and their vendors.
Hospitalizations are known to be high-cost events and are easily identifiable and categorized from claims and encounter data. It is also well known from the literature that patients with chronic diseases and multiple comorbidities are at high risk for hospitalization or rehospitalization.7 This situation should hold true regardless of which state Medicaid plan is under study. For these reasons, developing a model predictive of hospitalization for patients with chronic diseases and multiple comorbidities would provide the best opportunity for targeting patients for case/care management that could reduce avoidable costs and be generalizable across states.
In this article, we describe the development of a model to predict hospitalizations among enrollees identified as disabled in a state Medicaid program. Its purpose was to identify Medicaid patients, based on 1 year of administrative data, at high risk of admission to a hospital in the next year and most likely to benefit from outreach and targeted interventions. Previous studies have examined a similar population, but specific to readmissions,6 or have looked at specific diagnoses.8 Applying the model to Medicaid data from a second state supports the generalizability of the model to other programs in other states.
Claims and Enrollment Data
Data for a 2-year period (2007 and 2008) were extracted from a data mart containing a subset of membership and claims information for all Medicaid enrollees in 1 state Medicaid program. (The state was in the southern part of the United States; however, contractual agreements prevent the authors from identifying the actual state studied.) Data extracted for 2007 (measurement year) were used to derive the predictors for the model; outcomes were derived from 2008 (prediction year) data.
The claims experience of the eligible members included information from all measurement year claims/encounters. Inpatient, outpatient, professional, and pharmacy claims were used to obtain predictors based on utilization and diagnosis. Eligibility and enrollment files were used to establish study eligibility and to provide demographic predictor variables.
The Medicaid population is composed of numerous subpopulations defined by the state’s eligibility categories and benefits structures. It would be inappropriate to develop 1 model based on the entire eligible population. Each group has its own characteristics, risk factors, and outcomes.
The population chosen for this study included disabled enrollees who were fully covered by Medicaid and were continuously enrolled for both measurement year and prediction year. Members were identified as disabled if they were enrolled in one of the aid categories defined by the state for the disabled.
The disabled were chosen because they comprised a large portion of the enrolled Medicaid members for the state and were more likely than most other enrollees to be continuously enrolled for at least 2 years. In 2009, for the state under study, the disabled comprised 19.2% of the Medicaid population and accounted for 44.4% of the costs. Fully covered, continuously enrolled members were chosen because of the need for the most complete claims picture possible. Those enrollees with full Medicaid coverage were chosen to reduce loss of information due to incomplete Medicare claims data from the states involved.
Logistic regression was used to provide predicted probabilities for the occurrence of an inpatient hospital stay for individuals. The regression was performed in Stata and a stepwise process was used for including variables in the model (P was set at .05 for inclusion). The model was specified in a prospective manner. Demographic, utilization, diagnosis, and prescription drug data for 2007 were used to predict hospitalizations in 2008. The coefficients for the most powerful (ie, statistically significant) variables were used.
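As an illustration of how a fitted logistic model turns measurement-year predictors into a predicted probability, the following minimal Python sketch applies the logistic function to a linear predictor. The intercept, coefficient values, and variable names are hypothetical and chosen only for illustration; they are not the estimates from this study, and the study itself used Stata rather than Python.

```python
import math

# Hypothetical coefficients for illustration only; they are NOT the
# fitted values from the study.
INTERCEPT = -3.0
COEFFICIENTS = {
    "inpatient_stay_2007": 1.2,
    "charlson_index": 0.3,
    "ed_visit_2007": 0.4,
}


def predicted_probability(member):
    """Return the logistic-model probability of a prediction-year stay."""
    # Linear predictor: intercept plus coefficient-weighted predictors.
    linear_predictor = INTERCEPT + sum(
        coef * member.get(name, 0) for name, coef in COEFFICIENTS.items()
    )
    # The logistic function maps the linear predictor onto (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


low_risk = {"inpatient_stay_2007": 0, "charlson_index": 0, "ed_visit_2007": 0}
high_risk = {"inpatient_stay_2007": 1, "charlson_index": 4, "ed_visit_2007": 1}
print(round(predicted_probability(low_risk), 3))
print(round(predicted_probability(high_risk), 3))
```

With these made-up values, the member with a prior stay, a higher comorbidity score, and an ED visit receives a much higher predicted probability than the member with none of these.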
The dependent variable was the occurrence of an inpatient stay in the prediction year. Inpatient stays were identified by the occurrence of a valid, paid inpatient claim. Admissions due to major trauma or pregnancy were omitted, as these were felt to be unpredictable from the data and less amenable to intervention. Trauma that could be treated outpatient or through an emergency department (ED) was not addressed in this study. The independent variables used in the model included the following:
1. Inpatient stay(s) in the measurement year
2. Total length of stay (in days)
3. Primary/preventive office visit in the measurement year
7. Charlson Comorbidity Index
8. Chronic disease score
9. Mental health diagnosis in the measurement year (substance abuse)
10. Mental health diagnosis in the measurement year (other than substance abuse)
11. ED visit in the measurement year
12. Polypharmacy (8 or more different drugs prescribed in the measurement year)
13. The disease categories included in the chronic disease score (cystic fibrosis, end-stage renal disease [ESRD], human immunodeficiency virus [HIV], anxiety and tension, asthma, bipolar disorder, cardiac disease, coronary/peripheral vascular, depression, diabetes, epilepsy, gastric acid disorder, glaucoma, heart disease/hypertension, hyperlipidemia, hypertension, inflammatory bowel disease, liver failure, malignancies, pain, pain with inflammation, Parkinson’s disease, psychotic illness, renal disease, rheumatoid arthritis, thyroid disorder, transplant, and tuberculosis)
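Several of the predictors listed above are simple aggregates of claim lines. The Python sketch below shows one way such flags might be derived. The record layout and field names (member_id, type, los_days, drug) are assumptions made for illustration and do not reflect the actual structure of the state's data mart; only the threshold of 8 different drugs comes from the text.

```python
from collections import defaultdict

POLYPHARMACY_THRESHOLD = 8  # "8 or more different drugs" per the text

# Hypothetical measurement-year claim lines; field names are assumptions.
claims = [
    {"member_id": "A", "type": "inpatient", "los_days": 4},
    {"member_id": "A", "type": "inpatient", "los_days": 2},
    {"member_id": "A", "type": "ed"},
    {"member_id": "A", "type": "pharmacy", "drug": "drug_0"},
    {"member_id": "A", "type": "pharmacy", "drug": "drug_0"},  # refill
] + [
    {"member_id": "B", "type": "pharmacy", "drug": f"drug_{i}"}
    for i in range(8)
]


def derive_predictors(claim_lines):
    """Aggregate claim lines into one predictor record per member."""
    stays = defaultdict(int)
    los = defaultdict(int)
    ed = defaultdict(bool)
    drugs = defaultdict(set)
    members = set()
    for claim in claim_lines:
        member = claim["member_id"]
        members.add(member)
        if claim["type"] == "inpatient":
            stays[member] += 1
            los[member] += claim["los_days"]
        elif claim["type"] == "ed":
            ed[member] = True
        elif claim["type"] == "pharmacy":
            drugs[member].add(claim["drug"])  # distinct drugs only
    return {
        m: {
            "inpatient_stay": int(stays[m] > 0),
            "total_los_days": los[m],
            "ed_visit": int(ed[m]),
            "polypharmacy": int(len(drugs[m]) >= POLYPHARMACY_THRESHOLD),
        }
        for m in sorted(members)
    }


predictors = derive_predictors(claims)
print(predictors["A"])
print(predictors["B"])
```

Counting distinct drugs with a set means that refills of the same drug do not inflate the polypharmacy flag.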
The model was developed by using a 50% sample of claims data for 1 state. The model was validated in 2 ways. It was tested against the other 50% of the eligible population and also against a second state (in the Midwest) to evaluate its generalizability. Stepwise logistic regressions were run separately for each of the samples and the results were compared. Variables were included and excluded based on their significance level in relation to the other variables included in the model. Because the chronic disease score (CDS), the Charlson Comorbidity Index (CCI), and individual disease categories were all included in the model, multicollinearity was a possibility. The standard errors of the parameter estimates were examined to determine whether multicollinearity existed; a large standard error (ie, over 2) was taken as a sign of possible multicollinearity. All of the standard errors were much lower than 2.
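The split-sample design and the standard-error screen described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the seed, the function names, and the example standard errors are hypothetical, and the study's actual sampling procedure is not described in more detail than "50% random sample." Only the cutoff of 2 comes from the text.

```python
import random


def split_half(member_ids, seed=2007):
    """Randomly split members into development and validation halves."""
    shuffled = list(member_ids)
    # A seeded generator makes the split reproducible (seed is arbitrary).
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


def flag_large_standard_errors(standard_errors, threshold=2.0):
    """Return names of variables whose standard error exceeds the cutoff."""
    return [name for name, se in standard_errors.items() if se > threshold]


development, validation = split_half(range(1000))

# Hypothetical standard errors, for illustration only.
example_se = {"charlson_index": 0.04, "chronic_disease_score": 0.02}
print(len(development), len(validation))
print(flag_large_standard_errors(example_se))
```

An empty list from the screen corresponds to the situation reported in the text, in which no standard error approached the cutoff.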
The performance of the model was evaluated using the receiver operating characteristic (ROC) curve. Based on data from the first 12-month period, the model also assigned scores reflecting each member’s risk of hospitalization in the second 12-month period.
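The area under the ROC curve can be read as the probability that a randomly chosen member who was hospitalized received a higher risk score than a randomly chosen member who was not. The Python sketch below computes it directly from that pairwise definition, counting ties as one half. The scores and outcomes are hypothetical, and this is one standard way to compute the statistic, not necessarily the routine used in the study.

```python
def roc_auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative outcome")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # hospitalized member ranked higher
            elif p == n:
                wins += 0.5  # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and prediction-year outcomes (1 = hospitalized).
scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
outcomes = [0, 0, 1, 0, 1, 1]
print(roc_auc(scores, outcomes))
```

The pairwise loop is quadratic in the number of members, so for a full Medicaid population a rank-based calculation would be more practical; the result is the same.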