A statistical model based entirely on claims data can accurately predict 30-day hospital readmission in Medicaid patients with diabetes.
Objectives: Readmission is common and costly for hospitalized Medicaid patients with diabetes. We aimed to develop a model predicting risk of 30-day readmission in Medicaid patients with diabetes hospitalized for any cause.
Study Design: Using 2016-2019 Medicaid claims from 7 US states, we identified patients who (1) had a diagnosis of diabetes or were prescribed any diabetes drug, (2) were hospitalized for any cause, and (3) were discharged to home or to a nonhospice facility. For each encounter, we assessed whether the patient was readmitted within 30 days of discharge.
Methods: Applying least absolute shrinkage and selection operator variable selection, we included demographic data and claims history in a logistic regression model to predict 30-day readmission. We evaluated model fit graphically and measured predictive accuracy by the area under the receiver operating characteristic curve (AUROC).
Results: Among 69,640 eligible patients, there were 129,170 hospitalizations, of which 29,410 (22.8%) were 30-day readmissions. The final model included age, sex, age-sex interaction, past diagnoses, US state of admission, number of admissions in the preceding year, index admission type, index admission diagnosis, discharge status, length of stay, and length of stay–sex interaction. The observed vs predicted plot showed good fit. The estimated AUROC of 0.761 was robust in analyses that assessed sensitivity to a range of model assumptions.
Conclusions: Our model has moderate power for identifying hospitalized Medicaid patients with diabetes who are at high risk of readmission. It is a template for identifying patients at risk of readmission and for adjusting comparisons of 30-day readmission rates among sites or over time.
Am J Manag Care. 2023;29(8):e229-e234. https://doi.org/10.37765/ajmc.2023.89409
Early readmission of discharged Medicaid patients with diabetes is burdensome for patients and costly for the health system.
Diabetes afflicts roughly 13% of the US population.1 People with diabetes have elevated rates of hospital admission2 and all-cause mortality,3 account for nearly one-third of all hospital discharges, and are twice as likely to be readmitted within 30 days of discharge as similar patients without diabetes.4,5 The estimated cost of 30-day readmissions for this population is $20 billion to $25 billion annually.4 Medicaid insurance is a consistent predictor of 30-day readmission.5
Various authors have developed models predicting readmission for patients with diabetes to aid in identifying those at elevated risk and designing interventions to improve their postdischarge management.6 Several such models exist for Medicare patients with diabetes,6-9 but none exists for Medicaid patients with diabetes.
The Medicaid program, jointly funded by federal and state governments,10 is the single largest source of health insurance in the United States, covering nearly 76 million American children, pregnant women, low-income adults, and individuals with disabilities.11,12 Given the scope of Medicaid and the distinctive characteristics of the covered population,13 there is a need for novel readmission models designed specifically for this population.14,15 We address this gap by using Medicaid claims data to develop a risk model predicting 30-day readmission among individuals with diabetes hospitalized for any cause.
We retrieved Medicaid claims through a system operated by Digital Health Cooperative Research Centre, an Australian health care research organization, and Gainwell Technologies (formerly HMS), a US health care analytics company that coordinates Medicaid benefits in several states.
We extracted data from the claims, eligibility, and provider databases. The claims database contains files comprising institutional, medical, pharmacy, and dental data for each patient encounter. The institutional file includes information on hospitals and other facilities where the encounters took place. It also includes encounter-specific provider data and patient data such as presenting diagnoses and conditions (represented by International Classification of Diseases, Tenth Revision [ICD-10] codes). The eligibility database contains patient demographics, eligibility criteria, and coverage dates. The provider database includes information on medical service providers.
We considered Medicaid claims from 7 US states served by Gainwell Technologies: fee-for-service claims from Florida; Medicaid managed care claims from Georgia, Indiana, Kentucky, and Ohio; and claims of both types from Colorado and Nevada. We had access to claims dated from January 1, 2016, to June 21, 2019, in Florida; to July 1, 2019, in Ohio; to July 26, 2019, in Nevada; and to August 1, 2019, in the other states.
We identified patients with nongestational diabetes from 3 sources: admission ICD-10 codes in hospitalization claims, clinical diagnosis ICD-10 codes in physician and hospitalization claims, and prescriptions for diabetes drugs in pharmacy claims. In the physician and hospitalization claims, we defined a patient as having diabetes if any diagnosis field contained an ICD-10 diabetes code (E08-E13). In prescription data, we searched pharmacy claims for National Drug Code indicators that represent diabetes drugs (eAppendix Table 1 [eAppendix available at ajmc.com]). Thus, our cohort included any patient who had a claim indicating diagnosis or treatment of nongestational diabetes. This “opportunistic” case identification method is known to have high sensitivity and low specificity.16
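The case definition above can be sketched in a few lines. This is an illustration only, not the authors' code: the function names, the input layout (lists of code strings), and the example National Drug Code value are hypothetical, and the actual drug list is the one in eAppendix Table 1.

```python
import re

# ICD-10 categories E08 through E13 (the diabetes codes named in the text).
DIABETES_ICD10 = re.compile(r"^E(0[89]|1[0-3])")

def has_diabetes_code(diagnosis_codes):
    """True if any diagnosis field holds an ICD-10 code in E08-E13."""
    return any(DIABETES_ICD10.match(code.strip().upper()) for code in diagnosis_codes)

def is_diabetes_case(diagnosis_codes, ndc_codes, diabetes_ndcs):
    """Opportunistic definition: a diabetes diagnosis OR a diabetes drug claim.

    diabetes_ndcs is a hypothetical set of National Drug Codes for diabetes
    drugs; it stands in for the list in eAppendix Table 1.
    """
    return has_diabetes_code(diagnosis_codes) or any(n in diabetes_ndcs for n in ndc_codes)
```

Because either source alone qualifies a patient, the rule favors sensitivity over specificity, which matches the behavior described for opportunistic case identification.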
An admission was any claim that met the following criteria:
We limited our cohort to admissions from January 1, 2017, or later, giving us at least 1 year of patient history preceding every hospitalization. We identified all admissions for each patient with diabetes, considering each a potential index admission. We then excluded as index admissions any admission for which the discharge code indicated in-hospital death or transfer to a short-term general hospital, another type of institution not defined in the Medicaid code list, a critical access hospital, or hospice care. An admission was designated as a readmission if it occurred within 30 days of the discharge of a previous admission. A readmission could serve as the index hospitalization for a subsequent readmission, except as stipulated above.4
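As a rough sketch of the readmission rule, the function below flags each stay of one patient that begins within 30 days of the preceding discharge. The function name and the tuple layout are hypothetical, treating day 30 as inside the window is an assumption, and the discharge-code exclusions for index admissions described above are not applied here.

```python
from datetime import date

def flag_readmissions(stays, window_days=30):
    """Flag stays that start within window_days of the previous discharge.

    stays: list of (admit_date, discharge_date) tuples for ONE patient.
    Returns booleans in admit-date order. The 30-day outcome of the stay
    at position i is the flag at position i + 1, if there is one.
    """
    flags = []
    prev_discharge = None
    for admit, discharge in sorted(stays):
        gap_ok = prev_discharge is not None and 0 <= (admit - prev_discharge).days <= window_days
        flags.append(gap_ok)
        prev_discharge = discharge
    return flags
```

A flagged stay can itself act as the index admission for the next one, which is how chains of readmissions arise in the counts.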
Our list of potential predictors of readmission appears in Table 1. We used ICD-10 categories to group past, admission, and main diagnoses into a coarser list of variables. A past diagnosis is an ICD-10 code in a diagnosis field from a prior (to the index) hospitalization or physician visit. An admission diagnosis is an ICD-10 code reported in the admission diagnosis field of the index admission—the proximal reason for the hospitalization. A main diagnosis is an ICD-10 code reported in the main diagnosis field of the index hospitalization—the discharge diagnosis for the hospitalization.17
The primary outcome was readmission within 30 days of the discharge from the index hospitalization.
Our database does not include information on vital status except when individuals received a discharge code indicative of death. But because it includes eligibility dates, a patient who died at home could nevertheless be censored for follow-up of a preceding index hospitalization. A patient could also be censored if the number of days of eligibility following an index discharge was less than 30. We counted as events all readmissions that occurred within 30 days of discharge from an index admission, but we excluded as index admissions all admissions with postdischarge follow-up less than 30 days where no readmission occurred.
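The follow-up rule in the last sentence can be stated compactly. The sketch below is illustrative; the function and argument names are hypothetical, and it assumes follow-up is measured in whole days from discharge to the end of eligibility.

```python
from datetime import date

def usable_as_index(discharge, eligibility_end, readmitted_within_window, window_days=30):
    """Decide whether an admission can serve as an index admission.

    Observed readmissions always count as events. Admissions without a
    readmission are kept only when eligibility covers the full window.
    """
    followup_days = (eligibility_end - discharge).days
    return readmitted_within_window or followup_days >= window_days
```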
We predicted 30-day readmission using a logistic generalized linear mixed model (GLMM) including patient and provider random effects. This model accounts for potential correlation of outcomes within patients or providers; failure to do so could invalidate CIs and statistical tests.18
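For orientation, one common way to write such a model is shown below. The notation is ours, not the authors', and the choice of normally distributed random intercepts is an assumption; the text states only that patient and provider random effects were included.

```latex
\operatorname{logit}\,\Pr(Y_{ijk} = 1 \mid u_i, v_j)
  = \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_i + v_j,
\qquad u_i \sim N(0, \sigma_u^2), \quad v_j \sim N(0, \sigma_v^2)
```

Here $Y_{ijk}$ indicates 30-day readmission after the $k$th index admission of patient $i$ at provider $j$, $\mathbf{x}_{ijk}$ holds the fixed-effect predictors, and $u_i$ and $v_j$ capture the within-patient and within-provider correlation.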
Handling of Count and Continuous Predictors
Table 1 includes 2 count variables (number of claims and number of hospitalizations in the 12 months preceding the index hospitalization) that we treated as continuous predictors. We modeled the 2 continuous variables (age and length of stay [LOS] for the index hospitalization) with linear splines to account for potential nonlinearity of effects on readmission risk.19 We examined interactions of the spline terms with other baseline predictors.
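A linear spline can be built from hinge terms, as in the minimal sketch below. The knot locations are hypothetical (the text does not report them), and the interaction helper simply multiplies each spline term by a 0/1 sex indicator.

```python
def linear_spline_basis(x, knots):
    """Return [x, (x - k1)+, (x - k2)+, ...], a continuous piecewise-linear basis."""
    return [x] + [max(0.0, x - k) for k in knots]

def spline_with_sex_interaction(x, male, knots):
    """Append spline-by-sex interaction terms (male is 0 or 1)."""
    base = linear_spline_basis(x, knots)
    return base + [male * b for b in base]
```

Each hinge term lets the slope change at its knot, so a fitted curve can rise and then fall, which a single linear term cannot represent.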
We identified the best set of predictors in our data using least absolute shrinkage and selection operator (LASSO) variable selection in the R package glmnet.20 The LASSO involves estimating a range of models (often hundreds or thousands) and penalizing those that include larger numbers of predictors. In a data set the size of ours, it is practical to apply the LASSO with standard logistic regression but not with the more computationally demanding logistic GLMM. Therefore, in the model selection step we used logistic regression without patient and provider random effects. Then, having identified the best set of predictors, we reestimated their coefficients including the patient and provider random effects.
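To show how an L1 penalty removes predictors, here is a small pure-Python sketch of LASSO-penalized logistic regression. It is not the glmnet algorithm: glmnet uses coordinate descent over a path of penalty values, whereas this sketch swaps in proximal gradient descent (ISTA) at a single penalty. The function names, step size, and iteration count are assumptions chosen for predictors scaled to roughly [-1, 1].

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink toward zero and clip at zero."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_logistic(X, y, lam, step=0.5, iters=2000):
    """Minimize mean logistic loss + lam * sum(|beta_j|) by proximal gradient.

    X is a list of rows, y a list of 0/1 labels. The intercept b0 is
    left unpenalized. Returns (b0, beta).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            eta = b0 + sum(b * v for b, v in zip(beta, xi))
            resid = 1.0 / (1.0 + math.exp(-eta)) - yi  # fitted probability minus label
            g0 += resid / n
            for j in range(p):
                g[j] += resid * xi[j] / n
        b0 -= step * g0
        # Gradient step followed by soft-thresholding; this sets weak coefficients to exactly 0.
        beta = [soft_threshold(b - step * gj, step * lam) for b, gj in zip(beta, g)]
    return b0, beta
```

Predictors whose coefficients are exactly zero at the chosen penalty are dropped; the retained set can then be refitted without a penalty, which parallels the two-step approach described above.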
We evaluated the selected model’s predictive accuracy by computing the area under the receiver operating characteristic curve (AUROC; also known as the C statistic). To avoid overfitting bias, we reran the predictor selection in a 5-fold cross-validation in which we computed the AUROC on each hold-out sample.21 We averaged these AUROCs to obtain an unbiased, efficient estimate of predictive accuracy.
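The AUROC and the cross-validation loop can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, folds are formed by interleaving row indices without shuffling, and `fit` and `predict` are placeholder callables supplied by the caller. To mirror the procedure described above, `fit` would need to repeat the predictor selection on each training set.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def kfold_indices(n, k=5):
    """Split range(n) into k interleaved folds; return (train, test) index lists."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [([j for f in folds if f is not fold for j in f], fold) for fold in folds]

def cross_validated_auroc(X, y, fit, predict, k=5):
    """Average the hold-out AUROC over k folds.

    Each test fold must contain both outcome classes.
    """
    aucs = []
    for train, test in kfold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auroc([y[i] for i in test], [predict(model, X[i]) for i in test]))
    return sum(aucs) / k
```

The pairwise form of the AUROC is quadratic in the number of rows, so it suits small illustrations; a rank-based computation would be the practical choice at the scale of this cohort.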
We assessed significance of predictors in the final GLMM using Wald tests and 95% CIs for regression coefficients. We evaluated calibration by plotting observed vs predicted probability of 30-day readmission.
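One simple way to produce the numbers behind an observed vs predicted plot is to bin admissions by predicted risk. The sketch below assumes equal-size bins (deciles by default); the function name is hypothetical.

```python
def calibration_by_decile(y, p, bins=10):
    """Sort by predicted risk, cut into equal-size bins, and summarize each bin.

    Returns a list of (mean predicted probability, observed event rate)
    pairs, one per nonempty bin, from lowest to highest predicted risk.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    out = []
    for b in range(bins):
        chunk = pairs[b * n // bins:(b + 1) * n // bins]
        if chunk:
            mean_pred = sum(q for q, _ in chunk) / len(chunk)
            obs_rate = sum(v for _, v in chunk) / len(chunk)
            out.append((mean_pred, obs_rate))
    return out
```

For a well-calibrated model the pairs lie close to the identity line.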
Most variables had few missing values. Among demographic factors, age and sex were complete for all patients, but marital status was available only in Nevada, and race/ethnicity was available only in Florida, Colorado, and Nevada. We therefore excluded marital status and race/ethnicity from our predictive models. We also omitted a small fraction of index admissions for which discharges were recorded with codes “reserved for national assignment” and a small fraction (< 0.05%) of index admissions that lacked a National Provider Identifier.
We assessed robustness by rerunning the model under the following range of assumptions:
We executed all computations in Spark (Apache Software Foundation) and R (R Foundation for Statistical Computing).
eAppendix Figure 1 presents a schematic of the data extraction. The final study cohort of 69,640 patients accounted for 129,170 hospitalizations, of which 29,410 (22.8%) were 30-day readmissions. The median age was 48 years at the time of the earliest claim, and approximately 60% of patients were female (Table 2).
Among diagnosis variables associated with the index admission, the LASSO eliminated the number of claims in the previous 12 months, the diabetes type, and the main diagnosis, retaining the admission diagnosis and several prior diagnosis indicators. Table 3 and eAppendix Table 2 present the coefficients in the selected prediction model. A plot of observed vs expected deciles of 30-day readmission probabilities (eAppendix Figure 2) demonstrates that the model predictions are well calibrated. The Figure presents the ROC curve for the final model, with an AUROC of 0.761. eAppendix Figures 3 and 4 show the estimated readmission rate as a function of patient age and of LOS of the index admission, respectively. The plot for age shows that the rate of readmission increases until age 20 years for male patients and age 65 years for female patients, gradually declining thereafter. The plot for LOS shows higher readmission rates for very short stays and stays of 7 to 15 days for both male and female patients.
eAppendix Table 3 shows the predicted probability of readmission for hypothetical patients exhibiting a range of risk factors. The predicted risk ranges from 6% for a hypothetical young female patient with no previous admissions, discharged to home, to 85% for a hypothetical older male patient who has many previous admissions.
We estimated logistic regression models (excluding random effects) containing all 64 combinations of the 6 variable sets—state, LOS, previous admissions, admission type, admission group, and discharge code—and computed the AUROC for each. AUROC values ranged from 0.700 to 0.759, with 32 models giving AUROC within 2% of the maximum achieved using all the variables.
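The count of 64 follows from taking every subset of 6 variable sets, since 2^6 = 64. The sketch below enumerates them; the identifier names are hypothetical, and it assumes the 64 combinations include the subset containing none of the 6 sets.

```python
from itertools import combinations

# Hypothetical labels for the 6 variable sets named in the text.
VARIABLE_SETS = [
    "state", "los", "previous_admissions",
    "admission_type", "admission_group", "discharge_code",
]

def all_subsets(items):
    """Every subset of items, from the empty set to the full set."""
    return [c for r in range(len(items) + 1) for c in combinations(items, r)]
```

Fitting one logistic regression per subset and recording its AUROC shows how much each variable set contributes to discrimination.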
We reran our model excluding the provider random effect, obtaining an AUROC of 0.760. We reran our model excluding 699 uncomplicated labor and delivery admissions (ICD-10 codes O81 and O82; 0.5% of the index admissions), obtaining an estimated AUROC of 0.758.
We reestimated the model using the cohort identified by a more specific diabetes case definition, obtaining an AUROC of 0.750. The method selected all the predictors in Table 3 plus a smaller set of past diagnoses that largely overlapped with the list in eAppendix Table 2.
A small fraction (1.1%) of index hospitalizations had stays longer than 28 days that we were concerned might unduly influence predictions. Rerunning the model excluding these data gave an AUROC of 0.766, slightly higher than the 0.761 obtained with the full data, suggesting that prediction is somewhat poorer for longer index stays.
We reran the model excluding 21,752 patients (31.2%) whose 32,826 hospitalization claims (25.4%) had noncontinuous enrollment for a year before or 30 days after, obtaining an estimated AUROC of 0.748.
Thirty-day readmission is a widely acknowledged driver of health care costs. Thus, CMS financially penalizes hospitals that have excessive numbers of 30-day readmissions,22 and there is considerable interest in identifying patients who are at risk for such readmissions and implementing protocols that help to avoid this outcome. Because Medicaid insurance is a significant predictor of 30-day readmission, insurers that offer such plans have a particular stake in reducing these events. Yet models for predicting readmission in diabetes have largely relied on granular demographic and clinical data that may not be available to Medicaid payers.
To address this need, we have used Medicaid claims data to develop and validate a model predicting 30-day hospital readmissions in patients with diabetes. Unlike existing diabetes readmission models that rely on electronic health record fields and include a mix of insurance types, we built our model using only claims data from a Medicaid population. Our model performed at least as well as others that use more granular clinical and pharmacy information. Our variable selection routine retained demographic factors (patient sex and age), clinical history (previous claims and comorbidities), admission information (admission type and diagnosis), and discharge information (discharge destination and LOS). These variables are similar to demographic, comorbidity, and inpatient stay factors that others have found to be predictive of readmission,6 but they represent only information that is available in a Medicaid bill. The model may therefore be useful at the payer and health system levels for identifying patients at elevated risk for readmission and targeting postdischarge management interventions to reduce readmissions and contain costs.
Among Medicaid patients with diabetes hospitalized in 7 states in 2016-2019, 23% of admissions resulted in readmission within 30 days of discharge, comparable with overall 30-day readmission rates observed in studies including mixed insurance coverage.23 We observed wide variation in readmission rates across the 7 states in our analysis (14.2%-31.6%) (Table 2), a pattern like that observed with overall 30-day Medicaid readmissions across states.14 This variation likely reflects variability in eligibility, coverage, management, and readmission reduction efforts at the state level. As in other studies examining readmissions in patients with diabetes, we found age, number of prior admissions, LOS, postdischarge care environment, and diagnosis of depression or other mental health conditions to be independent predictors of readmission.8,9,23 Whereas others have described a positive association between increasing LOS and readmission risk in patients with diabetes,5,8,9,23 our analysis using a spline regression model suggests that the relationship between LOS and readmission risk is nonmonotone, with an early peak in readmissions at 1 to 2 days and a later peak at 7 to 15 days. A similar nonlinear relationship between LOS and readmission risk exists in heart failure readmissions.24 As in other studies,8,9,23 male sex predicts a higher readmission rate. We also observed statistically significant interactions of sex with age and sex with LOS.
The predictive accuracy of our claims-only 30-day readmission risk model for Medicaid patients with diabetes is comparable to that of models noted in recent reviews.4,7,8 Our model has better accuracy (AUROC = 0.76) than the Diabetes Early Readmission Risk Indicator (DERRI; AUROC = 0.69), a widely studied diabetes readmission model. Unlike DERRI, which requires detailed sociodemographic (employment status, distance to hospital) and clinical (macrovascular diabetes complications, laboratory values, and medications) data, our model uses only data from Medicaid enrollment and billing files. Our model performs moderately worse than a Medicare claims-based 30-day readmission model for patients with diabetes (AUROC = 0.82) that also incorporated Medicare pharmacy data.9 Our model does include comorbidities available in claims files, including mental health conditions, which are strong predictors of 30-day readmissions in all patients with Medicaid coverage14 and those with diabetes23 (eAppendix Table 2).
A novel aspect of our model is the use of splines to describe effects of age, LOS, and their interactions with sex, enabling flexible modeling of nonmonotone trends. Including state as a predictor accounts for the heterogeneity of Medicaid coverage, programs, and readmission initiatives. Another novel element is the use of a GLMM to account for correlation within both patients and providers in estimating the prediction model.9
Our model is the first to focus exclusively on diabetes readmissions in the Medicaid population.13,25 In US readmission models, payer type is an important predictor because it reflects the underlying patient populations covered. Medicaid payer status is an independent risk factor for readmission, as covered patients are poorer and have higher rates of chronic conditions and mental illness, as well as socioeconomic disadvantages that are difficult to capture.13,26 We found the postdischarge environment to be a significant predictor of readmission, possibly reflecting the level of social support available and offering an opportunity to tailor readmission reduction interventions.
Strengths and Limitations
Strengths of our study include a focus on Medicaid patients with diabetes, a large but understudied population that is at high risk for readmission. The pooling of managed care and fee-for-service Medicaid claims data from 7 states enhances the generalizability of our model and provides a larger sample size than previous studies of diabetes readmissions.4 Our study is, however, not without limitations. First, because our claims data are deidentified, precluding linkage to the National Death Index, they likely excluded some deaths. Second, our claims data contain patient race/ethnicity for only 3 of the 7 states, so we could not include this variable, which is known to be associated with readmission rate.13 Finally, our data set does not contain information on whether the claim originated from Medicaid managed care or fee for service. We know, however, that in 4 states (Georgia, Indiana, Kentucky, Ohio) all claims are from managed care plans; in 1 state (Florida), all claims are fee for service; and the other 2 states (Colorado, Nevada) mix claims from both types of plan. Although Florida (fee for service only) had the highest adjusted readmission rate, Colorado (mixed) had the lowest; thus, there is no clear pattern suggesting that type of plan is associated with the readmission outcome.
In this study, we derived and validated a claims-only statistical model to predict 30-day readmissions for hospitalized Medicaid patients with diabetes. Our model is of similar accuracy to others that use more detailed clinical and demographic variables. It may be of use to health plans, policy makers, and health systems as they seek to risk stratify populations and refine, develop, and target interventions to help contain readmission-related costs in Medicaid programs. Future work, including external validation of the risk model within Medicaid programs at the state or payer level, can facilitate targeting and deployment of readmission reduction initiatives to reduce morbidity and health care costs.
The authors are grateful for the help and guidance of Gary Call, Briget Da Graca, Kelly Dickson, Amir Marashi, Donna Price, and Kathy Tannous.
Author Affiliations: Department of Statistics & Data Science (JY, GF, DFH) and Cox School of Business (VA), Southern Methodist University, Dallas, TX; Robbins Institute for Health Policy and Leadership, Hankamer School of Business, Baylor University (GF), Waco, TX; Department of Internal Medicine (MEB), Department of Pediatrics (MEB), and Peter O’Donnell Jr. School of Public Health (MEB, DFH), University of Texas Southwestern Medical Center, Dallas, TX.
Source of Funding: HMS Inc, Digital Health CRC Ltd, and the University of Western Sydney provided funding for the research through a contract with Southern Methodist University. Sponsors provided the Medicaid data and a stipend for the graduate research assistant (Dr Yun). Funders reviewed results and the final draft of the manuscript but otherwise had no role in the research. Dr Bowen received support from National Institutes of Health/National Institute of Diabetes and Digestive and Kidney Diseases Career Development Award DK104065.
Author Disclosures: Dr Bowen reports receipt of a research grant from Boehringer Ingelheim. The remaining authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.
Authorship Information: Concept and design (GF, VA, DFH); analysis and interpretation of data (JY, GF, VA, MEB, DFH); drafting of the manuscript (JY, GF, VA, MEB, DFH); critical revision of the manuscript for important intellectual content (JY, GF, VA, MEB, DFH); statistical analysis (JY, VA, DFH); obtaining funding (DFH); and administrative, technical, or logistic support (JY).
Address Correspondence to: Daniel F. Heitjan, PhD, Department of Statistics & Data Science, Southern Methodist University, 144 Heroy Hall, 3225 Daniel St, Dallas, TX 75275-0332.
1. National diabetes statistics report, 2020: estimates of diabetes and its burden in the United States. CDC. 2020. Accessed April 3, 2021. https://www.cdc.gov/diabetes/pdfs/data/statistics/national-diabetes-statistics-report.pdf
2. Schneider ALC, Kalyani RR, Golden S, et al. Diabetes and prediabetes and risk of hospitalization: the Atherosclerosis Risk in Communities (ARIC) study. Diabetes Care. 2016;39(5):772-779. doi:10.2337/dc15-1335
3. Muggeo M, Verlato G, Bonora E, et al. The Verona diabetes study: a population-based survey on known diabetes mellitus prevalence and 5-year all-cause mortality. Diabetologia. 1995;38(3):318-325. doi:10.1007/BF00400637
4. Rubin DJ. Correction to: hospital readmission of patients with diabetes. Curr Diab Rep. 2018;18(4):21. doi:10.1007/s11892-018-0989-1
5. Chopra I, Wilkins TL, Sambamoorthi U. Hospital length of stay and all-cause 30-day readmissions among high-risk Medicaid beneficiaries. J Hosp Med. 2016;11(4):283-288. doi:10.1002/jhm.2526
6. Robbins TD, Lim Choi Keung SN, Sankar S, Randeva H, Arvanitis TN. Risk factors for readmission of inpatients with diabetes: a systematic review. J Diabetes Complications. 2019;33(5):398-405. doi:10.1016/j.jdiacomp.2019.01.004
7. Rubin DJ, Handorf EA, Golden SH, Nelson DB, McDonnell ME, Zhao H. Development and validation of a novel tool to predict hospital readmission risk among patients with diabetes. Endocr Pract. 2016;22(10):1204-1215. doi:10.4158/E161391.OR
8. Rubin DJ, Shah AA. Predicting and preventing acute care re-utilization by patients with diabetes. Curr Diab Rep. 2021;21(9):34. doi:10.1007/s11892-021-01402-7
9. Collins J, Abbass IM, Harvey R, et al. Predictors of all-cause 30-day readmission among Medicare patients with type 2 diabetes. Curr Med Res Opin. 2017;33(8):1517-1523. doi:10.1080/03007995.2017.1330258
10. 2020 Medicaid and CHIP beneficiary profile: characteristics, health status, access, utilization, expenditures, and experience. CMS. August 2021. Accessed April 23, 2021. https://www.medicaid.gov/medicaid/quality-of-care/downloads/beneficiary-profile-2021.pdf
11. Medicaid eligibility. Medicaid.gov. Accessed December 23, 2021. https://www.Medicaid.gov/Medicaid/eligibility/index.html
12. June 2021 Medicaid & CHIP enrollment data highlights. Medicaid.gov. Accessed December 23, 2021. https://web.archive.org/web/20211230120736/https:/www.Medicaid.gov/Medicaid/program-information/Medicaid-and-chip-enrollment-data/report-highlights/index.html
13. Regenstein M, Andres E. Reducing hospital readmissions among Medicaid patients: a review of the literature. Qual Manag Health Care. 2014;23(4):203-225. doi:10.1097/QMH.0000000000000043
14. Trudnak T, Kelley D, Zerzan J, Griffith K, Jiang HJ, Fairbrother GL. Medicaid admissions and readmissions: understanding the prevalence, payment, and most common diagnoses. Health Aff (Millwood). 2014;33(8):1337-1344. doi:10.1377/hlthaff.2013.0632
15. Billings J, Dixon J, Mijanovich T, Wennberg D. Case finding for patients at risk of readmission to hospital: development of algorithm to identify high risk patients. BMJ. 2006;333(8):327-330. doi:10.1136/bmj.38870.657917.AE
16. Hebert PL, Geiss LS, Tierney EF, Engelgau MM, Yawn BP, McBean AM. Identifying persons with diabetes using Medicare claims data. Am J Med Qual. 1999;14(6):270-277. doi:10.1177/106286069901400607
17. CMS technical instructions: diagnosis, procedure codes. Medicaid.gov. Accessed August 1, 2021. https://www.Medicaid.gov/Medicaid/data-and-systems/macbis/tmsis/tmsis-blog/entry/51965
18. Stroup WW. Generalized Linear Mixed Models: Modern Concepts, Methods, and Applications. CRC Press; 2012.
19. Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med. 1989;8(5):551-561. doi:10.1002/sim.4780080504
20. Hastie T, Qian J, Tay K. An introduction to glmnet. glmnet. Updated March 27, 2023. Accessed June 31, 2021. http://glmnet.stanford.edu/articles/glmnet.html
21. Harrell FE Jr, Lee KL, Califf RM, Pryor DB, Rosati RA. Regression modeling strategies for improved prognostic prediction. Stat Med. 1984;3(2):143-152. doi:10.1002/sim.4780030207
22. Hospital Readmissions Reduction Program (HRRP). CMS. Updated February 23, 2023. Accessed November 6, 2022. https://www.cms.gov/Medicare/Medicare-fee-for-service-payment/acuteinpatientPPS/readmissions-reduction-program
23. Soh JGS, Wong WP, Mukhopadhyay A, Quek SC, Tai BC. Predictors of 30-day unplanned hospital readmission among adult patients with diabetes mellitus: a systematic review with meta-analysis. BMJ Open Diabetes Res Care. 2020;8(1):e001227. doi:10.1136/bmjdrc-2020-001227
24. Sud M, Yu B, Wijeysundera HC, et al. Associations between short or long length of stay and 30-day readmission and mortality in hospitalized patients with heart failure. JACC Heart Fail. 2017;5(8):578-588. doi:10.1016/j.jchf.2017.03.012
25. Jiang HJ, Stryer D, Friedman B, Andrews R. Multiple hospitalizations for patients with diabetes. Diabetes Care. 2003;26(5):1421-1426. doi:10.2337/diacare.26.5.1421
26. Ku L, Ferguson C. Medicaid works: a review of how public insurance protects the health and finances of children and other vulnerable populations. Issue Lab. June 21, 2011. Accessed December 11, 2021. https://search.issuelab.org/resource/medicaid-works-a-review-of-how-public-insurance-protects-the-health-and-finances-of-children-and-other-vulnerable-populations.html