The American Journal of Managed Care January 2020

Predicting Hospitalizations From Electronic Health Record Data

Kyle Morawski, MD, MPH; Yoni Dvorkis, MPH; and Craig B. Monsen, MD, MS
The authors aimed to develop a rigorous technique for predicting hospitalizations using data that are already available to most health systems.
Other Statistical Tests

For continuous variables, we report means and SDs. For noncontinuous variables, we report counts and percentages. For normally distributed data, we applied the t test. For nonnormally distributed data, we applied the Wilcoxon test. For comparisons between categorical variables, we used the Fisher test.
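As an illustration of the categorical comparison described above, the following is a minimal sketch of a two-sided Fisher exact test for a 2x2 table, written with the Python standard library only. The article does not state which software or which variant of the test was used, so the function name and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance guards against rounding).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )

# Hypothetical counts, not taken from the study.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With these illustrative counts, the p value is 34/70, or approximately 0.49.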

Sensitivity Testing

Although the canonical model included both EHR and claims data, we sought to identify which category of variables contributed most to model performance. We trained 15 models spanning 2 dimensions of model characteristics: 3 data sources crossed with 5 variable sets.

The first dimension compared models developed from different data sources: EHR data only, claims data only, or both. The EHR data–only models used information drawn from the EHR (eg, medication use categories were ascertained as positive if the patient had a medication order placed by a provider). The claims data–only models used information drawn from claims (eg, medication use categories were ascertained as positive if a patient had a medication dispense claim in the administrative data). In the models using both data sources, a categorical feature was ascertained to be positive if there was evidence from either the EHR or claims data.

The second dimension considered was variable types. Separate models were trained to include demographic variables only, diagnoses only, medications only, prior utilization only, or all variables combined. Model performance was assessed for training and testing sets using the C statistic.
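The two dimensions above can be sketched as a simple grid. This is a minimal illustration, assuming each categorical feature is available as a pair of true/false flags, one per data source. The identifiers (DATA_SOURCES, VARIABLE_SETS, feature_positive) are hypothetical and do not come from the article.

```python
from itertools import product

# Dimension 1: where the information comes from.
DATA_SOURCES = ["ehr", "claims", "both"]
# Dimension 2: which variable types the model may use.
VARIABLE_SETS = ["demographics", "diagnoses", "medications",
                 "prior_utilization", "all"]

def feature_positive(source, ehr_flag, claims_flag):
    """Ascertain one categorical feature under a given data-source setting."""
    if source == "ehr":
        return ehr_flag
    if source == "claims":
        return claims_flag
    # "both": positive if either source shows evidence.
    return ehr_flag or claims_flag

# Every (data source, variable set) pair is one model to train.
model_grid = list(product(DATA_SOURCES, VARIABLE_SETS))
```

Crossing 3 data sources with 5 variable sets yields the 15 models described in the text. For example, a medication with a dispense claim but no provider order would count as positive in the claims-only and combined settings, but not in the EHR-only setting.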


Study Population

After exclusions, 363,855 patient-months were included for analysis, corresponding to 185,388 unique patients. Selected patient characteristics ascertained by combining EHR and claims data are summarized in Table 1. In aggregate, 5% of the study population had been hospitalized within 6 months of an index date.

Model Features

After excluding variables with low counts or protective factors, 169 variables were included in the final model. Diagnoses, demographics, and prior utilization were well represented among the top predictors (Figure 2). The features with the highest ORs for predicting future hospitalization were sickle cell anemia (OR, 52.72), lipidoses and glycogenosis (OR, 8.44), heart transplant (OR, 6.12), and age 76 years or older (OR, 5.32). A full list of final features is included in the eAppendix.
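To give a sense of scale for these ORs, the sketch below converts a baseline probability and an OR into a new probability. It is illustrative only: it borrows the overall 5% hospitalization rate as a hypothetical baseline, whereas the reported ORs come from a multivariable model and apply to a patient's adjusted odds. The function name apply_odds_ratio is an assumption for this example.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the probability implied by multiplying baseline odds by an OR."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Hypothetical baseline of 5% combined with the reported OR for
# age 76 years or older (5.32).
risk_age_76_plus = apply_odds_ratio(0.05, 5.32)  # about 0.219
```

Under that assumed baseline, an OR of 5.32 corresponds to a risk of roughly 22%, not 5.32 times 5%, because ORs act on odds rather than on probabilities.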

Model Performance

Model discrimination. Model discrimination varied widely, depending primarily on the variables included. The predictive model using only prescription medications performed worst, with an AUC of 0.602. The model including all variable types, claims data, and EHR data performed best on the testing set, with an AUC of 0.846. There were no statistically significant differences in performance on the testing set among the 3 models including all variable types based on claims data alone (AUC, 0.840; 95% CI, 0.832-0.848), EHR data alone (AUC, 0.840; 95% CI, 0.831-0.848), or the claims and EHR data combined (AUC, 0.846; 95% CI, 0.838-0.853). Table 2 presents these results in more detail.
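For readers unfamiliar with the C statistic (AUC), the following minimal sketch computes it as the share of event and non-event pairs that the model ranks correctly. It also includes a helper that checks whether two CIs overlap. The article does not say how differences between models were tested, so the overlap check is only a rough heuristic and not a formal test. The function names and toy data are hypothetical, and the pairwise loop is meant for small examples, not for hundreds of thousands of patient-months.

```python
def c_statistic(labels, scores):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Toy data, not from the study: 3 of 4 pairs are ranked correctly.
toy_auc = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75

# 95% CIs reported in the text for the 3 all-variable models.
claims_ci, ehr_ci, combined_ci = (0.832, 0.848), (0.831, 0.848), (0.838, 0.853)
all_overlap = (intervals_overlap(claims_ci, ehr_ci)
               and intervals_overlap(claims_ci, combined_ci)
               and intervals_overlap(ehr_ci, combined_ci))
```

The 3 reported intervals overlap one another, which is consistent with the finding that discrimination did not differ meaningfully across data sources.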

Model calibration. The best-performing model, which included all variable types from claims and EHR data combined, appeared to be well calibrated (Figure 3). Predicted probability of hospitalization at 6 months corresponded closely to the observed proportion of hospitalized patients when sorted into 10 bins of equal size (approximately 7300 patients per bin). Further, the calibration slope was 0.96 (95% CI, 0.94-0.98), compared with a slope of 1.0 for a perfectly calibrated model. The model overestimated 6-month hospitalizations among those with the highest predicted risk.
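The binning step described above can be sketched as follows. This is a minimal illustration, assuming predictions and outcomes are held in plain lists. The slope here is an ordinary least-squares fit of observed on predicted values across bins, which is a simplification: the article does not specify how its calibration slope of 0.96 was estimated. All names and the toy data are hypothetical.

```python
def calibration_bins(probs, outcomes, n_bins=10):
    """Sort by predicted risk, split into n_bins near-equal groups, and
    return (mean predicted risk, observed event proportion) per group."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    bins = []
    for i in range(n_bins):
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue  # fewer records than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        bins.append((mean_pred, observed))
    return bins

def least_squares_slope(points):
    """Ordinary least-squares slope of observed on predicted across bins."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Toy data, not from the study: two risk groups that are perfectly calibrated.
toy_probs = [0.2] * 5 + [0.8] * 5
toy_outcomes = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]
toy_bins = calibration_bins(toy_probs, toy_outcomes, n_bins=2)
toy_slope = least_squares_slope(toy_bins)
```

A slope near 1.0 indicates that observed rates track predicted risk across bins. A slope below 1.0, as reported, is consistent with predictions that are somewhat too extreme at the high end.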

Copyright AJMC 2006-2020 Clinical Care Targeted Communications Group, LLC. All Rights Reserved.