The American Journal of Managed Care January 2020

Predicting Hospitalizations From Electronic Health Record Data

Kyle Morawski, MD, MPH; Yoni Dvorkis, MPH; and Craig B. Monsen, MD, MS
The authors aimed to develop a rigorous technique for predicting hospitalizations using data that are already available to most health systems.

Principal Findings

Using a combination of EHR and claims data describing patients’ demographics, healthcare utilization behavior, medical diagnoses, and medications, we were able to develop a risk score that accurately predicted hospitalization in the ensuing 6 months. Although our results suggest some utility to combining EHR and claims data to inform predictive model creation, we find that even in scenarios in which only EHR or claims data are available, strong performance can be achieved provided that a diverse collection of variable types is represented. A variety of highly predictive characteristics were derived from all major domains evaluated. Consistent with traditional methods, age group was one of the strongest predictors, with older groups at higher risk. Prior healthcare utilization was also a strong predictor and likely covaries with many other factors in the model. Such collinearity can inflate the variance of individual logistic regression coefficients, but it may also allow unmeasured factors, such as healthcare literacy and individuals’ choices of where to seek care, to influence the prediction.18 Particular medical diagnoses also were found to be predictive, likely indicating frailty and a rapid decline in health status that cannot be adequately managed in the outpatient setting. For example, those with end-stage organ damage (renal or hepatic) have little functional reserve, necessitating precision with both health behaviors and medication adjustments. They are prone to fluid or electrolyte imbalances that require inpatient monitoring and correction.
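As an illustration of how a logistic regression risk score of this kind combines predictors from several domains, the following minimal Python sketch converts a set of binary patient characteristics into a predicted probability. The intercept, variable names, and coefficient values are hypothetical and chosen only for illustration; they are not the coefficients of the model described here.

```python
import math

# Hypothetical log-odds coefficients for illustration only; NOT the published model.
INTERCEPT = -4.0
COEFFICIENTS = {
    "age_75_plus": 1.0,                # demographic domain
    "prior_admission_past_year": 1.2,  # prior utilization domain
    "end_stage_renal_disease": 1.5,    # diagnosis domain
}


def predicted_risk(features):
    """Return the logistic regression probability for a dict of feature values.

    risk = 1 / (1 + exp(-(intercept + sum(coefficient * value))))
    Features that are not listed are treated as absent (value 0).
    """
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# A patient with none of the characteristics versus one with all three.
baseline_risk = predicted_risk({})
high_risk = predicted_risk(
    {"age_75_plus": 1, "prior_admission_past_year": 1, "end_stage_renal_disease": 1}
)

# Exponentiating a coefficient gives an odds ratio, which describes an
# association with hospitalization, not a causal effect.
odds_ratios = {name: math.exp(coef) for name, coef in COEFFICIENTS.items()}

print(f"baseline risk: {baseline_risk:.3f}")
print(f"high-risk patient: {high_risk:.3f}")
```

Because the coefficients add on the log-odds scale, correlated predictors such as age and prior utilization share explanatory weight, which is one reason individual coefficients should be read with caution.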

The risk prediction score was well calibrated among patients less likely to be hospitalized in the next 6 months but became less accurate among those at higher risk. The model tended to overestimate the likelihood of hospitalization in patients with a predicted risk above 30%, likely owing to the small number of patients at such high risk.
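Calibration of this kind is commonly checked by grouping patients by predicted risk and comparing the mean predicted risk with the observed hospitalization rate in each group. The sketch below shows one simple way to do so; the function name, bin edges, and toy data are assumptions made for illustration and are not drawn from the study data.

```python
def calibration_table(predicted, observed, edges):
    """Compare mean predicted risk with the observed event rate per risk bin.

    predicted: predicted probabilities (0 to 1)
    observed:  outcomes (1 = hospitalized, 0 = not hospitalized)
    edges:     ascending bin boundaries; the final bin includes its upper edge
    Returns a list of (low, high, n, mean_predicted, observed_rate) tuples.
    """
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        is_last = high == edges[-1]
        members = [
            i for i, p in enumerate(predicted)
            if low <= p < high or (is_last and p == high)
        ]
        if not members:
            continue  # skip empty bins rather than divide by zero
        mean_predicted = sum(predicted[i] for i in members) / len(members)
        observed_rate = sum(observed[i] for i in members) / len(members)
        rows.append((low, high, len(members), mean_predicted, observed_rate))
    return rows


# Toy data (hypothetical): ten patients with predicted risks and outcomes.
toy_predicted = [0.02, 0.04, 0.06, 0.08, 0.15, 0.25, 0.35, 0.45, 0.55, 0.40]
toy_observed = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]

table = calibration_table(toy_predicted, toy_observed, edges=[0.0, 0.1, 0.3, 1.0])
for low, high, n, mean_predicted, observed_rate in table:
    print(f"{low:.1f}-{high:.1f}: n={n}, predicted={mean_predicted:.3f}, "
          f"observed={observed_rate:.3f}")
```

In the toy data the bin above 30% has a mean predicted risk higher than its observed rate, which is the pattern of overestimation described above; with few patients in a bin, such estimates are unstable.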

Comparison With Prior Work

Although many risk scores have been created for individual disease entities19 or certain groups of people,20-24 ours is agnostic to clinical condition and demographic group. Past efforts in predicting hospitalization have been limited in addressable ways.25,26 Whereas other models are updated infrequently, as in the case of the QAdmissions model from the British National Health Service, which is updated quarterly,27 the present model may be updated weekly to provide more timely information across a range of clinical applications. Another model relies on a clinician’s assessment of whether a patient is likely to be seen in the emergency ward,25 whereas ours uses a multimodal, data-derived approach to create the risk prediction. Additionally, our model’s C statistic of 0.846 compares favorably with those of previous models (0.67-0.77), which we attribute to its incorporation of a wide array of variables (demographics, clinical diagnoses, medications, and prior utilization). We believe that our model adds to the current literature by providing an example of EHR and claims data utilization that can provide risk prediction for hospitalization routinely and in real time among patients seen in a primary care setting.
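For readers less familiar with the C statistic, it can be read as the probability that a randomly chosen patient who was hospitalized received a higher predicted risk than a randomly chosen patient who was not, with ties counted as one-half. The following sketch computes it directly from that definition; the function name and toy values are illustrative assumptions, and the pairwise loop is meant for clarity rather than efficiency on large data sets.

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome.

    Counts, over all (hospitalized, not hospitalized) patient pairs, how often
    the hospitalized patient has the higher score; ties count as 0.5.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Toy data (hypothetical): three hospitalized and three non-hospitalized patients.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_outcomes = [1, 1, 1, 0, 0, 0]

toy_c = c_statistic(toy_scores, toy_outcomes)
print(f"C statistic: {toy_c:.3f}")  # 8 of the 9 pairs are concordant
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.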


Limitations

Our investigation has limitations. First, the retrospective analysis was performed using data from a single health system without an external center to validate our results. Although this threatens the generalizability of the model results, we believe the approach is one that can be reproduced at other centers to derive a more tailored model that reflects local patients, patient features, and care practices, all of which may also influence the risk of hospitalization. For instance, ED visits may occur with different frequencies and in different clinical scenarios in other parts of the country because of geographic differences among care providers. Other regions may have differing access to outpatient care, which may result in lower-acuity situations escalating to inpatient care. It is worth noting that we used data representing a large, diverse patient population, which offers some stability to the model coefficients and results. That said, we would expect that a given health system could apply these methods to calibrate the model for its own patients and system of care.

After creating our model, we used an internal validation strategy, testing its predictive ability on the 20% of the data that were withheld during model creation. Other methods of validation include bootstrapping28 and external validation.29 We felt that the training/testing set approach was a sufficiently accurate and interpretable method for measuring discrimination, and we observe that it is commonly used in the literature.30 Because these efforts were performed to improve the quality of care in a single health system, future research is needed to validate our approach in an external population.31
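The two internal validation strategies mentioned above can be sketched briefly. The example below, a minimal illustration under stated assumptions, withholds 20% of rows as a test set and then uses bootstrap resampling to gauge the uncertainty of a performance metric. The function names, the random seeds, the toy data, and the choice of the Brier score as the example metric are all assumptions for illustration; the study itself reports discrimination on the withheld 20%.

```python
import random


def holdout_split(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and withhold a fraction of them as a test set."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def brier_score(scores, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((s - y) ** 2 for s, y in zip(scores, outcomes)) / len(scores)


def bootstrap_interval(scores, outcomes, metric, n_boot=1000, seed=0):
    """Approximate 95% percentile interval for a metric via resampling with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([scores[i] for i in sample],
                            [outcomes[i] for i in sample]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# 80/20 split of 100 hypothetical patient rows.
train_rows, test_rows = holdout_split(100)

# Toy test-set predictions and outcomes (hypothetical values).
test_scores = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.15, 0.30, 0.70, 0.90]
test_outcomes = [0, 0, 0, 1, 0, 1, 0, 0, 1, 1]

point_estimate = brier_score(test_scores, test_outcomes)
low, high = bootstrap_interval(test_scores, test_outcomes, brier_score)
print(f"Brier score: {point_estimate:.4f} (bootstrap interval {low:.4f} to {high:.4f})")
```

Neither strategy substitutes for external validation, because both draw on patients from the same health system.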

The extent to which our predictive model can better target particular interventions and improve care remains to be proven. First, the strongest covariates in the model were nonmodifiable, such as clinical diagnoses. For example, somebody with sickle cell anemia or a heart transplant cannot modify those factors. Second, for factors that are modifiable, such as medication use, the coefficients derived are correlative, not causative. One must be careful not to interpret the fact that a patient is on a medication associated with hospitalization to mean that the medication is a cause of future hospitalization. In short, although identifying the highest-risk patients seems a natural way to prioritize interventions such as postdischarge education and case management, our model provides no evidence that such patients are amenable to these interventions or that their risk of hospitalization would respond to them.

Despite these limitations, we believe that our modeling approach is a meaningful step toward identifying patients at highest risk of hospitalization. Tying the model to care interventions that are likely to modify the risk of hospitalization represents a promising area for future research.


Conclusions

Prediction models using EHR-only, claims-only, and combined data had similar predictive value and demonstrated strong discrimination in identifying which patients will be hospitalized in the ensuing 6 months. The resulting model offers additional benefits of interpretability and timeliness and may be reproduced with local data for greater accuracy.

Author Affiliations: Atrius Health (KM, YD, CBM), Newton, MA.

Source of Funding: Atrius Health institutional funding.

Author Disclosures: The authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.

Authorship Information: Concept and design (YD, CBM); acquisition of data (YD); analysis and interpretation of data (YD, CBM); drafting of the manuscript (KM, YD, CBM); critical revision of the manuscript for important intellectual content (KM, YD, CBM); statistical analysis (YD, CBM); administrative, technical, or logistic support (CBM); and supervision (KM, CBM).

Address Correspondence to: Kyle Morawski, MD, MPH, Atrius Health, 133 Brookline Ave, Boston, MA 02215. Email:

1. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2:3. doi: 10.1186/2047-2501-2-3.

2. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1:18. doi: 10.1038/s41746-018-0029-1.

3. Parikh RB, Kakad M, Bates DW. Integrating predictive analytics into high-value care: the dawn of precision delivery. JAMA. 2016;315(7):651-652. doi: 10.1001/jama.2015.19417.

4. Roski J, Bo-Linn GW, Andrews TA. Creating value in health care through big data: opportunities and policy implications. Health Aff (Millwood). 2014;33(7):1115-1122. doi: 10.1377/hlthaff.2014.0147.

5. LaRiviere J, McAfee P, Rao J, Narayanan VK, Sun W. Where predictive analytics is having the biggest impact. Harvard Business Review website. Published May 25, 2016. Accessed September 11, 2017.

6. Siegel E. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken, NJ: John Wiley & Sons, Inc; 2013.

7. Hileman G, Steele S. Accuracy of claims-based risk scoring models. Society of Actuaries website. Published October 2016. Accessed February 5, 2019.

8. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JPA. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc. 2017;24(1):198-208. doi: 10.1093/jamia/ocw042.

9. Monsen KA, Swanberg HL, Oancea SC, Westra BL. Exploring the value of clinical data standards to predict hospitalization of home care patients. Appl Clin Inform. 2012;3(4):419-436. doi: 10.4338/ACI-2012-05-RA-0016.

10. Snooks H, Bailey-Jones K, Burge-Jones D, et al. Predictive risk stratification model: a randomised stepped-wedge trial in primary care (PRISMATIC). Health Serv Deliv Res. 2018;6(1):1-164.

11. Berwick DM, Nolan TW, Whittington J. The Triple Aim: care, health, and cost. Health Aff (Millwood). 2008;27(3):759-769. doi: 10.1377/hlthaff.27.3.759.

12. Reynolds MR, Morais E, Zimetbaum P. Impact of hospitalization on health-related quality of life in atrial fibrillation patients in Canada and the United States: results from an observational registry. Am Heart J. 2010;160(4):752-758. doi: 10.1016/j.ahj.2010.06.034.

13. Krumholz HM. Post-hospital syndrome—an acquired, transient condition of generalized risk. N Engl J Med. 2013;368(2):100-102. doi: 10.1056/NEJMp1212324.

14. Kautter J, Pope GC, Ingber M, et al. The HHS-HCC risk adjustment model for individual and small group markets under the Affordable Care Act. Medicare Medicaid Res Rev. 2014;4(3). doi: 10.5600/mmrr2014-004-03-a03.

15. Krakower DS, Gruber S, Hsu K, et al. Development and validation of an automated HIV prediction algorithm to identify candidates for pre-exposure prophylaxis: a modelling study. Lancet HIV. 2019;6(10):e696-e704. doi: 10.1016/S2352-3018(19)30139-0.

16. Sahni N, Simon G, Arora R. Development and validation of machine learning models for prediction of 1-year mortality utilizing electronic medical record data available at the end of hospitalization in multicondition patients: a proof-of-concept study. J Gen Intern Med. 2018;33(6):921-928. doi: 10.1007/s11606-018-4316-y.

17. James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning: With Applications in R. 8th ed. New York, NY: Springer; 2017.

18. Rasu RS, Bawa WA, Suminski R, Snella K, Warady B. Health literacy impact on national healthcare utilization and expenditure. Int J Health Policy Manag. 2015;4(11):747-755. doi: 10.15171/ijhpm.2015.151.

19. Álvarez-García J, Ferrero-Gregori A, Puig T, et al; Investigators of the Spanish Heart Failure Network (REDINSCOR). A simple validated method for predicting the risk of hospitalization for worsening of heart failure in ambulatory patients: the Redin-SCORE. Eur J Heart Fail. 2015;17(8):818-827. doi: 10.1002/ejhf.287.

20. Inouye SK, Zhang Y, Jones RN, et al. Risk factors for hospitalization among community-dwelling primary care older patients: development and validation of a predictive model. Med Care. 2008;46(7):726-731. doi: 10.1097/MLR.0b013e3181649426.

21. Tabak YP, Sun X, Nunez CM, Gupta V, Johannes RS. Predicting readmission at early hospitalization using electronic clinical data: an early readmission risk score. Med Care. 2017;55(3):267-275. doi: 10.1097/MLR.0000000000000654.

22. Morris JN, Howard EP, Steel K, et al. Predicting risk of hospital and emergency department use for home care elderly persons through a secondary analysis of cross-national data. BMC Health Serv Res. 2014;14:519. doi: 10.1186/s12913-014-0519-z.

23. Coleman EA, Wagner EH, Grothaus LC, Hecht J, Savarino J, Buchner DM. Predicting hospitalization and functional decline in older health plan enrollees: are administrative data as accurate as self-report? J Am Geriatr Soc. 1998;46(4):419-425. doi: 10.1111/j.1532-5415.1998.tb02460.x.

24. Kansagara D, Englander H, Salanitro A, et al. Risk prediction models for hospital readmission: a systematic review. JAMA. 2011;306(15):1688-1698. doi: 10.1001/jama.2011.1515.

25. Hwang AS, Ashburner JM, Hong CS, He W, Atlas SJ. Can primary care physicians accurately predict the likelihood of hospitalization in their patients? Am J Manag Care. 2017;23(4):e127-e128.

26. Haas LR, Takahashi PY, Shah ND, et al. Risk-stratification methods for identifying patients for care coordination. Am J Manag Care. 2013;19(9):725-732.

27. Hippisley-Cox J, Coupland C. Predicting risk of emergency admission to hospital using primary care data: derivation and validation of QAdmissions score. BMJ Open. 2013;3(8):e003482. doi: 10.1136/bmjopen-2013-003482.

28. O’Mahony C, Jichi F, Pavlou M, et al; Hypertrophic Cardiomyopathy Outcomes Investigators. A novel clinical risk prediction model for sudden cardiac death in hypertrophic cardiomyopathy (HCM risk-SCD). Eur Heart J. 2014;35(30):2010-2020. doi: 10.1093/eurheartj/eht439.

29. Markaki M, Tsamardinos I, Langhammer A, Lagani V, Hveem K, Røe OD. A validated clinical risk prediction model for lung cancer in smokers of all ages and exposure types: a HUNT study. EBioMedicine. 2018;31:36-46. doi: 10.1016/j.ebiom.2018.03.027.

30. Steyerberg EW, Harrell FE Jr, Borsboom GJJM, Eijkemans MJC, Vergouwe Y, Habbema JDF. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54(8):774-781. doi: 10.1016/s0895-4356(01)00341-9.

31. Steyerberg EW, Harrell FE Jr. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol. 2016;69:245-247. doi: 10.1016/j.jclinepi.2015.04.005.
Copyright AJMC 2006-2020 Clinical Care Targeted Communications Group, LLC. All Rights Reserved.