The American Journal of Managed Care January 2020
Predicting Hospitalizations From Electronic Health Record Data

Kyle Morawski, MD, MPH; Yoni Dvorkis, MPH; and Craig B. Monsen, MD, MS
The authors aimed to develop a rigorous technique for predicting hospitalizations using data that are already available to most health systems.

Objectives: Electronic health record (EHR) data have become increasingly available and may help inform clinical prediction. However, predicting hospitalizations among a diverse group of patients remains difficult. We sought to use EHR data to create and internally validate a predictive model for clinical use in predicting hospitalizations.

Study Design: Retrospective observational cohort study.

Methods: We analyzed EHR data from patients 18 years or older seen at Atrius Health from June 2013 to November 2015. We selected variables among patient demographics, clinical diagnoses, medications, and prior utilization to train a logistic regression model predicting any hospitalization within 6 months, and we validated the model using a separate validation set. We performed sensitivity analyses of model performance using EHR-derived data, claims-derived data, or both.

Results: After exclusions, 363,855 patient-months were included for analysis, representing 185,388 unique patients. The strongest features included sickle cell anemia (odds ratio [OR], 52.72), lipidoses and glycogenosis (OR, 8.44), heart transplant (OR, 6.12), and age 76 years or older (OR, 5.32). Model testing showed that EHR-only data had an area under the receiver operating characteristic curve (AUC) of 0.84 (95% CI, 0.838-0.853), which was similar to the claims-only data (AUC, 0.84; 95% CI, 0.831-0.848) and combined claims and EHR data (AUC, 0.846; 95% CI, 0.838-0.853).

Conclusions: Prediction models using EHR-only, claims-only, and combined data had similar predictive value and demonstrated strong discrimination for which patients will be hospitalized in the ensuing 6 months.

Am J Manag Care. 2020;26(1):e7-e13
Takeaway Points

We aimed to develop a rigorous technique for predicting hospitalizations using data that are already available to most health systems.
  • Our research can be used to provide clinicians with a risk score for a given patient, which can help guide care.
  • Using the predictive tool, clinicians may be able to more accurately triage patients’ concerns and respond to those concerns to prevent worsening of their condition and need for hospitalization.
The healthcare system generates, collects, and stores a tremendous amount of data during the course of a patient’s clinical encounter, with one study finding an average of more than 200,000 individual data points available during a single hospital stay.[1,2] These data are used to monitor a patient’s progress, coordinate care among all members of the healthcare team, and provide documentation for billing and reporting activities. Although the use of data for these purposes has been long-standing, the availability of these data has increased substantially. The Health Information Technology for Economic and Clinical Health Act of 2009 was passed in part to assist healthcare professionals’ transition to electronic health records (EHRs). A decade later, systematically collected data generated in the course of clinical care have created an opportunity to use such data to improve care practices.[3,4]

Retail companies have made strategic investments in data science, often with substantial return.[5,6] Accordingly, using data stored in EHRs to improve the lives of patients and lower total medical costs is one approach to transforming care. Big data, machine learning, and predictive analytics are some of the ways that clinicians hope to anticipate patients’ needs and improve outcomes, as evidenced by the many organizations working in this field.[7] However, this is an evolving field in which the techniques, accuracy, and actionability of predictions are still improving. We need more precise prediction models and better integration of data into clinical care[4,8-10] to focus care resources and, in doing so, provide higher value.[11]

The morbidity[12,13] and healthcare costs[13] associated with hospital admissions underscore the need for hospitalization prevention activities, including patient outreach, review of recent discharges, and case management. Unfortunately, acute hospital care needs remain difficult to predict.[9] A recent review evaluating the accuracy of EHR-based prediction modeling showed that hospitalization and service utilization were more difficult to predict than mortality or disease-specific outcomes. Whereas mortality and clinical prediction models demonstrated C statistics above 0.8, the discrimination of models built to predict hospitalization and service utilization was lower, with a C statistic of 0.71.[8]
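For readers less familiar with the C statistic cited above, the following minimal sketch illustrates what it measures: the probability that a randomly chosen patient who was hospitalized received a higher predicted risk than a randomly chosen patient who was not. The function name `c_statistic` and the scores are hypothetical and for illustration only; they are not taken from the study.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome.

    Counts the share of (event, non-event) pairs in which the event
    received the higher score. Tied scores count as half concordant.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = hospitalized).
example_scores = [0.9, 0.4, 0.35, 0.6, 0.5, 0.1]
example_outcomes = [1, 1, 0, 1, 0, 0]
example_c = c_statistic(example_scores, example_outcomes)
print(round(example_c, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. For a binary outcome, this quantity is equal to the area under the receiver operating characteristic curve.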

Several approaches to improve hospitalization prediction exist, such as using new data sources, new variable types, more complete data, more timely data, or more advanced statistical methods. Data sets capable of linking EHR and claims data at the patient level remain uncommon. We hypothesized that when combined, these 2 data sources would complement each other and lead to stronger prediction than that observed previously. We set out to develop and test a model that uses EHR and claims data to predict patient hospitalizations in such a way that it can be implemented in an outpatient practice setting.


Study Design

We performed a retrospective analysis of data generated in the course of clinical care and healthcare operations to develop a logistic regression model predicting a patient’s future risk of hospitalization. Data were extracted from Atrius Health’s unified data warehouse, which marries clinical data from Atrius Health’s EHR (Epic version 2015; Epic Systems; Verona, Wisconsin) to normalized administrative claims data received from Medicare, Medicaid, and commercial payers. Variables were ascertained at the patient-month level. To reflect seasonality in hospitalization outcomes, 4 dates of prediction—referred to as index dates—were selected throughout the study period: September 1, 2014; December 1, 2014; March 1, 2015; and June 1, 2015. Sensitivity testing was performed to determine how the inclusion of certain variable categories or data sources (ie, EHR vs claims) would influence model performance. The analysis was performed as part of a quality improvement effort at Atrius Health and did not undergo institutional review board review.
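As a rough illustration of what a patient-month data layout with 4 index dates could look like, the sketch below emits one row per patient per index date. The eligibility rule (a patient contributes a row only if the index date falls within the period in which the patient was observed) and the `observation_periods` structure are assumptions made for this example; the article does not describe the data warehouse logic at this level of detail.

```python
from datetime import date

# The 4 index dates named in the study design.
INDEX_DATES = [
    date(2014, 9, 1),
    date(2014, 12, 1),
    date(2015, 3, 1),
    date(2015, 6, 1),
]


def build_patient_months(observation_periods, index_dates=INDEX_DATES):
    """Return (patient_id, index_date) rows, one per eligible index date.

    observation_periods maps a patient ID to (first_seen, last_seen).
    This eligibility rule is an assumption, not the authors' definition.
    """
    rows = []
    for patient_id, (first_seen, last_seen) in sorted(observation_periods.items()):
        for index_date in index_dates:
            if first_seen <= index_date <= last_seen:
                rows.append((patient_id, index_date))
    return rows


# Hypothetical patients: A is observed for the full study period,
# B is first seen in January 2015.
example_periods = {
    "A": (date(2013, 6, 1), date(2015, 11, 30)),
    "B": (date(2015, 1, 15), date(2015, 11, 30)),
}
example_rows = build_patient_months(example_periods)
for row in example_rows:
    print(row)
```

Under this layout a single patient can contribute several rows, which is consistent with a count of patient-months that exceeds the number of unique patients.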

Study Population

The study population was selected among patients seen from June 2013 to November 2015 at Atrius Health, a large multispecialty group in eastern Massachusetts. The population included patients insured under Medicare, Medicaid, and commercial contracts. Patients younger than 18 years were excluded from analysis because adult primary care was the focus of this effort.

Outcome Variable

We selected a binary outcome variable indicating whether a patient experienced any medical/surgical admission within 6 months of the index date of prediction. We chose to predict hospitalizations within 6 months to best match the prediction interval with the timeline of likely downstream interventions. For example, to assist in the care of a complex patient, a relationship with a case manager is often established. This potential intervention requires a period of time to plausibly affect risk of hospitalization. Longer prediction intervals would potentially dilute the impact of future interventions or else necessitate interventions spanning very long time horizons. We excluded obstetrical admissions because these would not be targets for anticipated interventions.
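The outcome definition above can be sketched as a small labeling function. This is an illustration under stated assumptions, not the authors' implementation: the window is treated as half-open (from the index date up to, but not including, the date 6 calendar months later), and the service labels ("medical", "surgical", "obstetrical") and the helper names `add_months` and `hospitalized_within` are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after d, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def hospitalized_within(admissions, index_date, months=6):
    """Return 1 if any non-obstetrical admission falls in the window, else 0.

    admissions is a list of (admit_date, service) tuples. The window
    boundaries and the service labels are assumptions for this sketch.
    """
    window_end = add_months(index_date, months)
    for admit_date, service in admissions:
        if service == "obstetrical":
            continue  # excluded from the outcome, as described in the text
        if index_date <= admit_date < window_end:
            return 1
    return 0


# Hypothetical admissions evaluated against the first index date.
index = date(2014, 9, 1)
print(hospitalized_within([(date(2014, 12, 15), "medical")], index))
print(hospitalized_within([(date(2014, 10, 2), "obstetrical")], index))
```

The first call falls inside the window and counts toward the outcome; the second is an obstetrical admission and does not.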
