Artificial intelligence based on medical claims data outperforms traditional models in stratifying patient risk.
ABSTRACT
Objectives: Current models for patient risk prediction rely on practitioner expertise and domain knowledge. This study presents a deep learning model—a type of machine learning that does not require manual feature engineering—to analyze complex clinical and financial data for population risk stratification.
Study Design: A comparative predictive analysis of deep learning versus other popular risk prediction modeling strategies using medical claims data from a cohort of 112,641 pediatric accountable care organization members.
Methods: “Skip-Gram,” an unsupervised deep learning approach that uses neural networks for prediction modeling, used data from 2014 and 2015 to predict the risk of hospitalization in 2016. The area under the curve (AUC) of the deep learning model was compared with that of both the Clinical Classifications Software and the commercial DxCG Intelligence predictive risk models, each with and without demographic and utilization features. We then calculated costs for patients in the top 1% and 5% of hospitalization risk identified by each model.
Results: The deep learning model performed best among the 6 predictive models, with an AUC of 75.1%. The top 1% of members selected by the deep learning model had a combined healthcare cost $5 million higher than that of the group identified by the DxCG Intelligence model.
Conclusions: The deep learning model outperforms the traditional risk models in prospective hospitalization prediction. Thus, deep learning may improve the ability of managed care organizations to perform predictive modeling of financial risk, in addition to improving the accuracy of risk stratification for population health management activities.
Am J Manag Care. 2019;25(10):e310-e315
Takeaway Points
The present study benchmarked a new deep learning methodology for patient risk stratification using clinical and financial data for a small pediatric accountable care organization (ACO) data set. The predictive validity of the deep learning model was higher than that of other popular population risk prediction modeling strategies that, unlike deep learning, require practitioner expertise and domain knowledge. The deep learning model, although preliminary, may improve the accuracy of risk stratification and the forecasting of financial risk for population health management.
The current US healthcare climate is focused on delivery transformation and reform through value-based models that increasingly hold healthcare organizations accountable for population-based outcomes. The accountable care organization (ACO) is a popular type of value-based model that relies on patient risk stratification to identify high-risk patient populations for targeted care coordination and population health management activities.1 Risk stratification of patients using predictive regression models has long been an analytic challenge because of sparse, high-dimensional, and noisy data from insurance claims.2
Currently, the state-of-the-art risk stratification models rely on groupers of diagnosis codes developed in the early 2000s, such as Diagnostic Cost Group (DCG) and its Medicare equivalent DxCG,3 Clinical Classifications Software (CCS),2 and the Johns Hopkins Adjusted Clinical Groups (ACG) model.4 These grouper models are important tools for conducting risk stratification in the field of managed care, including in quality measurement and reporting, risk adjustment, and reimbursement. A 2013 study by Haas and colleagues explored the predictive validity of 6 popular risk stratification modeling techniques involving predictive multivariate modeling of administrative data into categories or weighted scores to predict future healthcare utilization.5 Although the study found that each of the 6 models had at least fair predictive validity, the evaluation was centered on an adult population, as were most published studies of risk adjustment.
As researchers and actuarial experts pointed out, many traditional risk adjustment methodologies were developed using a standard population and hence optimized with greater emphasis on adults, potentially limiting their predictive power in pediatric populations.6,7 In an effort to develop pediatric-focused risk adjustment, researchers have noted that due to the persistence of healthcare costs among certain groups of high-cost children, pediatric risk models benefit from including patient-reported measures of health status.7 Specifically, among Medicaid patients, the predictive validity of pediatric cost prediction models can be improved by adding survey-based measures to models of administrative data7,8; however, this methodology requires the collection of a large volume of survey data from the target patient population.
Recently, deep learning has made the implementation of predictive analytics easier due to an unsupervised learning approach with automatic feature engineering. The deep learning method is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers and connections. This approach allows the computer to learn complex concepts by building them out of simpler ones, identifying patterns and dependencies in the data. Outside the healthcare domain, deep learning and artificial intelligence have transformed our daily lives, with applications in face recognition, credit rating, and instant approval of life insurance.9 Within healthcare, deep learning is in use in a variety of health information technology contexts, including genomic analysis and biomedical image analysis.10 The application of deep learning to patient-level risk prediction is a new area of exploration. Early uses of deep learning in electronic health record data have shown promising results, including the use of recurrent neural networks to predict future diagnoses and medication orders11 and autoencoders to predict patients’ health status.12
Deep learning can handle and leverage the complex relationships in large, sparse, high-dimensional, and noisy data, with no data supervision or labeling needed. Embedding is a popular technique in which all the input concepts (ie, diagnosis and procedure codes) from the data set are mapped to vectors of real numbers in a low-dimensional continuous space where the distance between medical concepts conveys similarity between concepts. A 2016 study by Miotto et al explored the use of deep learning to predict a patient’s future medical conditions.12 The authors’ work suggested that deep learning can replace hands-on algorithm creation.
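The intuition behind embedding can be sketched in a few lines of Python. The vectors below are hand-set, illustrative numbers (real Skip-Gram embeddings are learned from claims sequences), but they show how distance in the embedding space conveys clinical similarity:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional embeddings for three medical concepts;
# a trained model would place related codes near each other.
embeddings = {
    "asthma":        [0.9, 0.1, 0.0, 0.2],
    "bronchiolitis": [0.8, 0.2, 0.1, 0.3],
    "fracture":      [0.1, 0.9, 0.7, 0.0],
}

sim_related = cosine_similarity(embeddings["asthma"], embeddings["bronchiolitis"])
sim_unrelated = cosine_similarity(embeddings["asthma"], embeddings["fracture"])

# Two respiratory conditions sit closer together than asthma and fracture.
assert sim_related > sim_unrelated
```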
In the present study, we further develop this deep learning approach through the hypothesis that if the future diagnosis of a patient can be predicted, then deep learning can also predict clinical and financial risks. More specifically, the primary objective of our study is to develop a framework for pediatric patient risk prediction that can be used to improve patient risk stratification. If successful, the model derived from this study could enable healthcare organizations to identify high-risk patients for care management interventions and improve financial risk forecasting. Our specific aims are (1) to use a deep learning model to develop a predictive algorithm for future healthcare utilization from ACO data (including diagnosis, demographic, financial, and provider data) and (2) to assess the predictive validity of this model. To our knowledge, this is the first deep learning study to utilize complex clinical and financial data for population risk stratification. To demonstrate the feasibility of this approach, we benchmarked a deep learning method on a small pediatric data set. We present the results of this preliminary work and discuss the implications of this model for patient health, healthcare practice, and public health policy and management.
Partners for Kids (PFK) is an ACO affiliated with Nationwide Children’s Hospital in Columbus, Ohio. As one of the oldest and largest pediatric ACOs, PFK assumes full financial risk for a pediatric population of more than 330,000 Medicaid-qualified children in central and southeastern Ohio.13 In accordance with the Common Rule (45 CFR 46.102[f]) and the policies of Nationwide Children’s Institutional Review Board, this study used a limited data set and was not considered human subjects research; it was therefore not subject to institutional review board approval.
Deep Learning Modeling
We utilized “Skip-Gram,”14 an unsupervised learning approach that uses neural networks to learn the relationships among medical codes. Although Skip-Gram is only a 3-layer neural network, we refer to it as a “deep learning” algorithm following the work of Minarro-Giménez et al15 and Xiao et al.16 We grouped medical codes (diagnoses, procedures, and drugs) by month for the current year and projected each patient’s medical codes into latent vector space utilizing Skip-Gram to predict patient hospitalization in the subsequent year (Figure). This model can learn the semantic information of medical codes and forms a condensed representation (ie, vector) for each medical code. This allows us to better utilize the medical codes for predicting future hospitalization. For more details on our analytic approach, please see our forthcoming technical paper.17
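One simple way to project a patient into the latent vector space is to pool the embeddings of that patient's medical codes, for example by averaging them. The sketch below uses hypothetical codes and vectors and shows mean pooling only as one plausible choice; the study's actual projection step may differ:

```python
# Hypothetical learned Skip-Gram embeddings for medical codes
# (diagnosis "dx:", drug "rx:", and procedure "px:" codes are made up).
code_vec = {
    "dx:J45": [0.9, 0.1],        # asthma diagnosis
    "rx:albuterol": [0.8, 0.3],  # asthma medication
    "px:99213": [0.2, 0.7],      # office visit procedure
}

def patient_vector(codes):
    """Represent a patient as the mean of their codes' embeddings
    (one simple pooling choice, used here for illustration)."""
    dims = len(next(iter(code_vec.values())))
    total = [0.0] * dims
    for c in codes:
        for j, v in enumerate(code_vec[c]):
            total[j] += v
    return [t / len(codes) for t in total]

p = patient_vector(["dx:J45", "rx:albuterol"])
```

The resulting dense vector can then serve as input to a downstream classifier for next-year hospitalization.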
The full data set for this study consisted of medical claims from 303,404 unique PFK members who were eligible at the start of 2014. Only patients with continuous eligibility for the 3-year period from 2014 to 2016 were selected, resulting in a cohort of 112,641 members. We excluded the subpopulation of children who were on and off Medicaid, as we could not get the complete data of their costs and utilization during those gaps. We developed our predictive model using 2014-2015 data and validated it with 2015-2016 claims data (Figure). Data elements extracted include (1) patient demographic data (ie, sex, age, and zip code) and (2) utilization data (ie, medical codes [diagnosis codes, procedure codes, and medication] and their corresponding medical cost and service detail information [date, type, location, and provider]).
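The continuous-eligibility filter described above amounts to keeping only members enrolled in every study year. A minimal sketch, with hypothetical member IDs:

```python
# Hypothetical eligibility records: member_id -> set of years enrolled.
eligibility = {
    "m001": {2014, 2015, 2016},
    "m002": {2014, 2016},        # gap in 2015 -> excluded
    "m003": {2014, 2015, 2016},
    "m004": {2015, 2016},        # not eligible in 2014 -> excluded
}

required_years = {2014, 2015, 2016}

# Keep only members continuously eligible across all 3 years.
cohort = sorted(
    member for member, years in eligibility.items()
    if required_years <= years
)
```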
Comparative Predictive Analysis
For the hospitalization prediction task, we tested 5 different inputs to a logistic regression model, including (1) demographic and utilization variables; (2) CCS, which groups related medical codes; (3) CCS plus demographic and utilization variables; (4) deep learning, which learns patient representation automatically; and (5) deep learning plus demographic and utilization variables. In addition, we compared the performance of the models with Verscend’s proprietary DxCG Intelligence predictive model (Cotiviti Inc; Atlanta, Georgia), which is considered to be the gold standard in risk stratification for the Medicare population.18-20
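Each of these inputs ultimately feeds a logistic regression classifier. The sketch below implements that final step from scratch on toy demographic/utilization-style features (not the study's actual inputs); in practice a library implementation would be used instead:

```python
import math

def train_logistic(X, y, lr=0.1, epochs=200):
    """Minimal batch-gradient logistic regression (illustrative only)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of log loss w.r.t. z
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, xi):
    """Predicted probability of next-year hospitalization."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy features: [prior-year hospitalizations, scaled prior annual cost].
X = [[0, 0.1], [0, 0.2], [1, 0.8], [2, 1.5], [0, 0.3], [3, 2.0]]
y = [0, 0, 1, 1, 0, 1]  # 1 = hospitalized next year
w, b = train_logistic(X, y)
risk = [predict_proba(w, b, xi) for xi in X]
```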
For the deep learning model, we utilized the medical codes that had already been semantically embedded into a lower-dimensional space in the previous step using Skip-Gram. This process was achieved without the expert input of any domain knowledge. In contrast, the original DxCG model based on DCGs was developed by the Health Care Financing Administration, now known as CMS, to predict costs for Medicare beneficiaries. Reliant on practitioner expertise and domain knowledge to handcraft machine learning algorithms from raw data, many commercial vendors and ACOs, including PFK, use the DxCG model as the basis for their risk stratification model. At the same time, CCS is a frequently used grouping method for academic research projects as it is not proprietary.
Area under the curve (AUC) was used as the primary evaluation metric. The AUC quantifies the overall ability of the model to discriminate between those individuals who will be hospitalized in the subsequent year and those who will not. It is a balanced consideration of both sensitivity and specificity of the model. An AUC of 100% indicates that the model can correctly categorize the patients 100% of the time (ie, zero false positives and zero false negatives), whereas an AUC of 50% indicates that the model is performing at chance level.
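The AUC has an equivalent rank-based interpretation: the probability that a randomly chosen member who is hospitalized receives a higher risk score than a randomly chosen member who is not. A direct (O(n²), illustration-only) implementation:

```python
def auc(scores, labels):
    """AUC as the probability that a random positive (hospitalized)
    outranks a random negative (not hospitalized); ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect ranking gives 1.0; a fully reversed ranking gives 0.0.
perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
reversed_rank = auc([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
```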
For the all-inclusive models (with all clinical, demographic, and utilization features), we further evaluated sensitivity, specificity, positive predictive value, and negative predictive value across a range of percentile cutoffs for selecting high-risk members for next-year hospitalization.
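Evaluating at a percentile cutoff means flagging the top N% of members by predicted risk and computing the confusion-matrix measures against observed hospitalizations. A sketch of that computation:

```python
def metrics_at_top_percent(scores, labels, percent):
    """Sensitivity, specificity, PPV, and NPV when the top `percent`
    highest-risk members are flagged as high risk."""
    n_flag = max(1, int(len(scores) * percent / 100))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_flag])
    tp = sum(1 for i in flagged if labels[i] == 1)
    fp = n_flag - tp
    fn = sum(labels) - tp
    tn = len(scores) - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn),  # share of hospitalized members caught
        "specificity": tn / (tn + fp),  # share of nonhospitalized correctly cleared
        "ppv": tp / (tp + fp),          # share of flagged members actually hospitalized
        "npv": tn / (tn + fn),          # share of unflagged members not hospitalized
    }

# Toy example: flag the top 50% of 6 members by predicted risk.
m = metrics_at_top_percent([0.9, 0.7, 0.6, 0.4, 0.2, 0.1],
                           [1, 1, 0, 0, 1, 0], 50)
```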
Given the considerable cost associated with hospitalization, we evaluated the financial impact of our model by comparing the actual cost of the patients predicted to be the top 1% (ie, 99th percentile) or 5% (ie, 95th percentile) at risk for hospitalization. In other words, we used hospitalization risk stratification as a proxy for high cost prediction. Using models 1 through 6, we used the data from 2014 and 2015 to predict the risk of hospitalization in 2016. We then selected the top 1% and 5% of the individuals who had the highest predicted risk for hospitalization in 2016 to calculate the actual cost.
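The financial comparison reduces to summing the actual next-year costs of the members each model ranks in its top 1% or 5%. A sketch with hypothetical (not study) numbers:

```python
def cost_captured_by_top_percent(predicted_risk, actual_cost, percent):
    """Sum the actual next-year cost of members the model ranks in the
    top `percent` of hospitalization risk."""
    n_top = max(1, int(len(predicted_risk) * percent / 100))
    ranked = sorted(range(len(predicted_risk)),
                    key=lambda i: predicted_risk[i], reverse=True)
    return sum(actual_cost[i] for i in ranked[:n_top])

# Hypothetical numbers: model A ranks both costly members at the top;
# model B misses one, so it captures less cost at the same cutoff.
cost = [50_000, 40_000, 1_000, 500, 200]
risk_a = [0.9, 0.8, 0.3, 0.2, 0.1]
risk_b = [0.9, 0.1, 0.8, 0.2, 0.3]
captured_a = cost_captured_by_top_percent(risk_a, cost, 40)  # picks members 0 and 1
captured_b = cost_captured_by_top_percent(risk_b, cost, 40)  # picks members 0 and 2
```

A model with only a slightly better AUC can thus capture substantially more cost in its top-ranked group, which is the mechanism behind the $5 million difference reported below.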
Data Set Descriptive Characteristics
From a total study population of 303,404 members, we selected those who had full eligibility with PFK from 2014 to 2016 and were aged between 2 and 18 years during that time. The final data set consisted of 112,641 individuals who were balanced in gender distribution and had a mean age of 8 years. Table 1 summarizes the demographic and utilization-related characteristics of the study population.
Healthcare utilization in terms of ambulatory and emergency department (ED) visits and hospitalizations remained fairly steady over the 3-year period, whereas the average yearly healthcare expenditure per person steadily increased over the years. See Table 1 for a breakdown of utilization for the study population across years.
Next-Year Hospitalization Prediction
Performance of the models generated in this study is summarized in Table 2. Our results showed that a model using demographic (age, age group, sex) and utilization (previous annual medical cost; number of previous-year hospital, ED, and outpatient visits) features could achieve relatively good prediction of next-year hospitalization (AUC, 73.1%) (Table 2 [A]).
Deep learning by itself (model 4) outperforms CCS (model 2) and DxCG Intelligence (model 6) on the hospitalization prediction task. When deep learning was coupled with demographic and utilization features (model 5), the best performance was observed (AUC, 75.1%).
Deep learning provides small but meaningful improvements in other measures of predictive performance in addition to the AUC, such as sensitivity and positive predictive value, and comparable performance in specificity and negative predictive value (Table 2 [B]). By selecting the top 10% of high-risk members, the deep learning model had a sensitivity of 0.452 compared with 0.392 to 0.416 for the other models. This means that the model correctly predicted 45.2% of all those who were hospitalized. It is notable that a previous study evaluating the performance of predictive models for a high-cost pediatric population achieved a sensitivity measure of 0.339 with the best model, which included extensive sociodemographic and utilization features along with survey data.8 The models had comparable performance in terms of specificity, which provides a measure of correctly predicting the nonhospitalized members. The positive predictive value is another useful measure when considered in terms of selecting members for care coordination. It reflects the proportion of members predicted to have a high probability of hospitalization (and hence potential targets for preemptive care management) who are actually hospitalized. The deep learning model outperformed the other models in this measure (0.196 vs 0.135-0.166, selecting the top 1%). In contrast, deep learning did not improve the negative predictive value.
Financial Impact of Risk Stratification Using the Hospitalization Prediction Model
In addition to binary prediction of hospitalization or no hospitalization, the logistic regression models described here can also generate risk scores to stratify the patients. As shown in Table 3, the top 1% of individuals (1126 of the 112,641 patients in the study sample) selected by model 5 (deep learning with demographic and utilization features) had a combined healthcare cost of approximately $31.1 million compared with $26.0 million for those selected by DxCG Intelligence. This finding implies that although model 5 only slightly improves the AUC (Table 2 [A]), the financial impact of its use can be significant. In this case, by using model 5 to select the top 1% of high-risk individuals for care management, the ACO could potentially influence an additional $5 million in costs compared with DxCG Intelligence and $6 million compared with model 1.
This analysis found that the deep learning model plus demographic and utilization characteristics from administrative data was a slightly better risk prediction model than the traditional DxCG model. Deep learning eliminates the reliance on practitioner expertise and domain knowledge to handcraft related medical codes into groups. Depending on the size of the data set, the deep learning model takes only hours to train, in contrast to the years of development and refinement that DxCG and CCS required from domain experts. This machine learning approach is highly efficient, requiring fewer resources to perform equivalently to or slightly better than current models in a much shorter time. The high-risk patients identified by the deep learning plus demographic and utilization model also incurred much higher annual costs. These findings highlight the potential of deep learning in healthcare data to transform the field of population health management for managed care organizations.
The method and approach developed in this study, although preliminary, have the potential to improve the ability of pediatric ACOs to perform preemptive planning of financial risk. The type of deep learning modeling employed in this study may allow ACOs and others that manage patient populations to more appropriately allocate resources for planning and to reimburse their providers based on enhanced predictability of risk. The ability to improve future diagnostic prediction of a panel of patients may also lead to better assessment of provider performance by taking case mix into consideration.
Most importantly and urgently for patients and population health management, more accurately predicting future risks may lead to more realistic and rationalized planning with resources to meet patient demand. By identifying high-risk patients, physicians and other providers can develop future-oriented preemptive care planning and provide preventive treatment interventions before more serious conditions and events take place. Although we note that prediction is an important but narrow methodological tool, there is the potential for care managers to use other data, such as patient surveys or administrative data that include patients’ socioeconomic and environmental variables, to identify cohorts of patients within the high-risk group identified by the deep learning algorithm who also have other risk factors that can be targeted for intervention, such as a group of children at high risk for increased utilization or costs whose parents screened positive for transportation barriers or who speak a second language at home. Providers may also benefit from risk prediction of their patient population by more proactively planning for cost containment and ensuring that optimized clinical management guidelines are in place.
The present study benchmarked a new predictive learning methodology on a small pediatric ACO data set. The conclusions we can make from this pilot work are limited by the small data set. In addition, we chose to enforce continuous eligibility across all 3 years for patients to be included in this preliminary study. This stringent inclusion criterion may distort the underlying ACO population. Future explorations of the deep learning methods over larger data sets can further validate the findings here. Larger data sets would also allow the inclusion of data from other data sources, such as patient surveys, that include the sociodemographic and environmental variables that affect risk.
While the current analysis focuses on the population level, at the level of the individual patient there is potential to use this deep learning method to model individual patient risks that could be used to guide conversations between patients and providers. The risks to be modeled can also be extended from hospitalization to readmission or claims fraud. There is a caveat to be considered when extending this work: The deep learning approach limits the interpretability of the model, as it is hard to decipher how the variables contribute to the deep learning algorithm used to classify high-risk groups.
An additional limitation is that the risk prediction models used to benchmark this new methodology were limited to those available to the study team. We acknowledge that other potential case mix algorithms, such as the Johns Hopkins ACG, have been applied and evaluated in Medicaid populations. However, the ACG is proprietary, whereas the CCS was freely available and the DxCG was the commercial model used by PFK at the time of this study. It is also important to note that our cost savings predictions may be an upper-bound estimate because implementing new care management approaches based on risk identification via the deep learning model would incur additional costs to the system.
From the perspective of population health management, the deep learning approach to risk stratification has clear implications that include (1) enabling the discussion of future policies of risk adjustment based on population risk stratification; (2) providing a generalizable and scalable platform to identify trends in disease occurrence over time, ensuring timely public health investigation and intervention; and (3) improving healthcare utilization forecasting, thus enabling an ACO or other care management entity to identify gaps in healthcare resource allocation for its patient population.
Although deep learning analytic approaches are steadily increasing, the application of these models to risk stratification has been slow to gain traction in healthcare delivery systems. Fortunately, some forward-looking accountable care systems continue to support and conduct research that demonstrates the “value-add” of deep learning methods that may enhance predictive modeling capability and thus ensure better treatment and financial outcomes. In order to achieve more rapid adoption of these innovative methods, the needs and perspectives of several stakeholder groups will have to be considered. We have conducted an initial study of this type of modeling versus more traditional methods and found favorable initial results.
The authors would like to thank Brad Stamm from Partners for Kids for valuable discussion on model evaluation. They also thank Drs Stephen Cardamone, Kelly Kelleher, Deena Chisolm, Ann McAlearney, and Sven Bambach for providing thoughtful feedback on the manuscript.
Author Affiliations: Research Information Solutions and Innovation, The Research Institute at Nationwide Children’s Hospital (EJDL, XZ, SM, SML), Columbus, OH; Health Services Management and Policy, College of Public Health (JLH, TH), and Department of Biomedical Informatics and Department of Pediatrics, College of Medicine (SML), The Ohio State University, Columbus, OH; Department of Electrical Engineering and Computer Science, Ohio University (XZ, CL), Athens, OH; Partners for Kids (JK), Columbus, OH.
Source of Funding: None.
Author Disclosures: The authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.
Authorship Information: Concept and design (EJDL, JLH, XZ, SM, CL, SML); acquisition of data (EJDL, SM, JK); analysis and interpretation of data (EJDL, XZ, SM, TH, CL, SML); drafting of the manuscript (EJDL, JLH, XZ, TH); critical revision of the manuscript for important intellectual content (EJDL, XZ, SM, TH, JK, SML); statistical analysis (XZ); obtaining funding (EJDL, SML); administrative, technical, or logistic support (EJDL, XL, JLH, SM, JK, CL); and supervision (CL, SML).
Address Correspondence to: Simon M. Lin, MD, MBA, Nationwide Children’s Hospital, 575 Children’s Crossroad, Columbus, OH 43215. Email: Simon.Lin@nationwidechildrens.org.
REFERENCES
1. Hacker K, Walker DK. Achieving population health in accountable care organizations. Am J Public Health. 2013;103(7):1163-1167. doi: 10.2105/AJPH.2013.301254.
2. Cowen ME, Dusseau DJ, Toth BG, Guisinger C, Zodet MW, Shyr Y. Casemix adjustment of managed care claims data using the Clinical Classification for Health Policy Research method. Med Care. 1998;36(7):1108-1113. doi: 10.1097/00005650-199807000-00016.
3. Ash AS, Ellis RP, Pope GC, et al. Using diagnoses to describe populations and predict costs. Health Care Financ Rev. 2000;21(3):7-28.
4. Weir S, Aweh G, Clark RE. Case selection for a Medicaid chronic care management program. Health Care Financ Rev. 2008;30(1):61-74.
5. Haas LR, Takahashi PY, Shah ND, et al. Risk-stratification methods for identifying patients for care coordination. Am J Manag Care. 2013;19(9):725-732.
6. Kahn H, Parke R, Yi R. Risk adjustment for pediatric populations. Milliman website. milliman.com/uploadedFiles/insight/2013/risk-adjustment-for-pediatric-populations-healthcare-reform-bulletin.pdf. Published November 2013. Accessed February 6, 2019.
7. Yu H, Dick AW. Risk-adjusted capitation rates for children: how useful are the survey-based measures? Health Serv Res. 2010;45(6, pt 2):1948-1962. doi: 10.1111/j.1475-6773.2010.01165.x.
8. Leininger LJ, Saloner B, Wherry LR. Predicting high-cost pediatric patients: derivation and validation of a population-based model. Med Care. 2015;53(8):729-735. doi: 10.1097/MLR.0000000000000391.
9. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444. doi: 10.1038/nature14539.
10. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform. 2018;22(5):1589-1604. doi: 10.1109/JBHI.2017.2767063.
11. Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Doctor AI: predicting clinical events via recurrent neural networks. JMLR Workshop Conf Proc. 2016;56:301-318.
12. Miotto R, Li L, Kidd BA, Dudley JT. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep. 2016;6:26094. doi: 10.1038/srep26094.
13. Kelleher KJ, Cooper J, Deans K, et al. Cost saving and quality of care in a pediatric accountable care organization. Pediatrics. 2015;135(3):e582-e589. doi: 10.1542/peds.2014-2725.
14. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013;2:3111-3119.
15. Minarro-Giménez JA, Marín-Alonso O, Samwald M. Exploring the application of deep learning techniques on medical text corpora. Stud Health Technol Inform. 2014;205:584-588. doi: 10.3233/978-1-61499-432-9-584.
16. Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc. 2018;25(10):1419-1428. doi: 10.1093/jamia/ocy068.
17. Zeng X, Moosavinasab S, Lin EJD, Bunescu R, Liu C. Distributed representation of patients and its use for medical cost prediction. arXiv website. arxiv.org/abs/1909.07157. Published September 13, 2019. Accessed September 18, 2019.
18. Wagner TH, Upadhyay A, Cowgill E, et al. Risk adjustment tools for learning health systems: a comparison of DxCG and CMS-HCC V21. Health Serv Res. 2016;51(5):2002-2019. doi: 10.1111/1475-6773.12454.
19. Hui RL, Yamada BD, Spence MM, Jeong EW, Chan J. Impact of a Medicare MTM program: evaluating clinical and economic outcomes. Am J Manag Care. 2014;20(2):e43-e51.
20. Chen J, Ellis RP, Toro KH, Ash AS. Mispricing in the Medicare Advantage risk adjustment model. Inquiry. 2015;52:1-7. doi: 10.1177/0046958015583089.