Predictive analytics–driven disease management outperforms standard of care among patients with chronic heart failure.
Objectives: To evaluate the effect of a predictive algorithm–driven disease management (DM) outreach program compared with non–predictive algorithm–driven DM program participation on health care spending and utilization.
Study Design: We used propensity score matching for Medicare Advantage members with chronic heart failure (CHF) to evaluate the impact of predictive algorithm–driven DM outreach using claims data from 2013 to 2018 from a large commercial health insurer.
Methods: The insurer ran a predictive algorithm to identify members with CHF with a high likelihood of hospitalization (LOH), and a DM outreach was initiated to those identified as being at high risk of hospitalization (high-LOH intervention group). The intervention group was matched to members with similar concurrent medical risk profiles, based on the DxCG/Verisk score, who received the same DM outreach through the insurer’s standard process (low-LOH intervention group). This approach allowed an evaluation of the predictive algorithm in targeting individuals suitable for DM outreach.
Results: Regression models showed that high-LOH intervention members had a lower probability of hospitalization (0.032; P = .075) and emergency department (ED) visit (0.039; P = .043) in the year after the outreach compared with low-LOH intervention members, leading to lower total outpatient spending ($1517; P < .001). Analyses for no-intervention members showed that predictive outreach members would have been expected to have higher inpatient and ED utilization and higher medical spending compared with the traditional care members.
Conclusions: A prediction-driven DM outreach program among patients with CHF was effective in reducing medical spending in the year after the outreach compared with traditional DM outreach programs.
Am J Manag Care. 2022;28(12):668-674. https://doi.org/10.37765/ajmc.2022.89275
Disease management (DM) is a tool commonly used by health plans that is advertised to reduce acute care needs and downstream health care utilization and spending.1-5 However, many DM programs have shown little evidence of long-term cost reduction, even when targeted at chronic and high-need populations.6-8 DM may be ineffective because only a fraction of the population benefits from such an intervention or because the intervention is not strong enough to remedy acute care needs.2,9,10 Across the health care ecosystem, there is growing interest in identifying individuals who not only require DM but are also likely to benefit from it.11,12 One approach is to identify a population that will most likely experience care needs and for which a DM intervention may be effective. This can potentially be achieved with machine learning techniques.13-15
To date, relatively little evidence exists that displays the efficacy of targeted DM interventions using predictive modeling compared with traditional DM intervention approaches that do not utilize predictive modeling to inform participation.16 Although previous work has shown that machine learning and predictive modeling techniques can be an effective tool in the clinical setting and in performing DM, there is no evidence that predictive modeling may be effective compared with traditional DM programs.17,18 To fill this gap, an insurer implemented a predictive algorithmic approach to evaluate whether a DM outreach initiation to a highly targeted subpopulation was effective in reducing downstream hospitalizations, outpatient care, and spending compared with the standard-of-care DM outreach program, which is the same but does not rely on systematic analytics.
This analysis evaluated the effect of a predictive analytics–driven DM outreach intervention compared with standard-of-care nonpredictive DM outreach among individuals with chronic heart failure (CHF) enrolled in a Medicare Advantage plan between 2013 and 2018. The analysis limited the intervention and comparison groups to individuals with similar concurrent medical risk scores, thereby allowing us to identify the impact of the predictive algorithm in identifying members especially suited for the treatment. The study also provided a counterfactual analysis that estimates utilization and spending patterns for the predictive analytics–driven DM outreach and standard-of-care DM members had they not received the intervention.
Data for Medicare Advantage members with a diagnosis of CHF were obtained from a large insurance company from 2013 to 2018. Members were identified as having CHF using the CMS definition, requiring a CHF diagnosis in the past 2 years on any claim, plus at least 1 hospital inpatient or outpatient claim for CHF. The data include member demographics, all inpatient and outpatient utilization, and health care costs data. We utilize demographic, health risk, and health plan information on age, gender, a standard concurrent risk score (also referred to as Verisk score or DxCG score), and pharmacy benefits to describe the general medical risk of the member.
Predictive Algorithm and Intervention
The insurer developed a proprietary algorithm that was fed more than 1500 patient-level characteristics derived from medical claims (eg, utilization, cost, diagnoses, procedures), clinical data (eg, laboratory results, including counts of abnormal lab results; clinical alerts of gaps in care), and additional member data (eg, demographics, geography, health plan information, member interactions with the health plan).
The model aimed to identify members with potential health deteriorations likely to trigger an acute inpatient hospitalization in the next 6 months. The final algorithm included roughly 500 member-level characteristics and was an ensemble classification model built in SAS Enterprise Miner (SAS Institute). The likelihood of hospitalization (LOH) algorithm was applied to the entire Medicare Advantage member population with a diagnosis of CHF in 16 separate waves (time periods) between 2013 and 2016.
These LOH predictive scores were used to identify members for outreach of an existing comprehensive care management program designed to intervene and mitigate downstream health care utilization. Registered nurse (RN) health coaches were sent a list of members with the highest LOH risk. The number of members on the list corresponded with the RN team’s capacity to conduct member outreach and intervention, resulting in varying numbers of members engaged with the DM program over time. Thus, the number of patients who were contacted was not based on patients’ interest in the program. More details regarding the predictive model cutoff points can be found in David et al.17,18
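The capacity-limited selection described above can be illustrated with a short sketch. It assumes, for illustration only, that each wave produces a mapping from member ID to LOH score and that the RN team's capacity is a single integer; the function name and the example data are hypothetical and are not taken from the insurer's system.

```python
def select_for_outreach(loh_scores, capacity):
    """Return the member IDs with the highest LOH scores, up to RN capacity.

    loh_scores: dict of member ID -> predicted likelihood of hospitalization
    capacity:   number of members the RN team can contact in this wave
    """
    # Rank by descending score; break ties by member ID so the result is stable.
    ranked = sorted(loh_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [member_id for member_id, _ in ranked[:max(capacity, 0)]]


# Hypothetical wave with four scored members and capacity for two contacts.
wave = {"m1": 0.81, "m2": 0.35, "m3": 0.77, "m4": 0.92}
print(select_for_outreach(wave, 2))  # ['m4', 'm1']
```

Because the list length is set by capacity rather than by a fixed score threshold, the lowest score that still receives outreach can differ from wave to wave.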
The RN health coaches involved in the DM program were tasked with closing gaps in care, performing medication reconciliation, coordinating appointments with health care providers, and connecting clinical and nonclinical stakeholders. The intervention also included CHF-specific condition management designed to help members understand their diagnosis and how to manage it, including supporting lifestyle changes to reduce health risks and facilitate symptom monitoring and management. Outreach was initiated through phone calls, but not all calls resulted in enrollment in a DM program. RNs did not know whether the member was identified through the LOH algorithm or through other means.
In this study, the intervention group was defined as members who received a high predicted LOH score from the algorithm, who had a mean LOH score of 77% and were contacted by the RN (Figure, high-LOH intervention group). We compared these members with a matched comparison group of members who were also contacted by RNs through existing efforts by the insurer to improve health outcomes. The matched comparison group was also limited to members with CHF, was similar on observable characteristics (age, gender), and was especially similar in terms of medical risk profile (concurrent medical risk score/Verisk score) based on a matching algorithm described below, but this group had low LOH scores (Figure, low-LOH intervention group). The concurrent medical risk score (Verisk risk score or DxCG risk score) is a risk assessment score that is commonly used to describe patient medical risk profiles. The DxCG risk score has its root in CMS efforts to describe patient risk, which ultimately led to the hierarchical condition categories.19
We also compared the high-LOH intervention group with a second cohort of matched members with CHF who were identical across all member-level characteristics including having high LOH scores but who did not receive a DM outreach (Figure, high-LOH no-intervention group). There were several reasons why members meeting the profile for care management were not enrolled in the program. For example, outreach volume was dictated by the capacity that care management RN health coaches could handle at a given time. Capacity and corresponding wave sizes varied (eg, 100 members or 1500 members) relative to available RN health coaches and caseload. We make a third comparison between the low-LOH intervention group and a matched set of members with CHF who also had low LOH scores but did not receive a DM outreach (Figure, low-LOH no-intervention group). Thus, the LOH no-intervention groups represent members with CHF who are observably similar in health risk to the treatment groups, allowing us to understand spending and utilization patterns for the treatment groups had they not received the DM outreach.
The main goal was to estimate whether the DM outreach program was effective in a predictive model–identified high-risk population relative to the traditional DM outreach population that did not use any predictive modeling. The study focused on medical use and spending trends among members who received an outreach for DM, where some received the outreach via the algorithm and others did not, and followed their utilization patterns across time.
The challenge was to identify suitable control members who did not receive outreach through the algorithm and were still comparable in observable characteristics. As a large number of members received DM through channels other than the predictive algorithm, we utilized, among other demographics, the standard concurrent Verisk risk score rating in the month the predictive algorithm was run to select individuals who should receive the outreach. The standard concurrent Verisk risk score evaluates current patient complexity risk based on claims data. The score was different from the predictive algorithm score because the predictive algorithm reflects the current and expected risk level of members. Thus, patients may look very similar in terms of concurrent Verisk risk score but can have very different predictive risk scores in a given month.
To arrive at comparable member groups, we performed 1:1 matching using the predictive algorithm outreach members and the traditional outreach members. We used coarsened exact matching that matched members based on the minimum difference in the distribution of the variables included in the matching process. The variables included in the matching model were concurrent Verisk risk score, age, gender, and outreach wave. We then performed multivariate regression analysis that compared the spending trend among predictive outreach members with that among traditional outreach members using demographic variables (age, gender), wave fixed effects, the Verisk risk score, pharmacy coverage status, and expiration date (ie, date patient died or left the sample) as control variables in the 12 months after the outreach. We explicitly did not include the predictive algorithm score in the regression model because we would expect that the predictive score is highly correlated with health care spending in the 12 months after the outreach, and adjusting for this variable would absorb changes in spending attributable to predictive risk score differences in the treatment and comparison groups.
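A minimal sketch of 1:1 matching within coarsened strata is given below to make the matching step concrete. It is not the procedure used in the study: the bin widths (5 risk-score points, 5 years of age), the dictionary field names, and the rule of taking the closest unused comparison member on risk score are all assumptions chosen for illustration.

```python
def stratum(member, risk_bin=5.0, age_bin=5):
    """Coarsen risk score and age into bins; keep gender and wave exact.

    The bin widths are illustrative assumptions, not values from the study.
    """
    return (
        int(member["verisk"] // risk_bin),
        int(member["age"] // age_bin),
        member["gender"],
        member["wave"],
    )


def match_one_to_one(treated, controls):
    """Pair each treated member with at most one unused control in its stratum."""
    pool = {}
    for c in controls:
        pool.setdefault(stratum(c), []).append(c)

    pairs = []
    for t in sorted(treated, key=lambda m: m["id"]):
        candidates = pool.get(stratum(t))
        if not candidates:
            continue  # no comparison member in this stratum: treated member is dropped
        # Within the stratum, take the control closest on risk score (an assumption).
        best = min(candidates, key=lambda c: abs(c["verisk"] - t["verisk"]))
        candidates.remove(best)  # matching without replacement
        pairs.append((t["id"], best["id"]))
    return pairs


# Hypothetical members; "t" = predictive outreach, "c" = traditional outreach.
treated = [
    {"id": "t1", "verisk": 31.2, "age": 81, "gender": "F", "wave": 3},
    {"id": "t2", "verisk": 12.0, "age": 67, "gender": "M", "wave": 3},
]
controls = [
    {"id": "c1", "verisk": 33.9, "age": 84, "gender": "F", "wave": 3},
    {"id": "c2", "verisk": 30.5, "age": 80, "gender": "F", "wave": 3},
    {"id": "c3", "verisk": 12.5, "age": 66, "gender": "F", "wave": 3},
]
print(match_one_to_one(treated, controls))  # [('t1', 'c2')]
```

In this toy example t2 remains unmatched because the only comparison member in the same risk and age bins differs on gender, which illustrates how exact matching on coarsened variables can shrink the analytic sample.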
In secondary analyses, we were also interested in the level of care utilization of the outreach groups in the absence of outreach. To arrive at appropriate comparison members for both outreach groups, we matched the predictive algorithm outreach members to nonoutreach members based on the Verisk risk score, predictive risk score, gender, and age. We similarly matched the traditional outreach members to other nonoutreach members based on the same variables as above. Including the predictive and Verisk risk scores when selecting counterfactual groups satisfied that predictive outreach members were compared with members who did not receive the outreach but had similar high predictive scores of future care needs (and thus propensities to receive the outreach). The Figure displays the samples and matching criteria.
Our main outcome variables were health care utilization and spending in the 12 months after the outreach. We created utilization and spending outcomes to holistically review the care patterns observed after the DM intervention. Utilization outcomes were calculated in binary (extensive margin) and continuous outcomes for total hospital admissions, emergency department (ED) visits, ED visits leading to a hospitalization, primary care visits, cardiologist visits, and general specialist visits. Outcomes for spending included total medical spending (inpatient + outpatient + professional spending), total outpatient spending, and total inpatient spending. Care utilization and spending were measured from the time of the intervention up to and including the 12 postintervention months. The outcomes were chosen based on the expected impact of the intervention on inpatient and outpatient care utilization over time.
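The construction of binary (extensive margin) and continuous outcomes can be sketched as follows. The event layout, the event type labels, and the use of 365 days to represent 12 months are assumptions made for illustration; the study's actual claims processing is not described at this level of detail.

```python
from collections import defaultdict

WINDOW_DAYS = 365  # assumption: "12 months" approximated as 365 days


def summarize(events, kinds=("admission", "ed_visit")):
    """Build per-member counts, binary indicators, and spending after outreach.

    events: iterable of (member_id, days_since_outreach, kind, paid_amount)
    Members without any event in the window do not appear in the result.
    """
    out = defaultdict(lambda: {"spend": 0.0, **{f"n_{k}": 0 for k in kinds}})
    for member_id, day, kind, paid in events:
        if not 0 <= day <= WINDOW_DAYS:
            continue  # outside the follow-up window
        row = out[member_id]
        row["spend"] += paid
        if kind in kinds:
            row[f"n_{kind}"] += 1  # continuous (count) outcome
    for row in out.values():
        for k in kinds:
            row[f"any_{k}"] = int(row[f"n_{k}"] > 0)  # binary (extensive margin)
    return dict(out)


# Hypothetical events for two members.
events = [
    ("m1", 10, "admission", 9000.0),
    ("m1", 40, "ed_visit", 800.0),
    ("m1", 400, "admission", 5000.0),  # after the window, ignored
    ("m2", 5, "office_visit", 120.0),
]
result = summarize(events)
print(result["m1"]["n_admission"], result["m1"]["any_ed_visit"], result["m1"]["spend"])
```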
Analyses were performed using Stata version 15 (StataCorp). A P value less than .10 was considered statistically significant. SDs and SEs are displayed in parentheses in all tables, and 95% CIs around estimates are reported in the text.
There were 1592 high-LOH intervention group members who were matched to 1592 low-LOH intervention group members. Sociodemographic characteristics and the Verisk risk score were similar for both groups, suggesting that the matching algorithm was successful in balancing observable characteristics (Table 1). The mean age of the sample was about 80 years, 55% of the sample were women, and many had pharmacy benefits. Of note is that the mean Verisk risk score, measuring the present health risk of the member, was about 31, while the predictive model risk score identifying future health needs was high among the predictive outreach group (77%) and low in the comparison outreach group (39%). The discrepancy in predictive risk score was expected because a high risk score was required to receive the outreach through the predictive model. Because comparison members did not receive the outreach through the predictive model targeting, they had to have lower predictive scores.
The counterfactual samples of matched members who did not receive a DM outreach showed similar observable characteristics (Table 1, columns 3-4). The counterfactual groups (high-LOH no intervention and low-LOH no intervention) had somewhat lower mean Verisk risk scores and pharmacy coverage. The matched control groups were somewhat smaller in size.
Descriptive evidence of medical use and spending in the year after the intervention showed that both intervention groups had similar care patterns (Table 2). Total overall medical spending ($27,308 and $27,749) was similar, but inpatient spending was higher for the high-LOH intervention group than the low-LOH intervention group ($13,534 vs $12,857), whereas outpatient spending was lower for the high-LOH intervention group compared with the low-LOH intervention group ($6126 vs $7710). The findings suggest differences in the intensity of treatment, as the probability and intensity of inpatient and outpatient care encounters were similar across both groups.
Health care spending and utilization trends were higher among the high-LOH no-intervention group compared with the low-LOH no-intervention group (Table 2). For example, medical spending was $18,065 in the year for the high-LOH no-intervention group, and $15,454 among the low-LOH no-intervention control group. The overall levels of health care interactions were lower for the 2 counterfactual groups compared with the outreach groups, potentially due to the lower medical risk captured by the Verisk risk score.
High-LOH intervention members were 6% less likely to have a hospital admission (–3 percentage points [PP]; 95% CI, –7 to –1 PP; P = .061) compared with low-LOH intervention members. The lower probability of hospital admission was driven by 8% fewer ED visits resulting in a hospital admission (–4 PP; 95% CI, –7 to –1 PP; P < .001). However, total medical spending was not significantly lower among high-LOH intervention members compared with low-LOH intervention members (–$965; 95% CI, –$3277 to $1347; P = .410) (Table 3). Total mean inpatient spending was similar ($96; 95% CI, –$1356 to $1547; P = .890).
High-LOH intervention members had a 6% lower probability of having an ED visit (–4 PP; 95% CI, –7 to –1 PP; P = .029) compared with low-LOH intervention members, but no difference in probability of use of primary care, cardiologist, or specialist care was found. However, outpatient spending decreased significantly by 20% (–$1517; 95% CI, –$2482 to –$551; P = .002) for the high-LOH intervention group compared with the low-LOH intervention group in the 12 months after the outreach.
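The reported outpatient spending result can be checked with simple arithmetic. The sketch below assumes that the 20% figure is the regression estimate divided by the unadjusted low-LOH intervention group mean ($7710, reported in the descriptive results) and that the 95% CI is symmetric and based on a normal approximation; both are assumptions, as the text does not state how the percentage or the P value was derived.

```python
import math


def relative_change(effect, baseline):
    """Express an absolute effect as a share of a baseline mean."""
    return effect / baseline


def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided P value implied by a symmetric 95% CI.

    Assumes a normal sampling distribution; the SE is backed out of the CI width.
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


rel = relative_change(-1517, 7710)
p = p_from_ci(-1517, -2482, -551)
print(f"relative change: {rel:.1%}")        # about -19.7%, ie, roughly 20%
print(f"approximate two-sided P: {p:.3f}")  # about 0.002
```

Under these assumptions, both the 20% reduction and P = .002 are consistent with the reported estimate and CI.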
The high-LOH no-intervention group had higher rates of ED visits (0.20; 95% CI, 0.03-0.38; P = .021) compared with low-LOH no-intervention members, as well as higher probability of a hospital admission (0.05; 95% CI, 0.01-0.09; P = .030). No difference in primary, cardiology, or specialist care became evident. The high-LOH no-intervention group had higher total medical spending ($1981; 95% CI, –$104 to $4065; P = .063) and higher outpatient spending ($829; 95% CI, –$75 to $1733; P = .072) compared with the low-LOH no-intervention group.
In a novel evaluation of a predictive analytics–driven DM program, members who received the intervention because of a high predictive score had lower likelihoods of an inpatient hospitalization and of ED use, as well as lower outpatient spending, compared with members who received the DM intervention not driven by the predictive analytics model in the 12 months following outreach. To understand how spending patterns among both groups would have evolved had no intervention taken place, we show that predictive outreach members would have experienced significantly worse health outcomes, measured by higher rates of hospitalizations and ED use.
Few DM programs include non–claims-based data to classify suitable members, aim interventions not just at high-risk populations, and also target individuals most likely to benefit from care. DM programs are aimed at closing disease-specific gaps in care, although the lag time between claims processing and gaps in care leads to a significant delay in recognizing health needs. Clinical information and information exchange with providers may be beneficial to target specific subgroups, but their administrative and organizational burden is costly to the insurer, and they may therefore be less common. Acknowledging that current approaches to DM may be too broad, especially as the number of individuals with chronic diseases is growing in the United States, can be a step toward more targeted approaches.20
Our results show that even a targeted DM intervention has its limits. Our findings suggest that it seems to be effective in reducing the probability of having any ED visit or inpatient stay but does not reduce the number of ED visits and inpatient stays. These findings suggest that a targeted intervention may work well for individuals who are at the margin of an escalating health need when DM can stabilize the patient, but it does not work for members who are likely to have persistently high levels of care need.
To our knowledge, there have been no previous studies evaluating the differential impact of predictive analytics–driven DM outreach programs vs the same nonpredictive DM program approach. To date, the best evidence provides mixed conclusions for targeted DM outreach programs’ efficacy.6,12,21-23 Recent evidence suggests that a targeted approach can result in fewer acute hospitalizations and ED visits, which can lead to cost savings.17,18 A randomized trial had similar implications and showed that the cost of intensive DM programs for high-risk populations generally outstrips the downstream health care cost savings.6
The study has several limitations. First, the intervention analyzed does not necessarily lead to closures of gaps. It was a voluntary program and could have also informed and updated individuals’ beliefs regarding their own health status, regardless of the intervention. Further, the extent to which members may have engaged with DM may differ among members, something we cannot adequately measure. Second, the intervention was targeted to intervene specifically among Medicare Advantage members with CHF; therefore, conclusions on the effectiveness of the outreach may differ for other groups with chronic diseases in whom intervention protocols may be different. Third, the predictive algorithm generally relies on claims information to identify high-risk members. Claims data were not developed for predictive modeling and are therefore lacking clinical detail that physicians and hospitals may not submit to the insurer. Fourth, our findings rely on matching techniques that use observable characteristics available in health insurance claims data. Thus, other dimensions along which matched groups differ could not be accounted for.
A predictive analytics–driven DM outreach was associated with reducing outpatient health care utilization compared with a nonpredictive DM outreach protocol among members with CHF. However, the lower utilization translated to only statistically significantly lower outpatient spending, while total medical spending remained similar. Future research is needed to assess the efficacy among other chronic groups and disease contexts.
Author Affiliations: Texas A&M University (BU), College Station, TX; University of Pennsylvania (GD), Philadelphia, PA; Independence Blue Cross (AS-M), Philadelphia, PA.
Source of Funding: None.
Author Disclosures: Dr Smith-McLallen is an employee of Independence Blue Cross. The remaining authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.
Authorship Information: Concept and design (BU, GD); acquisition of data (BU, GD, AS-M); analysis and interpretation of data (BU, GD, AS-M); drafting of the manuscript (GD); critical revision of the manuscript for important intellectual content (BU, GD, AS-M); and supervision (GD).
Address Correspondence to: Benjamin Ukert, PhD, Texas A&M University, 212 Adriance Lab Rd, College Station, TX 77843. Email: firstname.lastname@example.org.
1. Fireman B, Bartlett J, Selby J. Can disease management reduce health care costs by improving quality? Health Aff (Millwood). 2004;23(6):63-75. doi:10.1377/hlthaff.23.6.63
2. Simcoe T, Catillon M, Gertler P. Who benefits most in disease management programs: improving target efficiency. Health Econ. 2019;28(2):189-203. doi:10.1002/hec.3836
3. Nyweide DJ, Bynum JPW. Relationship between continuity of ambulatory care and risk of emergency department episodes among older adults. Ann Emerg Med. 2017;69(4):407-415.e3. doi:10.1016/j.annemergmed.2016.06.027
4. Amjad H, Carmichael D, Austin AM, Chang CH, Bynum JPW. Continuity of care and health care utilization in older adults with dementia in fee-for-service Medicare. JAMA Intern Med. 2016;176(9):1371-1378. doi:10.1001/jamainternmed.2016.3553
5. Berkowitz SA, Hulberg AC, Standish S, Reznor G, Atlas SJ. Addressing unmet basic resource needs as part of chronic cardiometabolic disease management. JAMA Intern Med. 2017;177(2):244-252. doi:10.1001/jamainternmed.2016.7691
6. Peikes D, Chen A, Schore J, Brown R. Effects of care coordination on hospitalization, quality of care, and health care expenditures among Medicare beneficiaries: 15 randomized trials. JAMA. 2009;301(6):603-618. doi:10.1001/jama.2009.126
7. Bott DM, Kapp MC, Johnson LB, Magno LM. Disease management for chronically ill beneficiaries in traditional Medicare. Health Aff (Millwood). 2009;28(1):86-98. doi:10.1377/hlthaff.28.1.86
8. Nelson L. Lessons from Medicare’s demonstration projects on disease management and care coordination. Congressional Budget Office. January 18, 2012. Accessed April 1, 2022. https://www.cbo.gov/publication/42860
9. Kranker K. Effects of Medicaid disease management programs on medical expenditures: evidence from a natural experiment in Georgia. J Health Econ. 2016;46:52-69. doi:10.1016/j.jhealeco.2016.01.008
10. Brown RS, Peikes D, Peterson G, Schore J, Razafindrakoto CM. Six features of Medicare coordinated care demonstration programs that cut hospital admissions of high-risk patients. Health Aff (Millwood). 2012;31(6):1156-1166. doi:10.1377/hlthaff.2012.0393
11. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff (Millwood). 2014;33(7):1123-1131. doi:10.1377/hlthaff.2014.0041
12. Rich MW, Beckman V, Wittenberg C, Leven CL, Freedland KE, Carney RM. A multidisciplinary intervention to prevent the readmission of elderly patients with congestive heart failure. N Engl J Med. 1995;333(18):1190-1195. doi:10.1056/NEJM199511023331806
13. Kansagara D, Englander H, Salanitro A, et al. Risk prediction models for hospital readmission: a systematic review. JAMA. 2011;306(15):1688-1698. doi:10.1001/jama.2011.1515
14. Navathe AS, Zhong F, Lei VJ, et al. Hospital readmission and social risk factors identified from physician notes. Health Serv Res. 2018;53(2):1110-1136. doi:10.1111/1475-6773.12670
15. Morgan DJ, Bame B, Zimand P, et al. Assessment of machine learning vs standard prediction rules for predicting hospital readmissions. JAMA Netw Open. 2019;2(3):e190348. doi:10.1001/jamanetworkopen.2019.0348
16. Triantafyllidis AK, Tsanas A. Applications of machine learning in real-life digital health interventions: review of the literature. J Med Internet Res. 2019;21(4):e12286. doi:10.2196/12286
17. David G, Smith-McLallen A, Ukert B. The effect of predictive analytics–driven interventions on healthcare utilization. J Health Econ. 2019;64:68-79. doi:10.1016/j.jhealeco.2019.02.002
18. Ukert B, David G, Smith-McLallen A, Chawla R. Do payor-based outreach programs reduce medical cost and utilization? Health Econ. 2020;29(6):671-682. doi:10.1002/hec.4010
19. The evolution of DxCG, the gold standard in risk adjustment and predictive modeling. Cotiviti. Accessed April 1, 2022. https://resources.cotiviti.com/population-health-analytics/cotiviti-whitepaper-evolutionofdxcg
20. Ward BW, Schiller JS. Prevalence of multiple chronic conditions among US adults: estimates from the National Health Interview Survey, 2010. Prev Chronic Dis. 2013;10:E65. doi:10.5888/pcd10.120203
21. Naylor MD, Brooten DA, Campbell RL, Maislin G, McCauley KM, Schwartz JS. Transitional care of older adults hospitalized with heart failure: a randomized, controlled trial. J Am Geriatr Soc. 2004;52(5):675-684. doi:10.1111/j.1532-5415.2004.52202.x
22. DeBusk RF, Houston Miller N, Parker KM, et al. Care management for low-risk patients with heart failure: a randomized, controlled trial. Ann Intern Med. 2004;141(8):606-613. doi:10.7326/0003-4819-141-8-200410190-00008
23. Taubman SL, Allen HL, Wright BJ, Baicker K, Finkelstein AN. Medicaid increases emergency-department use: evidence from Oregon’s Health Insurance Experiment. Science. 2014;343(6168):263-268. doi:10.1126/science.1246183