The American Journal of Managed Care December 2015

Innovations in Chronic Care Delivery Using Data-Driven Clinical Pathways

Yiye Zhang, MS; and Rema Padman, PhD
This paper demonstrates that data-driven clinical pathways can be developed using electronic health record data to facilitate innovations in practice-based care delivery for chronic disease management.


Objectives: Chronic diseases are common, complex, and expensive health conditions that can benefit from innovations in healthcare service delivery enabled by information technology and advanced analytic methods. This paper proposes a data-driven approach, illustrated in the context of chronic kidney disease (CKD), to develop clinical pathways of care delivery from electronic health record (EHR) data.

Study Design: We analyzed structured and de-identified EHR data from 2009 to 2013 of 664 CKD patients with multiple chronic conditions.

Methods: Machine learning algorithms were used to learn data-driven and practice-based clinical pathways that cluster patients into subgroups and model the co-progression of their encounter types, diagnoses, medications, and biochemical measurements. Given a pattern of biochemical measurements, our algorithm identifies the most probable clinical pathways, and makes predictions regarding future states, with and without temporal information. CKD stages, their complications, and common medications are included in the clinical pathways.

Results: Using the EHR data of 664 patients who were initially in CKD stage 3 and hypertensive, we identified 7 patient subgroups—each distinguished primarily by the type of complications suffered by the patients. Our algorithm demonstrates fair accuracy (up to 44% and 75%, respectively) in learning the most probable clinical pathways and predicting future states associated with temporal patterns of biochemical measurements and patient subgroups.

Conclusions: Data-driven clinical pathway learning summarizes multidimensional and longitudinal information from EHRs into clusters of common sequences of patient visits that may assist in the efficient review of current practices and identifying potential innovations in the care delivery process.

Am J Manag Care. 2015;21(12):e661-e668

Take-Away Points
  • The availability of high-volume, time-stamped, and individual-level health data is beginning to facilitate clinical interventions that are personalized and predictive. 
  • Healthcare service delivery can benefit greatly from advanced statistical and machine learning models and algorithms that can learn personalized insights from electronic health record (EHR) data. 
  • Data-driven clinical pathways that describe the co-progression of encounter types, diagnoses, medications, and individual biochemical measurements can be learned from EHR data, using statistical and machine learning methods to support the review of current practices and innovate healthcare delivery approaches. 
  • Our proposed methodology is generalizable to other clinical conditions and can accommodate varying numbers of clinical and other relevant factors.
According to the World Health Organization, 60% of all deaths, worldwide, can be attributed to chronic diseases such as diabetes, heart disease, stroke, and cancer; they are also a major cause of poverty and lack of economic development.1 As part of a multi-pronged effort to address this challenge, innovations in chronic care delivery are beginning to leverage advanced statistical and machine learning models and algorithms to obtain new insights into care quality, outcomes, and cost.2-4 Machine learning is the science of constructing algorithms that learn from large volumes of data in order to facilitate decision making by generating potentially new insights; it has gained widespread implementation across many industries today.5 Just a few examples of machine learning applications are speech recognition, self-driving cars, and personalized online experiences.6-8

Although innovations driven by machine learning have seen tremendous success,9,10 resulting in improved service performance, productivity, and growth,11-13 the healthcare industry has, for a variety of reasons, been relatively slow to incorporate these techniques into decision-support applications and to adapt to the resulting changes.14-16 For instance, in making treatment decisions, many clinicians may prefer clinical practice guidelines (CPGs) over predictions generated by machine learning algorithms, which may seem like a “black box” with little relevance to actual clinical decision making.17 However, many current clinical decision support capabilities, whether CPG-embedded electronic health record (EHR) interactivity or computerized provider order entry (CPOE) applications, are designed by humans and target the “average patient.”

As the Precision Medicine initiative states,18 we are now in an era in which clinical interventions need to be personalized and predictive, and so should decision support recommendations. To meet this objective, it is no longer sufficient to rely on CPGs, often created based on consensus opinions or randomized clinical trials that have strict enrollment criteria. Rather, with the tremendous amount of data being accumulated in EHRs from the enactment of the Health Information Technology for Economic and Clinical Health (HITECH) Act as part of the American Recovery and Reinvestment Act,19 healthcare service delivery can also benefit greatly from advanced statistical and machine learning models and algorithms that can learn potentially useful insights from large amounts of highly detailed data collected daily, as part of routine care delivered in multiple, diverse settings.

Traditional topics in machine learning include classification and unsupervised learning.5 Classification refers to assigning unlabeled data to target classes by training a classification model on labeled data. Logistic regression and naïve Bayes are examples of classification algorithms.5 For example, Lee et al used logistic regression to predict 7-day mortality from heart failure in emergent care using initial vital signs, clinical and presentation features, and laboratory tests.20 Unsupervised learning refers to the identification of latent groups in the data. Unlike classification, which is also called “supervised learning,” unsupervised learning typically does not have true labels, and users need to predefine the number of latent groups. K-means and hierarchical clustering are 2 of the most common unsupervised learning algorithms.5
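As an illustration of unsupervised learning, the following is a minimal sketch of the K-means procedure with a predefined number of groups. It is not code from this study: the function name, the deterministic seeding with the first k points, and the toy measurements are all assumptions made for the example, and real features would normally be standardized first.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-means on a list of numeric tuples.

    Assumption for this sketch: the first k points seed the centroids,
    which keeps the example deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: (systolic blood pressure, serum creatinine) pairs.
toy = [(120.0, 1.0), (160.0, 2.4), (122.0, 1.1),
       (158.0, 2.6), (118.0, 0.9), (162.0, 2.5)]
labels, centroids = kmeans(toy, k=2)
```

No true labels are supplied; the only user input besides the data is the number of groups, k.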

Zhang et al used a variant of the K-means clustering algorithm to design more efficient order sets from historical order data in a pediatric inpatient setting.21 Order sets are groups of relevant orders traditionally clustered together by clinical experts and used within CPOE; they are an example of a manually designed healthcare information technology application that requires significant labor- and knowledge-intensive effort to maintain and update. In the same study, Zhang et al demonstrated that order sets can be created using machine learning algorithms, and that the resulting data-driven order sets require less physical and cognitive workload to use because the methods were trained to find the combinations of orders that best matched order data generated from actual workflow. In addition to these classical approaches, many advanced machine learning algorithms have been developed and applied over the years to facilitate a more efficient, safer healthcare system.22-25

In this paper, we present a machine learning approach for learning the most probable, data-driven clinical pathways from the EHR data of patients with chronic kidney disease (CKD), and predicting the most probable upcoming interventions at any stage, given recent history. CKD is a chronic condition that currently affects more than 26 million US adults, with an additional 73 million at increased risk for the disease.26 It is also associated with increased risk for cardiovascular disease and acute kidney injury (AKI), and the majority of the patients also suffer from comorbidities such as hypertension and diabetes.26 Consequently, CKD management is complex and expensive, and a large proportion of the US Medicare budget every year is allocated for the treatment of CKD.27 Specifically, the per person per year average cost of treating CKD was $23,128 in 2011—more than twice the average cost of treating non-CKD conditions in the Medicare population ($11,103).27 With the cost increasing and quality of life decreasing as the disease progresses to end-stage renal disease (ESRD),27 there is a growing imperative to pursue innovations in service delivery and management of CKD and other chronic conditions that may generate improved health outcomes, cost savings, and patient satisfaction.4

Additionally, generating the highest quality scientific evidence and associated practice recommendations for chronic conditions such as CKD is a continuing challenge for the healthcare field.3 One of the most recent CPGs for CKD, an update of a 2007 guideline, was published by the National Kidney Foundation’s Kidney Disease Outcomes Quality Initiative in 2012. However, only 2 of its 7 key recommendations received the highest grade from the Evidence Review Team of the guideline Work Group for both strength of recommendation (“recommend” vs “suggest”) and quality of evidence (“high” vs “moderate,” “low,” or “insufficient”); the other recommendations received lower grades for strength of recommendation and for quality of evidence.28

In this paper, we propose that evidence from actual practices, particularly those that include large numbers of patients in local treatment settings over reasonable durations, may be used to assist guideline development. We present methods for knowledge extraction from data using machine learning algorithms, and demonstrate that such knowledge can be regarded as practice-based, data-driven clinical pathways. Clinical pathways translate CPG recommendations into an actionable plan, such as a flow chart, and are used by more than 80% of US hospitals for at least 1 intervention.29 This research aims to develop clinical pathways based not strictly on CPGs, but on practice-based evidence learned from data. An overall framework of our approach that supports a learning healthcare system is presented in Figure 1.


Prior Work

Data-driven clinical pathway learning has garnered research interest since the 1990s,30-38 but there is limited research on machine learning approaches for the problem. Recently, Lakshmanan et al used a clustering algorithm called DBScan to cluster patients’ histories prior to pathway learning, and applied SPAM, an algorithm for finding frequent patterns in pathways, to associate patterns with patient outcomes.33 Huang et al used a topic model, a recently developed probabilistic method for learning latent topics from documents, to discover clinical pathway patterns from EHR event logs.38 Zhang et al modeled clinical pathways as Markov chains that included the co-progression of multiple interventions and diagnoses, and visualized them to allow identification of variations in care and outcomes across latent patient subgroups.39
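To make the Markov chain view of a pathway concrete, the sketch below estimates first-order transition probabilities by counting adjacent states in visit sequences. It is an illustration under simplifying assumptions, not the cited authors' implementation: each visit is reduced to a single hypothetical state label (the cited work combines multiple interventions and diagnoses per state), and the function names and toy sequences are invented for the example.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    Each sequence is a list of hashable state labels, one per visit.
    Probabilities are relative frequencies of adjacent state pairs.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(following.values())
                for nxt, n in following.items()}
        for state, following in counts.items()
    }

def most_probable_next(probs, state):
    """Return the likeliest successor of `state`, or None if none was observed."""
    following = probs.get(state)
    if not following:
        return None
    return max(following, key=following.get)

# Hypothetical visit sequences; the state labels are placeholders.
visits = [
    ["stage3", "stage3", "stage4"],
    ["stage3", "stage3", "stage3", "stage4"],
    ["stage3", "stage3", "stage4", "stage5"],
]
probs = transition_probabilities(visits)
```

In this toy data, "stage3" is followed by "stage3" in 4 of 7 observed transitions, so the most probable next state after "stage3" is "stage3" itself.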

In this paper, we combine clustering and temporal modeling to elicit common clinical pathways from the data. Specifically, given patient characteristics and a sequence of laboratory observations from multiple laboratory tests, we illustrate methods to learn the most probable sequence of clinical interventions that are associated with the laboratory observations, and to make predictions about patients’ impending conditions as a result of the interventions. This approach allows us to link patients’ biochemical responses with clinical interventions and with specific outcomes, thus providing a novel methodology for data-driven clinical pathway learning.

Clustering of Patients

To accommodate the heterogeneity in the patient population and improve model accuracy, we group patients according to the similarity of their clinical history prior to pathway learning and prediction. We expect patients’ pathways to branch out as their health conditions and corresponding treatments evolve in different ways. Therefore, we use hierarchical clustering to cluster patients’ pathways into subgroups according to the longest common subsequence (LCS) distance measure.40 The LCS is the longest subsequence that 2 sequences have in common: the order of occurrence of the items is preserved, but the items need not be adjacent. LCS has been widely applied in biomedical research as a similarity measure in trajectory analysis and protein sequence analysis.40 The distance measure, dLCS, is then computed as the difference between the sum of the lengths of the 2 sequences and twice the length of their LCS. (Details are in the eAppendix.) Hence, dLCS is affected by the length of the identified subsequence and by the lengths of both sequences; for example, given the same length of LCS, dLCS is larger for 2 long sequences than for 2 short sequences. Therefore, clustering using dLCS allows us to group patients who not only share similarity in clinical interventions, but also have similar durations of treatment. The optimal number of clusters is determined using Silhouette, a measure commonly used in cluster analysis.41 In this study, we consider clusters that have 10 or fewer patients as outliers, and plan to evaluate rare events and exceptions in future research.
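The distance described above can be written as dLCS(x, y) = |x| + |y| - 2 |LCS(x, y)|. The following is a minimal sketch of that computation using the standard dynamic program for LCS length. The function names and the toy pathways are assumptions for illustration only, and the exact formulation used in the study is the one given in the eAppendix.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Order of items is preserved, but matched items need not be adjacent.
    Uses the standard dynamic program, keeping one row at a time.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def d_lcs(a, b):
    """LCS distance: sum of the two lengths minus twice the LCS length."""
    return len(a) + len(b) - 2 * lcs_length(a, b)

# Hypothetical pathways, one item per recorded event.
short_a = ["visit", "lab", "ACE inhibitor"]
short_b = ["visit", "ACE inhibitor", "lab"]

# Longer pathways that share the same LCS length (2) as the short pair.
long_a = short_a + ["diuretic", "statin"]
long_b = short_b + ["insulin", "phosphate binder"]

short_distance = d_lcs(short_a, short_b)  # 3 + 3 - 2*2 = 2
long_distance = d_lcs(long_a, long_b)     # 5 + 5 - 2*2 = 6
```

Both pairs share an LCS of length 2, yet the longer pair is farther apart (6 vs 2), which is why clustering on this distance also tends to separate patients by duration of treatment. A pairwise matrix of such distances could then be passed to any hierarchical clustering routine.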

