The American Journal of Managed Care December 2015

Innovations in Chronic Care Delivery Using Data-Driven Clinical Pathways

Yiye Zhang, MS; and Rema Padman, PhD
This paper demonstrates that data-driven clinical pathways can be developed using electronic health record data to facilitate innovations in practice-based care delivery for chronic disease management.
Figure 2 illustrates our modeling scheme for learning the clinical pathways. Given the time stamps associated with intervention data recorded in the EHR, we assume that each state in the data-driven clinical pathway is separated by at least 1 time unit (eg, day, week, month), and that each state may contain more than 1 type of intervention. For example, it is typical for a CKD patient to have a follow-up visit in the clinician’s office, receive medication prescriptions, and have diagnostic codes assigned to the visit. Our data encoding anticipates such multidimensional and longitudinal features in the data. We assign a unique label to each unique combination of interventions occurring during a visit on the same day, so that patients’ clinical interventions that span multiple categories, such as diagnosis, medication prescription, and encounter type, can be transformed into 1-dimensional pathways, as shown in the top row in Figure 2. Naturally, these interventions are related to one another over time to varying degrees. For instance, interventions that occurred within 6 months of each other may be more strongly correlated than those that occurred within 2 years of each other.
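The labeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation; the function name `encode_pathway`, the intervention strings, and the two example patients are all hypothetical, chosen only to show how same-day combinations collapse into a 1-dimensional sequence of labels.

```python
from typing import Dict, FrozenSet, List

def encode_pathway(visits: List[FrozenSet[str]],
                   labels: Dict[FrozenSet[str], int]) -> List[int]:
    """Turn a patient's visits into a 1-dimensional pathway of integer labels.

    Each visit is the set of interventions recorded on the same day.
    `labels` is shared across patients so that identical combinations
    always receive the same label; it is updated in place.
    """
    pathway = []
    for combo in visits:
        if combo not in labels:
            labels[combo] = len(labels)  # next unused label
        pathway.append(labels[combo])
    return pathway

# Hypothetical same-day combinations (encounter type, diagnosis, drug class).
visit_1 = frozenset({"office_visit", "dx:CKD_stage_3", "rx:diuretic"})
visit_2 = frozenset({"office_visit", "dx:CKD_stage_3"})
visit_3 = frozenset({"hospital_encounter", "dx:AKI"})

shared_labels: Dict[FrozenSet[str], int] = {}
patient_a = encode_pathway([visit_1, visit_2, visit_1], shared_labels)
patient_b = encode_pathway([visit_2, visit_3], shared_labels)
```

Using sets makes the label independent of the order in which interventions were recorded on a given day, which seems to match the "unique combination" wording; whether the original encoding treats order this way is an assumption.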

In the context of CKD management, we assume that interventions at visit t+2 are dependent on activities at visit t+1 and t, as shown in the middle row in Figure 2. For analytical tractability, and reflecting actual practice in the management of many health conditions, the time intervals between 2 consecutive visits are categorized as: 1) less than 3 months, 2) greater than 3 but fewer than 6 months, or 3) at least 6 months. These assumptions are practice- and condition-specific,3 but can be readily modified for different settings. Patients’ biochemical conditions, as reflected by their laboratory observations, are assumed to be influenced by the interventions, as shown in the bottom row in Figure 2. For the problem of clinical pathway learning described in this study, our goal is to learn the most probable sequence of clinical interventions given to patients with a particular trajectory of biochemical responses. Similarly, the prediction problem is to infer the most probable imminent interventions in the next state—most importantly, diagnostic codes—for these patients.
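As a small illustration of the 3 interval categories, the sketch below bins the time between 2 consecutive visits. The function name `interval_category` is hypothetical. The text does not say where an interval of exactly 3 months belongs; placing it in the second category is an assumption made here only so that every value has a category.

```python
def interval_category(months_between_visits: float) -> int:
    """Return 1, 2, or 3 for the time gap between 2 consecutive visits.

    1: less than 3 months
    2: 3 to less than 6 months (exactly 3 months is placed here by
       assumption; the source leaves this boundary unspecified)
    3: at least 6 months
    """
    if months_between_visits < 0:
        raise ValueError("interval must be non-negative")
    if months_between_visits < 3:
        return 1
    if months_between_visits < 6:
        return 2
    return 3

# Hypothetical gaps, in months, between a patient's consecutive visits.
gaps = [1.5, 3.0, 5.9, 6.0, 14.0]
categories = [interval_category(g) for g in gaps]
```

Because the cut points are practice- and condition-specific, a real implementation would likely take them as parameters rather than hard-coding them.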

We model this treatment process as a hidden Markov model (HMM). An HMM is a statistical model with a wide range of applications, such as speech recognition and RNA sequence analysis.42 It is defined by 5 elements: sequence of hidden states, sequence of observations, state transition probability distribution, observation probability distribution, and initial state distribution.43 An HMM represents a process in which a sequence of observations is generated, and each observation is triggered by an underlying process that is hidden from us. For example, given a sequence of a patient’s body temperatures, we may assume that the patient’s health condition is affecting his or her body temperature. Therefore, the sequence of body temperatures forms the observations in the HMM, and the health conditions represent the HMM’s sequence of hidden states.

The sequence of hidden states in an HMM has the first-order Markov property: the current state depends only on the previous state.44 We regard the middle row in Figure 2 as the sequence of hidden states and the bottom row as the sequence of observations. Parameters of the HMM, such as transition probabilities of hidden states in the Markov chain, are learned from the data using the expectation-maximization (EM) algorithm.43 Given HMM parameters, we can perform both the clinical pathway learning and prediction tasks through HMM decoding, which calculates the sequence of hidden states with the highest probability given the sequence of observations and the probability distribution of the model. Details of the model and algorithm are described further in the eAppendix and prior studies.39
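HMM decoding is commonly done with the Viterbi algorithm. The sketch below is a generic first-order Viterbi decoder on a toy model, not the authors' implementation (which is described in their eAppendix). The state names, observation names, and all probabilities are invented for illustration: hidden states stand in for intervention labels, and observations stand in for laboratory patterns.

```python
import math
from typing import Dict, List

def viterbi(obs: List[str], states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable hidden-state sequence for `obs`.

    Works in log space to avoid underflow; assumes all probabilities
    used are strictly positive.
    """
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in states}]
    back = []  # back[t-1][s]: predecessor of s on that best path
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda r: best[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][o]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy model with made-up numbers (hypothetical, for illustration only).
states = ["monitor", "treat"]
start_p = {"monitor": 0.7, "treat": 0.3}
trans_p = {"monitor": {"monitor": 0.8, "treat": 0.2},
           "treat": {"monitor": 0.3, "treat": 0.7}}
emit_p = {"monitor": {"stable": 0.9, "falling": 0.1},
          "treat": {"stable": 0.3, "falling": 0.7}}

path = viterbi(["stable", "falling", "falling"],
               states, start_p, trans_p, emit_p)
```

In this toy run the decoded sequence is monitor, treat, treat. In practice the parameters would be estimated from data (for example with EM) rather than written by hand.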


Descriptive Statistics

We demonstrate the methodology using a real-world data set of 664 patients, with visits from 2009 to 2013 extracted from the EHR, who suffered from CKD and associated complications. The gender ratio is nearly equal. Over 67% of the patients are aged at least 70 years, and nearly 95% are Caucasian. Components considered as part of clinical pathways and the number of unique patients who had each component in their EHR are listed in Table 1. These components were selected for their relevance in CKD management, per consultation with clinicians, but can be extended to include additional details. All 664 patients had initial diagnoses of CKD stage 3 and hypertension, but not diabetes, and none of the patients had anemia or hyperparathyroidism initially. These patients either progressed to advanced CKD stages and ESRD, or improved to CKD stages 1 and 2. Most of them subsequently developed some of the complications listed in Table 1.

Clustering of Patients

The number of clusters, k, was set to 7, the value that gave the highest silhouette value (0.189) from hierarchical clustering. Table 2 describes the characteristics of each group in detail, indicating that hierarchical clustering using dLCS was able to divide patients into subgroups that differ on treatment frequency, duration, and outcome at the end of the study period. For example, 95% of the patients in subgroup 5 showed improvement in their conditions at the end of the study period, while none worsened, after being in the clinic for an average of 26.9 months. Subgroup 3 is the largest subgroup, and it also has the smallest average dLCS, suggesting that its patients are more similar to one another than are patients in other subgroups. Subgroup 2, which needs to be investigated further, had mixed outcomes, with 14% of patients who improved and 20% who worsened. The final column in Table 2 lists the complications of CKD that the majority of patients in each group suffer from.
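To make the silhouette criterion concrete, the sketch below computes the mean silhouette value from a precomputed pairwise distance matrix and a cluster assignment. It is a generic illustration, not the authors' code: the function name `silhouette` and the 4-point toy distances are hypothetical, and the toy distances are not dLCS values. Selecting k would amount to clustering for several candidate values of k and keeping the assignment with the highest score.

```python
from typing import Dict, List, Sequence

def silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette value; requires at least 2 clusters.

    For each item, a = mean distance to the rest of its own cluster and
    b = smallest mean distance to any other cluster; the item's value is
    (b - a) / max(a, b). Items in singleton clusters contribute 0.
    """
    clusters: Dict[int, List[int]] = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least 2 clusters")
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # singleton cluster: value defined as 0
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

# Toy distances for 4 hypothetical patients (two tight pairs far apart).
dist = [[0, 1, 10, 11],
        [1, 0, 9, 10],
        [10, 9, 0, 1],
        [11, 10, 1, 0]]
good = silhouette(dist, [0, 0, 1, 1])  # pairs kept together
bad = silhouette(dist, [0, 1, 0, 1])   # pairs split apart
```

A value near 1 indicates compact, well-separated clusters, so the reported maximum of 0.189 suggests that the subgroups overlap considerably.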

Clinical Pathway Learning and Prediction

Table 3 summarizes the accuracies associated with predicting the imminent interventions and diagnoses, such as prescription of diuretics and episodes of AKI, and learning the most probable pathways for sample subgroups 3, 4, and 5. We chose these 3 subgroups because of their larger sizes and their notable outcomes at the end of the study. We tested the accuracies using the most common sequence of laboratory observations (LOs) from 3 consecutive visits, and the number of patients who experienced such patterns is listed under the column, “Number of patients who had LOs.” Training and testing were performed through a variant of the leave-one-out cross-validation method.45 Learning and prediction were done with respect to the most common sequence of LOs in each subgroup. It is interesting to note that the common biochemical patterns in subgroups 3 and 4 are the same, but the model identified different clinical pathways for these 2 groups, a difference that requires further examination. “Pathway with time” and “Pathway without time” measure the accuracy of learning the entire pathway, including and excluding the actual time duration between 2 visits, respectively. Similarly, “Future visit with time” and “Future visit without time” measure the prediction accuracy for patients’ future interventions, with and without time durations between visits. Each state variable contains information on the presence or absence of 3 encounter types, 19 diagnoses, and 4 drug classes, in addition to 3 different durations between visits. Therefore, the probability of accurate learning and prediction by random guessing is extremely low compared with the results from our algorithm.
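The size of the state space gives a rough sense of how unlikely a correct random guess is. The back-of-the-envelope calculation below assumes that every presence/absence combination is possible and equally likely, which real clinical data would not satisfy, so the result is an upper bound on the number of states and only an order-of-magnitude guide. The variable names are illustrative.

```python
# Counts taken from the text: binary presence/absence indicators per state.
n_encounter_types = 3
n_diagnoses = 19
n_drug_classes = 4
n_durations = 3  # categories for time between visits

n_binary = n_encounter_types + n_diagnoses + n_drug_classes

# Upper bound on distinct states, assuming all combinations can occur.
states_without_time = 2 ** n_binary
states_with_time = states_without_time * n_durations

# Chance of guessing a single state correctly under a uniform assumption.
chance_with_time = 1 / states_with_time

print(f"{states_without_time:,} states without time")
print(f"{states_with_time:,} states with time")
print(f"chance of a correct uniform guess: {chance_with_time:.2e}")
```

Under these assumptions there are about 67 million possible states without time and about 201 million with time. The number of combinations actually observed in the data is far smaller, so the practical chance level is higher than this bound suggests, though still low.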

We also examined the false negative and false positive rates in the prediction of an imminent condition such as AKI. We define a false negative to be a case where patients’ CKD stages are worse than predicted, or patients developed AKI, which our methods failed to predict. A false positive is defined as patients’ CKD stages being better than predicted, or prediction of AKI when no AKI developed in reality. We include AKI in this analysis because it is a serious adverse outcome: it often requires hospitalization and can be fatal.46 We were able to obtain false-positive and false-negative rates that are as low as 0%, although this result needs to be validated using a much larger sample. Nevertheless, the learning and prediction algorithms show promise in identifying common pathways of treatments, but these need to be analyzed further to better delineate effective interventions in the various subgroups.
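The error definitions above can be written down as a small classification rule. The sketch below is illustrative only. The `Outcome` class, the numeric stage coding (higher means worse), the precedence of false negatives when a case meets both definitions, and the use of all predictions as the denominator are assumptions that the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Outcome:
    stage: int  # CKD stage coded 1-5; higher is worse (assumed coding)
    aki: bool   # whether an AKI episode is predicted or observed

def classify(predicted: Outcome, actual: Outcome) -> str:
    """Label one prediction using the definitions given in the text."""
    # False negative: reality is worse than predicted, or AKI was missed.
    if actual.stage > predicted.stage or (actual.aki and not predicted.aki):
        return "false_negative"  # takes precedence by assumption
    # False positive: reality is better than predicted, or AKI never occurred.
    if actual.stage < predicted.stage or (predicted.aki and not actual.aki):
        return "false_positive"
    return "correct"

def error_rates(pairs: List[Tuple[Outcome, Outcome]]) -> Tuple[float, float]:
    """Return (false-negative rate, false-positive rate) over all predictions."""
    kinds = [classify(p, a) for p, a in pairs]
    n = len(kinds)
    return kinds.count("false_negative") / n, kinds.count("false_positive") / n

# Hypothetical (predicted, actual) pairs.
pairs = [
    (Outcome(3, False), Outcome(3, False)),  # correct
    (Outcome(3, False), Outcome(4, False)),  # stage worse than predicted
    (Outcome(4, True), Outcome(4, False)),   # AKI predicted, none occurred
    (Outcome(3, False), Outcome(3, True)),   # AKI missed
]
fn_rate, fp_rate = error_rates(pairs)
```

With only a handful of patients per pattern, a single misclassified case shifts these rates substantially, which is consistent with the caution that the 0% result needs validation in a larger sample.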


This paper provides a brief overview of machine learning approaches to assist medical decision making, and introduces a methodology and an application that illustrate the development of data-driven clinical pathways through mining of EHR data. This approach may facilitate timely extraction of potential new evidence that could become the basis for new clinical trials, and the resulting pathways may also serve as “shared baselines” to be used within a local practice for workflow and population health management.47 Patient-focused applications derived from our research, particularly those that visualize the clinical pathway and provide related patient-oriented recommendations and educational resources, may enhance patients’ understanding of their diseases and treatments, thus facilitating shared decision making.

An important ongoing effort is the development of prediction models for other significant outcomes of interest in the management of CKD and its complications. We also need to evaluate these data-driven clinical pathways, especially their divergences and rare events, and their predictions, with input from clinical professionals. As a growing number of healthcare organizations pilot new care delivery and payment models, such as accountable care organizations,48 exploring disease trajectories that incorporate the interactions of clinical interventions and their associated outcomes may also provide useful insights on the cost-effectiveness of treatments, which organizations can leverage for implementing innovative care delivery practices.

Copyright AJMC 2006-2019 Clinical Care Targeted Communications Group, LLC. All Rights Reserved.