The American Journal of Managed Care June 2014

Novel Predictive Models for Metabolic Syndrome Risk: A "Big Data" Analytic Approach

Gregory B. Steinberg, MB, BCh; Bruce W. Church, PhD; Carol J. McCall, FSA, MAAA; Adam B. Scott, MBA; and Brian P. Kalis, MBA
The authors evaluated a new "big data" analytic predictive platform that quickly and accurately analyzes large data sets to identify populations at risk of developing conditions such as metabolic syndrome.

Objectives

We applied a proprietary “big data” analytic platform, Reverse Engineering and Forward Simulation (REFS), to dimensions of metabolic syndrome extracted from a large data set compiled from Aetna’s databases for 1 large national customer. Our goals were to accurately predict the subsequent risk of metabolic syndrome and of its individual factors at both the population and individual levels.

Study Design

The study data set included demographic data, medical claims, pharmacy claims, laboratory test results, and biometric screening results for 36,944 individuals. The platform reverse-engineered functional models of systems from these large, diverse data sources and provided a simulation framework for generating insights.


Methods

The platform interrogated the results of 2 Comprehensive Metabolic Syndrome Screenings (CMSSs), together with complete coverage records; complete medical claims, pharmacy claims, and lab results for 2010 and 2011; and responses to health risk assessment questions.


Results

The platform predicted the subsequent risk of metabolic syndrome, both overall and by risk factor, at the population and individual levels, with the area under the receiver operating characteristic curve (AUC) ranging from 0.80 to 0.88. We demonstrated that improving waist circumference and blood glucose yielded the largest reductions in subsequent risk and medical costs. We also showed that adherence to prescribed medications and, particularly, adherence to routine scheduled outpatient doctor visits reduced subsequent risk.


Conclusions

The platform generated individualized insights from available heterogeneous data within 3 months. The accuracy and short time to insight of this type of analytic platform allowed Aetna to develop targeted, cost-effective care management programs for individuals with or at risk for metabolic syndrome.

Am J Manag Care. 2014;20(6):e221-e228

Take-Away Points
  • Health insurance companies have large quantities of data relevant to predicting onset of conditions such as metabolic syndrome, including demographic, diagnosis and procedure claim data, lab results, and prescription and care management program data.
  • The platform allows users to interrogate such large, complex data sets and generate meaningful insights within months about individuals and populations at risk, and for a fraction of the cost of clinical trials and traditional analysis.
  • The speed-to-insight possible with this new approach allowed Aetna to design and launch customized interventions to improve health outcomes of the affected population and start quantifying returns on its program investment.
The growing prevalence of metabolic syndrome in the United States, and globally, is alarming. Metabolic syndrome is generally defined as having 3 or more of 5 common biological abnormalities: large waist circumference, elevated blood pressure, elevated triglycerides, low high-density lipoprotein (HDL) cholesterol, and increased insulin resistance. Analysis1 suggests that almost one-third of US adults, or approximately 80 million people, meet the Adult Treatment Panel III criteria for metabolic syndrome, with prevalence increasing significantly with age and body weight.2 An additional 45%, or approximately 104 million people, have 1 or 2 risk factors for developing metabolic syndrome.
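The sketch below is a minimal illustration of the “3 or more of 5” definition. It counts out-of-range factors. It assumes each factor has already been reduced to a true/false out-of-range flag. The key names are illustrative, and the clinical cutoffs that produce the flags are not reproduced here.

```python
from typing import Mapping

# The 5 metabolic syndrome factors named above; key names are illustrative.
FACTORS = ("waist", "blood_pressure", "triglycerides", "hdl", "insulin_resistance")


def count_out_of_range(flags: Mapping[str, bool]) -> int:
    """Number of factors flagged as out of range (KeyError if a flag is missing)."""
    return sum(bool(flags[f]) for f in FACTORS)


def classify(flags: Mapping[str, bool]) -> str:
    """Apply the '3 or more of 5' definition; 1 or 2 factors counts as at risk."""
    n = count_out_of_range(flags)
    if n >= 3:
        return "metabolic syndrome"
    if n >= 1:
        return "at risk (1-2 factors)"
    return "no factors out of range"


patient = {"waist": True, "blood_pressure": True, "triglycerides": False,
           "hdl": True, "insulin_resistance": False}
print(classify(patient))  # metabolic syndrome (3 of 5 factors out of range)
```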

These trends have profound clinical and financial implications. Individuals with metabolic syndrome are twice as likely to develop cardiovascular disease and 5 times as likely to develop diabetes mellitus. Both conditions are associated with higher-than-average annual healthcare costs. Metabolic syndrome also negatively affects workplace participation and productivity.3

Health insurance companies have large quantities of data relevant to metabolic syndrome, including demographic data, diagnosis and procedure claim data, lab results, prescription data, and care management program data. Using “big data analytics” to interrogate large, complex data sets can generate meaningful insights about individuals with or at risk of developing metabolic syndrome.

We applied a proprietary “big data” analytic platform, Reverse Engineering and Forward Simulation (REFS), to the data set of 1 of Aetna’s larger nationwide retail customers and calculated:

  • The subsequent risk of metabolic syndrome, both overall and by metabolic syndrome risk factor, at both a population and individual level
  • The impact of incremental changes in risk factors on the overall subsequent risk of metabolic syndrome and on costs
  • The impact of adherence to medications and to routine, scheduled outpatient doctor visits on the subsequent risk of metabolic syndrome.
Big data analytic techniques of this type rapidly yield insights that support targeted, data-driven interventions for people with or at risk of developing metabolic syndrome. Aetna is currently piloting an intervention program based on the results.


The REFS platform is best used to analyze and simulate large, dynamic, multisource data sets. The platform learns by reverse engineering ensembles of models that represent the diversity of processes consistent with the data. It then simulates these nonparametric knowledge representations to generate accurate, granular group and individual predictions that are both actionable and generalizable. Accurate insights can be generated from available data within a few months, and new data can be integrated easily. This speed-to-insight allows care providers to develop effective therapeutic programs and interventions quickly and cost-effectively, ultimately lowering the cost of serving the affected populations.

Data Sources

Data for this study were gathered from:

  • Insurance eligibility records
  • Medical claims records
  • Pharmacy claims records
  • Comprehensive Metabolic Syndrome Screening (CMSS) results
  • Laboratory test results
  • Health risk assessment (HRA) responses

Study Population

The CMSS results provided the core outcome variables for the study. Each screening measured the 5 metabolic syndrome factors, with blood pressure recorded as both systolic and diastolic values. Screenings were conducted twice, once at the beginning of 2011 and again in early 2012, for an initial cohort of 59,605 people. We then restricted the study to participants for whom we had complete coverage records from January 1, 2010, through December 31, 2011; complete data from medical claims, pharmacy claims, or lab test results for 2010 and 2011; and valid responses to a small set of HRA questions. This resulted in a study population of 36,944, which was then randomly assigned to either an 80% training set (N = 29,527) or a 20% test set (N = 7417). The study population’s metabolic syndrome risk and medical cost profile is shown in Figure 1. Additional demographic detail is found in eAppendix Figure 1.
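The text says only that individuals were randomly assigned. The reported sizes are close to 80/20 but not exact: 29,527 is about 79.9% of 36,944. That pattern fits independent assignment of each person to the training set with probability 0.8. The sketch below shows that scheme. The function name and seed are hypothetical, and the authors' actual procedure is not stated.

```python
import random
from typing import Iterable, List, Tuple


def assign_split(ids: Iterable[int], train_fraction: float = 0.8,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Assign each individual independently to the training set with probability train_fraction."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    train, test = [], []
    for member_id in ids:
        (train if rng.random() < train_fraction else test).append(member_id)
    return train, test


train_ids, test_ids = assign_split(range(36_944))
print(len(train_ids), len(test_ids))  # close to 80%/20%; exact sizes depend on the seed
```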

Variable Creation and Definitions

The 4291 variables in the analysis spanned 6 data categories; the specific breakdown is given in eAppendix Table 1. Continuous variables were discretized into ranges in preparation for modeling with multivariate categorical models. The ranges of the CMSS factors were constructed from the metabolic syndrome out-of-range boundaries and other clinically relevant boundaries.
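Discretization of this kind amounts to locating a value among sorted cut points. The sketch below uses Python’s `bisect` module. Apart from 150 mg/dL, the usual metabolic syndrome boundary, the triglyceride cut points are hypothetical. The study’s actual ranges are not listed in the text.

```python
import bisect
from typing import Sequence


def discretize(value: float, boundaries: Sequence[float]) -> int:
    """Map a continuous value to a range index.

    `boundaries` must be sorted ascending; k boundaries define k + 1 ranges.
    A value equal to a boundary falls into the upper range.
    """
    return bisect.bisect_right(boundaries, value)


# Hypothetical triglyceride cut points (mg/dL); 150 marks the out-of-range
# boundary, so range indices 2 and above are out of range here.
TRIGLYCERIDE_BOUNDARIES = [100, 150, 200, 500]

print([discretize(v, TRIGLYCERIDE_BOUNDARIES) for v in (90, 150, 180, 620)])  # [0, 2, 2, 4]
```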

Demographics captured 5 dimensions in addition to gender: age, body mass index (BMI), ethnicity, cigarette usage, and sleep. In addition, 4 event types were defined from claims: diagnoses, procedures, provider specialty, and prescriptions. Further detail regarding demographics and events is found in eAppendix Figure 1. An indicator variable identified the year in which an event occurred.

1. Lab results. Results from 24 common lab tests (identified by Logical Observation Identifiers Names and Codes [LOINC] number) were extracted for each year and discretized into up to 7 ranges.

2. Biometrics. For each of the CMSS biometric screenings conducted, 6 variables were created (the 4 single-metric metabolic syndrome factors and systolic and diastolic blood pressure values). The values were then segregated into 7 ranges for blood pressure and 6 ranges for the remaining CMSS factors. In cases where the biometric corresponded to a lab test, the same discretization was used.

3. Medication adherence. We calculated a subject’s medication possession ratio (MPR) for 4 classes of medication: antidiabetics, antihyperlipidemics, antihypertensives, and other cardiovascular medications. More detailed information on the MPR calculation is found in eAppendix Table 1. An MPR of 80% or higher was considered adherent.4 For each year and each medication class, a subject was categorized as N/A (no prescriptions of that type), once and done (1 prescription of that type), not adherent, or adherent.

4. Preventive visits. A subject was deemed to have had a preventive visit in a given year if they had at least 1 claim in that year coded as a preventive visit (with one of 26 specific Evaluation and Management CPT-4 codes).
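Items 3 and 4 above reduce to simple per-subject, per-year features. The sketch below assumes a common MPR definition, because the study’s exact calculation is given only in eAppendix Table 1. Under that definition, MPR is total days’ supply divided by days in the period, capped at 1, with a calendar year as the period. The claim tuple layout is hypothetical. `PREVENTIVE_CODES` is an illustrative subset of adult preventive medicine codes, not the study’s list of 26.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple

ADHERENCE_THRESHOLD = 0.80  # MPR >= 80% counted as adherent, per the text

# Hypothetical stand-in for the study's 26 codes: adult preventive medicine
# visit codes for new (9938x) and established (9939x) patients.
PREVENTIVE_CODES: Set[str] = {"99385", "99386", "99387", "99395", "99396", "99397"}


def medication_possession_ratio(days_supplied: Sequence[int], days_in_period: int = 365) -> float:
    """Assumed MPR definition: total days' supply / days in period, capped at 1."""
    return min(sum(days_supplied) / days_in_period, 1.0)


def adherence_category(days_supplied: Sequence[int], days_in_period: int = 365) -> str:
    """Category for one subject, one year, and one medication class."""
    if len(days_supplied) == 0:
        return "N/A"              # no prescriptions of that type
    if len(days_supplied) == 1:
        return "once and done"    # a single prescription
    mpr = medication_possession_ratio(days_supplied, days_in_period)
    return "adherent" if mpr >= ADHERENCE_THRESHOLD else "not adherent"


def preventive_visit_years(claims: Iterable[Tuple[str, int, str]]) -> Dict[str, Set[int]]:
    """Map member ID -> years with at least 1 preventive-visit claim."""
    years: Dict[str, Set[int]] = defaultdict(set)
    for member_id, year, cpt_code in claims:
        if cpt_code in PREVENTIVE_CODES:
            years[member_id].add(year)
    return dict(years)


print(adherence_category([]))         # N/A
print(adherence_category([90]))       # once and done
print(adherence_category([30] * 10))  # MPR = 300/365 ~ 0.82 -> adherent
print(adherence_category([30] * 8))   # MPR = 240/365 ~ 0.66 -> not adherent
print(preventive_visit_years([("A", 2010, "99396"), ("A", 2011, "99213"), ("B", 2011, "99386")]))
# {'A': {2010}, 'B': {2011}}  (99213 is a routine office visit, not preventive)
```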

Statistical Methods: Platform Analytic Methods and Simulations

The REFS platform learns by Metropolis Monte Carlo5 sampling from the posterior of the model-structure distribution. Model structure probabilities are computed in a Bayesian framework by marginalizing out the unknown parameter distributions against the observed data and maximum entropy parameter priors.6 These model structure probabilities balance the model’s fit of the data against the model’s complexity.
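The sketch below applies Metropolis sampling to a much smaller problem: choosing which categorical variables, up to `max_parents` of them, predict a categorical outcome. It is not the REFS implementation, whose proposal moves and priors are not described here. It rests on three assumptions. A symmetric Dirichlet prior stands in for the maximum entropy priors. Parameters are integrated out with the standard Dirichlet-multinomial marginal likelihood. The proposal toggles one variable in or out of the model.

```python
import math
import random
from collections import Counter
from typing import FrozenSet, List, Sequence


def log_marginal_likelihood(X: Sequence[Sequence[int]], y: Sequence[int],
                            parents: FrozenSet[int], n_states: int, alpha: float = 1.0) -> float:
    """Dirichlet-multinomial evidence for a categorical outcome given its parent variables.

    Parameters are integrated out analytically, so adding parents that do not
    improve the fit lowers the score (the fit-versus-complexity trade-off).
    """
    cell_counts: Counter = Counter()    # (parent configuration, outcome) -> count
    config_counts: Counter = Counter()  # parent configuration -> count
    ordered = sorted(parents)
    for row, outcome in zip(X, y):
        config = tuple(row[i] for i in ordered)
        cell_counts[config, outcome] += 1
        config_counts[config] += 1
    score = 0.0
    for n_j in config_counts.values():
        score += math.lgamma(n_states * alpha) - math.lgamma(n_states * alpha + n_j)
    for n_jk in cell_counts.values():
        score += math.lgamma(alpha + n_jk) - math.lgamma(alpha)
    return score


def metropolis_structures(X, y, n_states: int, max_parents: int = 3,
                          n_steps: int = 2000, seed: int = 0) -> List[FrozenSet[int]]:
    """Metropolis sampling of parent sets under a uniform prior over structures."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    cache = {}

    def score(parents: FrozenSet[int]) -> float:
        if parents not in cache:
            cache[parents] = log_marginal_likelihood(X, y, parents, n_states)
        return cache[parents]

    current: FrozenSet[int] = frozenset()
    samples = []
    for _ in range(n_steps):
        proposal = current ^ {rng.randrange(n_vars)}  # toggle 1 variable: symmetric proposal
        if len(proposal) <= max_parents:               # larger sets have zero prior mass
            delta = score(proposal) - score(current)
            if delta >= 0 or rng.random() < math.exp(delta):
                current = proposal
        samples.append(current)
    return samples


# Toy data: 6 binary variables; the outcome copies variable 0 with 10% noise.
rng = random.Random(1)
X = [[rng.randrange(2) for _ in range(6)] for _ in range(500)]
y = [row[0] if rng.random() > 0.1 else 1 - row[0] for row in X]
samples = metropolis_structures(X, y, n_states=2)
kept = samples[500:]  # discard burn-in
print(sum(0 in s for s in kept) / len(kept))  # posterior inclusion frequency of variable 0
```

Because the parameters are integrated out, each structure’s score already penalizes extra variables that do not improve the fit. This is the balance between fit and complexity described above. On the toy data, the sampler should keep variable 0 in nearly every sampled model.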

Once learned, the model was interrogated by Forward Simulation (FS) to identify risk factors and to estimate the impact of interventions for individuals and populations. FS is a fast Monte Carlo process, efficient enough to be run interactively. It samples simultaneously from three sources: the learned model structures, the uncertainty in their parameters, and the residual uncertainty in the outcomes. Multivariate categorical models describing each of the 6 discretized metabolic syndrome components were sampled. These models included up to 16 variables chosen from the full set of variables. For each of the 6 metabolic syndrome components, the space of models sampled during reverse engineering (RE) contains every way to choose up to 16 distinct variables from the 4291 possible variables, or approximately 10^44 models. Metropolis Monte Carlo can efficiently sample from such astronomically large hypothesis spaces, guided only by the data, without prior knowledge to direct the search.
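The approximate size of this hypothesis space can be checked directly by summing binomial coefficients. The check assumes, as the text states, that each model uses between 0 and 16 of the 4291 variables.

```python
from math import comb

N_VARIABLES = 4291  # candidate variables in the analysis
MAX_VARS = 16       # maximum variables per model

# Number of distinct variable subsets of size 0..16.
n_models = sum(comb(N_VARIABLES, k) for k in range(MAX_VARS + 1))
print(f"{n_models:.2e}")  # roughly 6e+44, i.e. on the order of 10^44
```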

Two models were learned. The Metabolic Syndrome Status Model was trained on claims-based events from 2010 to predict the CMSS measurements taken at the beginning of 2011. The Metabolic Syndrome Velocity Model used claims-based events from 2011, together with the 2011 CMSS measurements, to predict the 2012 CMSS results.

Simulations Conducted

Because the number of model parameters is much larger than the number of observations, there were many models consistent with the observed data. The ensemble of models learned in the reverse engineering phase is a population sample from the posterior distribution over model structures.
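Forward Simulation, as described above, samples from the model structures, their parameter uncertainty, and the residual outcome uncertainty. The sketch below shows one way to do this for a single person. It assumes each ensemble member is a categorical model stored as outcome counts per configuration of its predictors, with a Dirichlet posterior over its parameters. The data layout and names are hypothetical. Because the ensemble is itself a posterior sample, members are drawn with equal probability.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical ensemble member: (indices of its predictor variables,
# {predictor configuration: observed counts of each outcome range}).
Model = Tuple[Tuple[int, ...], Dict[Tuple[int, ...], List[int]]]


def sample_dirichlet(rng: random.Random, concentrations: Sequence[float]) -> List[float]:
    """Draw one probability vector from a Dirichlet via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def forward_simulate(ensemble: Sequence[Model], person: Sequence[int], n_states: int,
                     n_draws: int = 5000, alpha: float = 1.0, seed: int = 0) -> List[float]:
    """Monte Carlo estimate of the probability of each outcome range for one person."""
    rng = random.Random(seed)
    tallies = [0] * n_states
    for _ in range(n_draws):
        parents, table = rng.choice(ensemble)        # 1. structural uncertainty
        config = tuple(person[i] for i in parents)
        counts = table.get(config, [0] * n_states)   # unseen configuration: prior only
        probs = sample_dirichlet(rng, [alpha + c for c in counts])  # 2. parameter uncertainty
        outcome = rng.choices(range(n_states), weights=probs)[0]    # 3. residual outcome uncertainty
        tallies[outcome] += 1
    return [t / n_draws for t in tallies]


ensemble = [
    ((0,), {(0,): [90, 10], (1,): [20, 80]}),       # model using variable 0
    ((0, 2), {(1, 1): [5, 45], (1, 0): [15, 35]}),  # model using variables 0 and 2
]
print(forward_simulate(ensemble, person=[1, 0, 1], n_states=2))
```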

Individual risk simulations. For all study subjects, forward simulations were computed for each of the 5 primary metabolic syndrome factors, with blood pressure separated into systolic and diastolic components. The simulations predicted the likely values of the factors at the next biometric screening. The output for each factor was the probability of each range in that factor’s discretization. These probabilities were summed on either side of the factor’s out-of-range boundary to give the out-of-range probability for each factor. The individual out-of-range probabilities were then aggregated to compute the probability of metabolic syndrome.
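The text does not say how the factor-level probabilities were combined. The sketch below shows one possible aggregation under three assumptions. Factors are treated as independent. Blood pressure counts as out of range if either the systolic or the diastolic component is. Metabolic syndrome means 3 or more of the 5 factors are out of range. A simulation that draws all components jointly could instead count metabolic syndrome directly in each draw, without the independence assumption. Component names are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def out_of_range_probability(range_probs: Sequence[float], out_of_range: Iterable[int]) -> float:
    """Sum the predicted probabilities of the ranges beyond the factor's boundary."""
    return sum(range_probs[i] for i in out_of_range)


def prob_at_least(k: int, probs: Sequence[float]) -> float:
    """P(at least k of several independent events), by a Poisson-binomial recursion."""
    dist = [1.0]  # dist[j] = P(exactly j events among those processed so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1 - p)   # event does not occur
            nxt[j + 1] += q * p     # event occurs
        dist = nxt
    return sum(dist[k:])


def metabolic_syndrome_probability(p: Dict[str, float]) -> float:
    """Combine per-component out-of-range probabilities into P(>= 3 of 5 factors)."""
    # Assumption: blood pressure is out of range if either component is, and
    # systolic and diastolic are treated as independent.
    p_bp = 1 - (1 - p["systolic"]) * (1 - p["diastolic"])
    factors = [p["waist"], p_bp, p["triglycerides"], p["hdl"], p["glucose"]]
    return prob_at_least(3, factors)


# Example: a 6-range glucose prediction whose out-of-range side is ranges 3-5.
p_glucose = out_of_range_probability([0.1, 0.2, 0.3, 0.2, 0.1, 0.1], out_of_range=range(3, 6))
print(round(p_glucose, 2))  # 0.4
print(metabolic_syndrome_probability({"waist": 0.7, "systolic": 0.3, "diastolic": 0.2,
                                      "triglycerides": 0.4, "hdl": 0.3, "glucose": p_glucose}))
```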

Metabolic syndrome factor incremental perturbation simulations. To understand the impact of incremental changes in metabolic syndrome components on overall probability of metabolic syndrome for each individual, we simulated an incremental change in each metabolic syndrome component (a single range shift upward or downward for the component). From the 12 outputs, we recorded which incremental perturbation led to the largest increase and decrease in probability of metabolic syndrome for that patient, along with the magnitudes of the change in probability of metabolic syndrome.
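The perturbation scan can be sketched as a loop over components and directions. Here `toy_predict` is a hypothetical stand-in for the learned model, with made-up weights. Range indices are assumed to be ordered so that a higher index is less healthy, with the ordering reversed for HDL. Shifts that would fall off either end of a component’s scale are skipped. With 6 components and 2 directions, the scan yields up to 12 outputs.

```python
import math
from typing import Callable, Dict, Tuple

Profile = Dict[str, int]  # component -> current range index (higher = less healthy, assumed)


def perturbation_scan(profile: Profile, n_ranges: Dict[str, int],
                      predict: Callable[[Profile], float]) -> Tuple[tuple, tuple]:
    """Shift each component 1 range up and down; return the largest increase and decrease."""
    base = predict(profile)
    effects = []  # ((component, step), change in predicted probability)
    for name, idx in profile.items():
        for step in (-1, 1):
            if 0 <= idx + step < n_ranges[name]:  # skip shifts off either end of the scale
                shifted = {**profile, name: idx + step}
                effects.append(((name, step), predict(shifted) - base))
    return max(effects, key=lambda e: e[1]), min(effects, key=lambda e: e[1])


# Hypothetical stand-in for the learned model: made-up logistic weights, for illustration only.
WEIGHTS = {"waist": 0.6, "systolic": 0.3, "diastolic": 0.2,
           "triglycerides": 0.3, "hdl": 0.25, "glucose": 0.5}


def toy_predict(profile: Profile) -> float:
    z = -4.0 + sum(WEIGHTS[name] * idx for name, idx in profile.items())
    return 1 / (1 + math.exp(-z))


profile = {name: 2 for name in WEIGHTS}
n_ranges = {name: 7 if name in ("systolic", "diastolic") else 6 for name in WEIGHTS}
increase, decrease = perturbation_scan(profile, n_ranges, toy_predict)
print(increase[0], decrease[0])  # ('waist', 1) ('waist', -1) under these toy weights
```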

Medication adherence and preventive visits simulations. The impact of medication adherence and preventive visits was assessed by counterfactual simulations of patients who were nonadherent in 1 or more of the drug-specific adherence metrics and patients who were noncompliant with preventive visits. For each patient the nonadherent metrics were switched to adherent and the patient-specific change in probability of metabolic syndrome was recorded. A similar simulation was applied to preventive visits.
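A counterfactual simulation of this kind can be sketched in two steps: switch the relevant features, then take the difference between the model’s two predictions. The feature names, the switch tables, and `toy_predict` (with made-up effect sizes) are hypothetical. Only the switch-and-compare logic follows the description above.

```python
import math
from typing import Callable, Dict, Mapping, Optional, Tuple

Patient = Mapping[str, object]
Switches = Dict[str, Tuple[object, object]]  # feature -> (observed value, counterfactual value)

ADHERENCE_SWITCHES: Switches = {
    m: ("not adherent", "adherent")
    for m in ("antidiabetic", "antihyperlipidemic", "antihypertensive", "other_cardiovascular")
}
PREVENTIVE_SWITCH: Switches = {"preventive_visit": (False, True)}


def counterfactual_change(patient: Patient, switches: Switches,
                          predict: Callable[[Patient], float]) -> Optional[float]:
    """Change in predicted P(metabolic syndrome) after switching the matching features.

    Returns None for patients with nothing to switch (they are not simulated).
    """
    changes = {f: new for f, (old, new) in switches.items() if patient.get(f) == old}
    if not changes:
        return None
    return predict({**patient, **changes}) - predict(patient)


def toy_predict(patient: Patient) -> float:
    # Hypothetical stand-in for the learned model; the effect sizes are made up.
    z = -1.0
    z += 0.4 * sum(patient.get(m) == "not adherent" for m in ADHERENCE_SWITCHES)
    z += 0.3 * (patient.get("preventive_visit") is False)
    return 1 / (1 + math.exp(-z))


patient = {"antihypertensive": "not adherent", "antidiabetic": "adherent", "preventive_visit": False}
print(counterfactual_change(patient, ADHERENCE_SWITCHES, toy_predict))  # negative: risk falls
print(counterfactual_change(patient, PREVENTIVE_SWITCH, toy_predict))   # negative: risk falls
```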

