Using Sequence Discovery to Target Outreach for Diabetes Medication Adherence

April Lopez, MS; Charron Long, PharmD; Laura E. Happe, PharmD, MPH; and Michael Relish, MS

Nonadherence to prescription medication accounts for an estimated $68 to $146 billion in avoidable medical costs annually and is associated with adverse clinical outcomes and mortality.1-7 Given the magnitude of the problem, nonadherence is perhaps one of the most widely studied topics in medication. A simple search of the US National Library of Medicine (PubMed) for “medication adherence” or “medication compliance” yields more than 20,000 articles.
Medication adherence can be measured in a variety of ways, including pill counts, patient reports, and pharmacy claims data. While researchers continue to search for a “gold standard” measurement of adherence, a multitude of predictors of medication nonadherence have been documented, including medication regimen complexity, multiple comorbidities, prescription cost, forgetfulness, depression, lack of patient understanding and engagement, and poor relationship with providers.8,9
Despite research efforts and numerous interventions, patients continue to struggle with adhering to prescribed medication regimens. In 2003, the World Health Organization reported medication adherence of only 50% in developed countries, with more recent data supporting this disappointing trend.5,8,10,11 Because nonadherence persists even after traditional epidemiological studies have identified multiple associated factors, it seems likely that other, as yet unknown, factors are related to medication nonadherence.
Data mining techniques have been used for decades in other industries to uncover correlations and understand patterns in large relational data sets.12-15 In recent years, these techniques have been applied to healthcare and biomedical data to identify unknown relationships between variables, generate new hypotheses, and support decision making.16-21 Unlike traditional, hypothesis-driven approaches, data mining identifies correlations without consideration of prior knowledge and explores the effects of multiple combinations of exposures on outcomes.
Accordingly, we sought to investigate exposures and exposure sequences that are correlated with nonadherence, defined as a gap in prescription claims with diabetes medications—a therapeutic area with documented suboptimal adherence.22-27 Using the diverse data sets available at Humana Inc, a national healthcare company, we used association rule data mining and sequence discovery techniques to identify exposures from administrative, customer service, and consumer data. By understanding common exposure sequences indicative of nonadherence with diabetes medications, Humana and other healthcare companies can develop appropriately timed, patient-specific outreach aimed at improving adherence and patient outcomes.
Data Sources
The administrative claims, enrollment, customer service, and consumer data used for this study were collected from Humana Inc, which insured over 2.3 million Medicare Advantage members and 1.6 million commercial members at the time of this analysis.28 Administrative claims data contained adjudication information for prescription medications, including drug name, dosage, quantity, days’ supply, and date of fill; International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes for all inpatient and outpatient encounters; demographics; and coverage start and end dates. Enrollment data included address, plan enrollment, and premium information. Customer service data included detailed communications to and from health plan members, including inbound and outbound calls, e-mails, faxes, in-person contacts, and Web-based interactions. Consumer data included third party–compiled census data, buyer behavior, demographic information, proprietary models, and segmentation data. Data were analyzed from individuals with medical coverage and at least 1 prescription for a diabetes medication (ie, biguanides, dipeptidyl peptidase-4 inhibitors, glucagon-like peptide-1 agonists, meglitinides, sulfonylureas, and thiazolidinediones, alone or in combination) with continuous refill history between August 1, 2011, and February 1, 2012.
The outcome of interest was a gap in diabetes medication therapy, defined as a prescription refill obtained 6 or more days after exhaustion of the days’ supply of the previous refill. Gaps were identified between February 1, 2012, and March 31, 2013. Patients were divided into 2 groups: those with and those without a gap in prescription refill history of diabetes medications. In each group, exposures were identified during the 90 days before the index date, which was defined as the actual next refill date in the no-gap group or the expected refill date in the gap group (eAppendix). For the 90-day pre-index period, association rule mining and sequence discovery techniques were used to identify exposure sequences associated with a refill gap.
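The gap definition above can be illustrated with a minimal Python sketch. It is not the authors' SAS implementation. It assumes that the supply from a fill is exhausted on the fill date plus the days' supply, and that leftover medication from early refills is not carried forward; the function name and the sample fills are hypothetical.

```python
from datetime import date, timedelta

GAP_THRESHOLD_DAYS = 6  # from the text: refill obtained 6 or more days after supply is exhausted


def find_refill_gaps(fills, threshold=GAP_THRESHOLD_DAYS):
    """Return (expected_refill_date, actual_refill_date, days_late) for each gap.

    `fills` is a list of (fill_date, days_supply) tuples for one patient.
    Assumption: supply runs out on fill_date + days_supply.
    """
    ordered = sorted(fills)
    gaps = []
    # Pair each fill with the fill that follows it.
    for (fill_date, days_supply), (next_fill, _) in zip(ordered, ordered[1:]):
        expected = fill_date + timedelta(days=days_supply)
        days_late = (next_fill - expected).days
        if days_late >= threshold:
            gaps.append((expected, next_fill, days_late))
    return gaps


# Hypothetical fill history: the second refill is on time, the third is 9 days late.
fills = [
    (date(2012, 2, 1), 30),
    (date(2012, 3, 2), 30),
    (date(2012, 4, 10), 30),
]
gaps = find_refill_gaps(fills)
```

In this sketch the expected refill date of a gap plays the role of the index date for the gap group, which is how the text defines it.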
A sub-analysis within the gap group evaluated exposures during both a gap and no-gap period in the same patient. Exposure evaluation began on the actual (no-gap period) or expected (gap period) refill date. Refills with a 30-day supply had a 29-day look back, while refills with a 90-day supply had an 89-day look back to eliminate duplication of data between the gap and no-gap periods.
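As a small illustration of the sub-analysis windows, the sketch below maps a refill's days' supply to its look-back length. Only the two cases given in the text (30-day and 90-day supplies) are encoded. Treating the index date as the last day of the window is an assumption, and the function name is hypothetical.

```python
from datetime import date, timedelta

# Look-back lengths stated in the text; other supply lengths are not specified there.
LOOKBACK_DAYS = {30: 29, 90: 89}


def lookback_window(index_date, days_supply):
    """Return (window_start, window_end) for exposure evaluation.

    Assumption: the window ends on the index date (actual or expected refill date).
    """
    if days_supply not in LOOKBACK_DAYS:
        raise ValueError(f"no look-back length defined for {days_supply}-day supply")
    start = index_date - timedelta(days=LOOKBACK_DAYS[days_supply])
    return start, index_date


window_30 = lookback_window(date(2012, 6, 30), 30)
window_90 = lookback_window(date(2012, 6, 30), 90)
```

Because each window is one day shorter than the supply it follows, consecutive gap and no-gap periods for the same patient do not share a day, which is the duplication the text says the design avoids.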
Exposure variables were identified based on availability in the databases, as well as the ability of Humana to provide outreach given the presence of a certain variable. Exposures related to an individual’s medical care included annual physical exam, hospitalization, emergency department (ED) visit, specialty physician visit, newly diagnosed condition, bariatric surgery, enrollment in a bariatric surgery program, and enrollment in a smoking cessation program. Prescription medication–related exposures included new medications, reversal of a claim for a prescription drug, adverse drug events, comprehensive medication reviews, prescription for smoking cessation product, change in mail order versus retail delivery channel, and mail order educational contacts. Personal exposures included address and religious affiliation changes, death of a family member in the same household, and natural disasters. Several communication-related exposures were also assessed, including inbound and outbound faxes, calls, e-mails, Web communication, and walk-in contacts. Insurance-related exposures included out-of-network claims, disruptions to plan coverage, entrance into the Part D coverage gap, and entrance into catastrophic prescription coverage.
Association Rule Mining and Sequence Discovery
Association rule mining is an efficient way to identify associations between variables in large data sets and determine the likelihood of variables occurring together. These rules count the number of times items occur in the data set, either alone or in combination, and differ from regression modeling, which assesses the independent strength of 2 variables, holding all others constant. An association rule is depicted as E → O, where E represents exposure variables and O is the outcome variable. Association rule mining evaluates each association according to a minimum support, confidence, and lift level. "Support" for the rule represents the probability that both variables occur together, while "confidence" represents the conditional probability of the outcome occurring, given the exposure. "Lift" is defined as measured confidence divided by expected confidence level.

Support (E → O) = P(E ∩ O)
Confidence (E → O) = P(O | E) = P(E ∩ O) / P(E)
Lift (E → O) = Confidence (E → O) / Expected Confidence (E → O)

Credible associations have a high support and confidence (both expressed as a percentage), with a lift level greater than 1. Sequence discovery utilizes association rule mining results and accounts for the timing of the relationship among items; for example, rule A → B implies that event B occurred after event A occurred.29
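The three measures can be computed directly from counts, as in the following Python sketch. It takes expected confidence to be the overall proportion of records containing the outcome, P(O), which is the usual convention but is not stated explicitly in the text. The function name and the toy records are hypothetical.

```python
def rule_metrics(records, exposure, outcome):
    """Return (support, confidence, lift) for the rule exposure -> outcome.

    `records` is a list of sets, one per patient, holding the items observed.
    Assumption: expected confidence is P(outcome) across all records.
    """
    n = len(records)
    n_e = sum(1 for r in records if exposure in r)
    n_o = sum(1 for r in records if outcome in r)
    n_eo = sum(1 for r in records if exposure in r and outcome in r)

    support = n_eo / n                       # P(E and O)
    confidence = n_eo / n_e if n_e else 0.0  # P(O | E)
    expected_confidence = n_o / n            # P(O)
    lift = confidence / expected_confidence if expected_confidence else 0.0
    return support, confidence, lift


# Toy data: 8 patients, 3 hospitalized, 4 with a gap, 2 with both.
records = [
    {"hospitalization", "gap"},
    {"hospitalization", "gap"},
    {"hospitalization"},
    {"gap"},
    {"annual_physical"},
    {"annual_physical"},
    {"new_prescription", "gap"},
    {"new_prescription"},
]
support, confidence, lift = rule_metrics(records, "hospitalization", "gap")
```

For the toy data, support is 2/8, confidence is 2/3, and lift is (2/3) / (4/8) = 4/3, so the rule would count as credible under the lift criterion.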
We used sequence discovery to identify exposure sequences in the gap and no-gap groups and those common to both groups. The sequence discovery analysis built upon association rule mining analyses (using a minimum support of 5%) and was set to a minimum support threshold of 2%. The maximum length of sequences was limited to 7, and only patients with more than 1 exposure contributed to sequence discovery analysis. Sequences with the highest support and confidence for each sequence length were determined for each of the 3 groups (gap, no-gap, both). In addition to exposure sequences, singular exposures with the highest frequencies in the gap group and no-gap group were reported. Exposure sequences and singular exposure frequencies were also described for the gap group sub-analysis.
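To make the sequence discovery step concrete, here is a brute-force Python sketch that counts ordered exposure sequences across patients and keeps those meeting a minimum support. It is not the SAS procedure used in the study. The function name, the event labels, and the choice to divide by all patients supplied (rather than only those with more than 1 exposure) are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_sequences(patient_events, min_support=0.02, max_len=7):
    """Return {sequence: support} for ordered sequences meeting min_support.

    `patient_events` maps a patient id to that patient's exposures in time order.
    Brute-force enumeration: adequate for a toy example, not for large data sets.
    """
    counts = Counter()
    for events in patient_events.values():
        if len(events) < 2:  # only patients with more than 1 exposure contribute
            continue
        seen = set()
        for k in range(2, min(max_len, len(events)) + 1):
            # combinations() keeps the original (time) order of the events
            seen.update(combinations(events, k))
        counts.update(seen)  # each patient counts at most once per sequence
    n = len(patient_events)  # assumption: denominator is all patients supplied
    return {seq: c / n for seq, c in counts.items() if c / n >= min_support}


# Hypothetical patients; p4 has a single exposure and does not contribute sequences.
patients = {
    "p1": ["new_rx", "out_of_network", "specialist_visit"],
    "p2": ["new_rx", "specialist_visit"],
    "p3": ["specialist_visit", "new_rx"],
    "p4": ["hospitalization"],
}
result = frequent_sequences(patients, min_support=0.5, max_len=3)
```

Note that order matters: p3 has the same two exposures as p2 in reverse, so it supports a different sequence and does not add to the count for new_rx followed by specialist_visit.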
Using exposure sequences generated by sequence discovery analysis, we sought to identify exposures and exposure sequences that would be most practical for a healthcare company to target outreach aimed at improving adherence and patient outcomes. First, we identified the most frequently occurring initial or final exposures in gap group sequences. Initial exposures in sequences are important since they can be used as early events to trigger an intervention; final events provide a potential last opportunity for intervention. From those, we identified exposure sequences offering the greatest opportunity for intervention, using the following criteria: 1) sufficient sequence length allowing time for intervention; 2) higher support, combined with a high confidence level, compared with other sequences within the same sequence length group; 3) a variety of exposures in a sequence, supporting multiple intervention points; and 4) combinations of the same exposures, albeit in different order, that were more prevalent in the gap group.
These analyses were conducted as a part of Humana’s ongoing administrative activities aimed at improving medication adherence—not to generate scientific knowledge. Such quality improvement activities, which do not meet the regulatory definition of research under 45 CFR 46.102(d), do not require review by an institutional review board.30 Humana’s privacy and ethics board did review and approve this work, however. All analyses were run with SAS Enterprise Guide version 5.1 (SAS Institute, Cary, North Carolina).29
Overall, 124,741 individuals were evaluated and 5448 were excluded because they did not have one of the defined exposures of interest. Among the 119,293 patients included in the analysis, 89,820 (75%) had a gap in diabetes medication therapy and 29,473 (25%) did not (Figure 1). The population was 50% female, with a mean age of 70 years and average Charlson comorbidity index score of 3.2 (Table 1). Nearly half of the population resided in the south (47%) and the majority of index prescriptions were for biguanides (53%) and sulfonylureas (37%). The majority of prescriptions were filled for a 90-day supply, with the remainder filled for 30 days.
Sequence Discovery
Overall, 602 exposure sequences were identified in the gap group and 271 in the no-gap group, with 1069 shared between both groups (Figure 1). As shown in Table 2, exposure sequences with the highest support and confidence for the gap group included inpatient hospital stays in 3 of 5 exposure sequences; sequences of 4, 5, or 6 exposure lengths all had multiple hospitalizations or hospitalizations with multiple inpatient days. Hospitalizations were not present in sequences for the no-gap group (data not shown). 
Outbound voice-activated technology (VAT) calls and natural disasters were more common singular exposures in the 90-day lookback period studied for the gap group compared with the same lookback period for the no-gap group, with a between-group difference of 18.5% and 6.9%, respectively (data not shown). Exposures more common in the no-gap group included annual physical exams, outbound calls from the mail order pharmacy, and reversed prescription claims, with frequencies 2.1%, 2.3%, and 3.0% higher, respectively, than the gap group (data not shown).
Exposures and Sequences for Targeted Outreach
The most frequent initial and final exposures in sequences for the gap group were: 1) specialty care physician visit, 2) new prescription, 3) out-of-network service claim, 4) hospitalization, 5) outbound VAT call, and 6) prescription claim reversal (Figure 2). Based on the criteria for targeting outreach, 3 sequences for possible interventions were identified (Table 3). Outreach opportunities identified from these sequences included individuals taking diabetes medications who are prescribed a new medication—especially those who have multiple out-of-network claims and/or visit a specialty physician after the new medication is prescribed. Those taking diabetes medications who have a prescription claim reversed should receive an outreach—especially if they subsequently are prescribed a new medication or visit a specialty physician. Finally, individuals taking diabetes medications who have multiple out-of-network claims should receive an outreach—specifically those who also have a hospitalization.
Gap Group Sub-Analysis: Evaluation of the Gap and No-Gap Period in the Same Patient
Comparing the gap versus no-gap periods in the same patients, out-of-network claims were noted in nearly all top sequences for the gap period. Out-of-network claims, natural disaster in the area, and inpatient hospital stays were more frequent singular exposures in the gap period. Exposures identified in the gap period of the sub-analysis (ie, out-of-network claims and inpatient hospital stays) were also among the top exposures identified in the primary gap period analysis. Outbound VAT calls, change in prescription delivery channel, inbound mail order pharmacy calls, and being prescribed a new medication were more frequent during the no-gap period (data not shown).
In contrast to traditional epidemiologic methods, which quantify associations between outcomes and postulated exposures, this study used data mining techniques to explore associations between diabetes medication adherence and a large set of exposures without regard to whether the exposures had a hypothesized relationship with the outcome. Since no previous studies have applied this approach to medication adherence, this work provides unique insights into several exposures that have not previously been investigated. To illustrate this point, the top exposures identified in this study were contrasted with 18 exposures reported (regardless of whether an association was found) in a 2014 systematic literature review of 27 studies evaluating factors associated with adherence to diabetes medications, and there was no overlap.27 The literature review reported great variability in factors predictive of nonadherence, highlighting the importance of a new approach to evaluating this topic.
Importantly, this study went beyond simply identifying associations to investigating sequences of exposures, upon which a health plan can intervene to potentially prevent gaps in prescription refills before they occur. Since health plans have unique access to more data about a given patient and their medical providers than any other part of the system, the insights provided by this study can assist plans in intervening to positively affect diabetes medication adherence. Health plans can use automated systems to create electronic alerts when an event or series of events—that can only be seen at the health plan level—occur, as defined in Table 3. Those alerts can trigger a variety of actions; for example, they may alert care managers to incorporate actions into a patient’s care management plan to address the potential for nonadherence or prompt a medication therapy management pharmacist to contact the patient for a medication consultation. A variety of automated interventions could also be generated. Humana is utilizing the information from this study to inform intervention strategies, and future research should quantify the effectiveness of acting on the events identified in this work.
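One way such an alert could work is to check each patient's time-ordered event stream for a target exposure sequence. The Python sketch below is illustrative only: the pattern names and event labels are hypothetical, loosely modeled on the outreach opportunities described earlier, and are not the sequences listed in Table 3.

```python
def contains_sequence(events, pattern):
    """True if `pattern` occurs in `events` in order (other events may intervene)."""
    remaining = iter(events)
    # `step in remaining` advances the iterator, so later steps must come later in time.
    return all(step in remaining for step in pattern)


# Hypothetical alert definitions, not taken from Table 3.
ALERT_PATTERNS = {
    "new_rx_then_specialist": ("new_prescription", "specialty_visit"),
    "reversal_then_new_rx": ("claim_reversal", "new_prescription"),
}


def triggered_alerts(events, patterns=ALERT_PATTERNS):
    """Return the names of all alert patterns present in a patient's event stream."""
    return [name for name, pattern in patterns.items()
            if contains_sequence(events, pattern)]


events = ["claim_reversal", "out_of_network_claim", "new_prescription", "specialty_visit"]
alerts = triggered_alerts(events)
```

In this example both patterns fire. A production system would also need a time window for each pattern and a rule for suppressing repeat alerts, neither of which is specified in the text.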
It is important to emphasize both the clinical and economic relevance of efforts to improve adherence to diabetes medications. It is well documented that poor medication adherence in diabetes is associated with increased hospitalizations and ED visits,31-34 which are often manifestations of poor glycemic control. Illustrating the relationship between glycemic control and medication adherence, a 1-year study by Kaiser Permanente of 1560 patients with type 2 diabetes reported that glycated hemoglobin was reduced by 0.34% for every 25% increase in medication adherence (P = .0009).32  This same Kaiser study found higher all-cause mortality in nonadherent patients compared with their adherent counterparts. Worsening health outcomes are almost ubiquitously accompanied by increasing costs, as is the case in nonadherence. Reports have suggested that eliminating poor adherence to insulin and oral medicines would generate over $13,000 in savings, on average, to each newly diagnosed patient, or $10.7 billion in aggregate.35 Finally, CMS recognizes the importance of adherence to oral diabetes medications, as it is among the patient safety outcomes measures for the CMS Plan Quality and Performance Program, or Stars Rating program.36 A 2014 study by Medicare Advantage Part D pharmacy benefit manager, MedImpact, reported a positive impact of a coordinated, member-directed medication adherence intervention program on adherence and star rating adherence measures.37
Although this study provides novel and actionable insights for health plans to potentially improve adherence to diabetes medications, there are limitations to the work. This study was conducted within a single health plan population, which has members in all 50 states but is highly concentrated in southern regions. The study relied upon data inputs available within this health plan; therefore, the generalizability to other populations or health plans without the same data elements available may be limited. This study also exclusively evaluated oral antidiabetic medications; future work should evaluate other therapeutic classes to determine if these findings can be applied to a broader array of chronic conditions. The data mining technique applied in this study is subject to the risk of finding spurious associations, but this risk was limited by the use of statistically sound association techniques. This technique is exploratory in nature; the criteria for selecting exposure sequences for subsequent intervention were subjective and may not be practical in other settings. As with all studies that rely on retrospective review of electronic data captured for other purposes, there may have been unmeasured exposures, coding errors, or missing data; however, given the size of the data set, the impact of the latter would be minimal.
Medication adherence is a persisting challenge that has substantial clinical and economic consequences; yet, traditional epidemiologic methods and interventions have had limited ability to influence adherence at a population level. This novel application of sequence discovery techniques identified unique sequences of events with opportunities for health plan outreach. The health plan’s unique access to the breadth of data, coupled with the novel sequences of events identified as precursors to gaps in therapy in this study, present a promising new approach to preventing nonadherence.
The authors would like to thank Victor Lawnicki, Shane Rathbun, and Peinie Young for their contributions to this research. 