
The Use of Claims Data Algorithms to Recruit Eligible Participants Into Clinical Trials

The American Journal of Managed Care, February 2015
Volume 21
Issue 2

Using an ICD-9-CM code algorithm, the authors effectively identified potentially difficult-to-reach populations for a hypertension clinical trial.


Recruitment strategies usually focused on a single International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code and rarely included exclusion criteria. The purpose of this study was to validate a claims-based algorithm to identify, from Veterans Affairs administrative data, eligible participants to be recruited into a hypertension trial.

Study Design



Subjects were labeled as eligible if they were 75 years or older, had a hypertension ICD-9-CM code (401.x-405.x, 437.2) and did not have a diabetes (250.xx) or stroke (430.x-436.x, 437.1, 437.9, 438.x) ICD-9-CM code. We compared the eligible subjects with the medical record—which was considered the gold standard—and we calculated the positive predictive value (PPV) of identifying a subject in the medical record.


The algorithm identified 3591 elderly veterans with hypertension with no diabetes or stroke, and we reviewed the medical records of 76 randomly selected patients. In the sample of medical record review, the mean age in years was 83 ± 5.3, 48% had coronary artery disease, and the mean systolic blood pressure was 134 mm Hg ± 15.5. When compared with the medical record, the PPV for any hypertension code was 93% (95% CI, 85%-98%), and for the entire algorithm, including 75 years or older and the absence of both diabetes and stroke, the PPV was 83% (95% CI, 73%-91%).


Am J Manag Care. 2015;21(2):e114-e118

The use of any ICD-9-CM code for hypertension is useful to identify elderly patients with hypertension. The algorithm to identify elderly patients with hypertension and without diabetes or stroke is a useful tool to also identify eligible patients for clinical trial participation.

  • The inclusion/exclusion criteria of a clinical trial can be translated into an International Classification of Diseases, Ninth Revision, Clinical Modification algorithm.
  • A single hypertension code identifies hypertensive patients in 93% of cases.
  • An administrative algorithm that includes age, hypertension, and lack of diabetes or stroke is valid when compared with the medical record.

Hypertension is rising in prevalence and contributes strongly to cardiovascular morbidity and mortality—consequences that can be reduced with appropriate treatment.1 Hypertension clinical trials test new therapies or new strategies for the treatment of hypertension; however, recruitment—especially the recruitment of minorities and older subjects—is always a costly challenge for investigators. The PREMIER: Lifestyle Interventions for Blood Pressure Control study found that a successful source of recruitment into hypertension trials has been mass mailings.2 Mailings usually use patients who are identified using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes as a source list.

Large observational databases capture data using ICD-9-CM codes and are used for quality assessment, utilization management, reimbursement processes, health services, and outcomes research. However, concerns have been raised about the accuracy of the ICD-9-CM codes.3,4 Previous studies have noted that these codes have varying rates of chronic disease identification in hospital settings, ranging from a sensitivity of 35% for stroke, to 82% for the diagnosis of both type 2 diabetes mellitus and hypertension.5,6

Studies reporting on the validity of hypertension codes are limited because they use a single code and do not take into account the inclusion and exclusion criteria typically used in clinical trials. Therefore, we evaluated the validity of a set of codes that takes into account the inclusion and exclusion criteria to recruit a diverse group of elderly hypertensive patients without diabetes or stroke.


Study Design and Study Population

The cross-sectional study was conducted within a multicenter hypertension trial. The trial is a 2-arm, multicenter, randomized clinical trial designed to test whether a treatment program aimed at reducing systolic blood pressure (SBP) to a lower goal than currently recommended will reduce cardiovascular disease (CVD) risk. About 9250 participants with SBP >130 mm Hg and at least 1 additional CVD risk factor will be recruited at approximately 90 clinics over a 2-year period and will be followed for 4 to 6 years. Approximately 4300 participants are expected to have chronic kidney disease, and 3250 will be 75 years or older. The primary outcome of the study is the first occurrence of a myocardial infarction, acute coronary syndrome, stroke, heart failure, or CVD death.

The Miami Veterans Affairs Medical Center participated as a research site. The specific population we were targeting was potential hypertensive research participants who were 75 years or older without diabetes or stroke. Using ICD-9-CM codes, we identified 3591 subjects who had been seen in a primary care clinic between January 1, 2011, and January 1, 2012.

Definition of the Algorithm

We used the hypertension ICD-9-CM codes 401.x, 402.x, 403.x, 404.x, 405.x, and 437.2 for inclusion as a potentially eligible subject, and these ICD-9-CM codes for exclusion: diabetes (250.x) and stroke (430.x, 431.x, 433.x, 434.x, 435.x, 436.x, 437.1, 437.9, 438.x). We considered ICD-9-CM codes in any diagnostic position up to 9 positions. The algorithm was based on previously published validation studies of CVD risk factors.5,6
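The rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function name `is_eligible`, the representation of a patient as an age plus a list of encounters (each a list of codes in diagnostic-position order), and the choice to apply the 9-position limit per encounter are all assumptions made for the example.

```python
# Hypothetical sketch of the eligibility rule; names and data layout are assumptions.
HTN_CATEGORIES = {"401", "402", "403", "404", "405"}   # 401.x-405.x
HTN_EXACT = {"437.2"}
DIABETES_CATEGORIES = {"250"}                            # 250.x
STROKE_CATEGORIES = {"430", "431", "433", "434", "435", "436", "438"}
STROKE_EXACT = {"437.1", "437.9"}
MAX_POSITIONS = 9   # codes are considered in any of the first 9 diagnostic positions
MIN_AGE = 75


def _matches(code, categories, exact):
    """True if the code's 3-digit category or the full code is in the given sets."""
    code = code.strip()
    return code.split(".")[0] in categories or code in exact


def is_eligible(age, encounters):
    """Apply the inclusion and exclusion codes to one patient.

    encounters: list of encounters, each a list of ICD-9-CM code strings
    ordered by diagnostic position (assumed layout).
    """
    codes = [c for enc in encounters for c in enc[:MAX_POSITIONS]]
    has_htn = any(_matches(c, HTN_CATEGORIES, HTN_EXACT) for c in codes)
    has_dm = any(_matches(c, DIABETES_CATEGORIES, set()) for c in codes)
    has_stroke = any(_matches(c, STROKE_CATEGORIES, STROKE_EXACT) for c in codes)
    return age >= MIN_AGE and has_htn and not has_dm and not has_stroke


print(is_eligible(80, [["401.9", "414.01"]]))    # hypertension only
print(is_eligible(80, [["401.9"], ["250.00"]]))  # hypertension plus diabetes
```

Matching on the 3-digit category keeps the ".x" wildcard simple, while the three 437.* codes are matched exactly because 437.2 is an inclusion code and 437.1 and 437.9 are exclusion codes.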

Gold Standard

Our gold standard was chart abstraction of 76 randomly selected charts. The sample size of 76 resulted from randomly selecting 1 chart out of every 47 of the 3591 records identified by the algorithm. Two investigators (LT, JD) conducted the data abstraction using a data abstraction tool and calculated inter-rater variability. We defined hypertension as present if hypertension was listed as a problem in the assessment and plan of at least 1 primary care clinical note after the date the ICD-9-CM code was listed, and if the subject was taking antihypertensive medication. We did not consider a subject as hypertensive if they were only using blood pressure medications, because their use could have been explained by heart failure or atrial fibrillation. We defined diabetes using a modified version of the American Diabetes Association criteria, which includes documentation of diabetes in the assessment and plan of at least 1 clinical note, or the use of insulin or oral antidiabetics, or laboratory documentation with fasting plasma glucose ≥126 mg/dL or glycated hemoglobin ≥6.5%.7 We defined stroke as the documentation of stroke in the assessment of a clinical note, or a CT scan showing old cerebral infarction. The inter-rater reliability analysis was performed for the gold standard variables using all records. The median inter-rater reliability for chart abstractions on these elements was 97.9% (interquartile range, 93.8%-99.1%).
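One way to draw 1 chart from every 47 records is sketched below. The article does not describe the exact selection mechanism, so this is an assumption: the list is split into consecutive blocks of 47 and one record is picked at random from each complete block. The function name `sample_one_per_block` and the integer record identifiers are hypothetical.

```python
import random


def sample_one_per_block(record_ids, block_size=47, seed=None):
    """Pick one record at random from each complete block of `block_size` records."""
    rng = random.Random(seed)
    n_blocks = len(record_ids) // block_size  # an incomplete final block is not sampled
    return [
        record_ids[b * block_size + rng.randrange(block_size)]
        for b in range(n_blocks)
    ]


# 3591 algorithm-identified records, represented here by placeholder integers.
ids = list(range(1, 3592))
charts = sample_one_per_block(ids, seed=0)
print(len(charts))  # 3591 // 47 complete blocks
```

With 3591 records and blocks of 47 there are 76 complete blocks, which is consistent with the 76 charts reviewed; in this sketch the 19 records left over after the last complete block are never sampled.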

Statistical Analysis

We compared the ICD-9-CM codes with the abstracted data. We first calculated the positive predictive value (PPV) of the ICD-9-CM codes for hypertension compared with the hypertension gold standard. We then calculated the PPV of the entire algorithm compared with the chart abstraction. We used STATA 12 (Stata Corp LP, College Station, Texas) to calculate the PPV and the corresponding 95% CI.
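The PPV is the number of algorithm-identified subjects confirmed by chart review divided by the number of algorithm-identified subjects reviewed. The sketch below computes it with an exact (Clopper-Pearson) binomial 95% CI using only the Python standard library. The article does not state which interval method was used in STATA, so the choice of the exact method is an assumption, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target, iters=60):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


def ppv(confirmed, flagged):
    """Positive predictive value: confirmed by gold standard / flagged by algorithm."""
    return confirmed / flagged


# Counts taken from the Results: 63 of 76 reviewed charts met all criteria.
low, high = exact_ci(63, 76)
print(f"PPV = {ppv(63, 76):.3f}, 95% CI {low:.3f} to {high:.3f}")
```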


Baseline Characteristics

Table 1 reports the baseline characteristics of the entire population. Our population consisted of mostly older white male veterans, of whom 48% had coronary artery disease and 11% had atrial fibrillation. Table 2 reports the blood pressure parameters of our sample. Both systolic and diastolic blood pressures were well controlled using a mean number of antihypertensives of 1.61 ± 0.83. The most commonly used antihypertensives were angiotensin-converting-enzyme inhibitors and calcium channel blockers.

Description of the Process

After obtaining institutional review approval, we searched PubMed and identified validated ICD-9-CM codes for hypertension, diabetes, and stroke with the highest PPV and sensitivity to create our algorithm. A description of the included codes is provided in Table 3. The algorithm, with an explanation of the inclusion and exclusion criteria, was provided to an information technology specialist with access to the Veterans Affairs corporate data warehouse database. The data warehouse contains inpatient and outpatient healthcare utilization information in a Microsoft SQL Server database.8 A list of names and medical records was provided by the information technology specialist to the research team. This list was used to access the medical records, validate the algorithm, and recruit subjects into the clinical trial.
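A query of the kind an information technology specialist might run against such a warehouse can be sketched as follows. Everything about the schema is hypothetical: the table names `patients` and `diagnoses`, their columns, and the toy rows are invented for illustration, and an in-memory SQLite database stands in for the actual SQL Server warehouse. The sketch also assumes that the visit window applies to the hypertension code and that an exclusion code on any date disqualifies a patient; the article does not specify either point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE diagnoses (patient_id INTEGER, visit_date TEXT, icd9 TEXT);
INSERT INTO patients VALUES (1, 82), (2, 79), (3, 70), (4, 88);
INSERT INTO diagnoses VALUES
  (1, '2011-03-01', '401.9'),
  (2, '2011-05-10', '401.9'),
  (2, '2011-06-02', '250.00'),
  (3, '2011-07-15', '401.1'),
  (4, '2011-09-09', '402.90'),
  (4, '2011-10-01', '434.91');
""")

QUERY = """
SELECT p.patient_id
FROM patients p
WHERE p.age >= 75
  -- inclusion: any hypertension code within the study window
  AND EXISTS (
    SELECT 1 FROM diagnoses d
    WHERE d.patient_id = p.patient_id
      AND d.visit_date BETWEEN '2011-01-01' AND '2012-01-01'
      AND (substr(d.icd9, 1, 3) IN ('401', '402', '403', '404', '405')
           OR d.icd9 = '437.2'))
  -- exclusion: any diabetes or stroke code
  AND NOT EXISTS (
    SELECT 1 FROM diagnoses d
    WHERE d.patient_id = p.patient_id
      AND (substr(d.icd9, 1, 3) IN ('250', '430', '431', '433', '434', '435', '436', '438')
           OR d.icd9 IN ('437.1', '437.9')))
ORDER BY p.patient_id
"""

eligible = [row[0] for row in conn.execute(QUERY)]
print(eligible)  # patient IDs passing the inclusion and exclusion criteria
```

In the toy data, patient 2 is excluded for a diabetes code, patient 3 for age, and patient 4 for a stroke code, leaving only patient 1.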

Validity of Hypertension Codes

The most commonly used code was 401.9, used in 95% of the population. Only 5 subjects had an ICD-9-CM code for hypertension but had no medical record definition of hypertension. Of those 5 subjects, 1 was identified using code 401.1; they had a mean SBP of 123 mm Hg, and 40% had coronary artery disease, atrial fibrillation, or heart failure. When compared with the gold standard definition of hypertension, the PPV of any hypertension code was 93% (95% CI, 85%-98%).

Validity of the Algorithm

The algorithm correctly identified 63 subjects as being eligible for the clinical trial. The 13 remaining were classified as incorrectly identified and included the 5 subjects who did not have a true hypertension diagnosis. The remaining subjects used oral antidiabetics (n = 1), had a glycated hemoglobin (A1C) of 6.5 or greater (n = 1), had a serum glucose of 126 or more (n = 5), or had a diagnosis of diabetes (n = 1). None of the charts evaluated had stroke as a diagnosis.

The mean serum glucose for the entire population was 99 ± 14.5 mg/dL, and the mean A1C was 5.6 ± 0.60. The PPV for the algorithm was 83% (95% CI, 73%-91%).
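The reported counts can be reconciled with the two PPVs by simple arithmetic. The short sketch below only restates numbers given in the text; the variable names are illustrative.

```python
# Counts as reported in the Results; variable names are illustrative.
charts_reviewed = 76
not_hypertensive = 5
diabetes_evidence = {
    "oral antidiabetic use": 1,
    "A1C of 6.5 or greater": 1,
    "serum glucose of 126 or more": 5,
    "documented diabetes diagnosis": 1,
}

incorrect = not_hypertensive + sum(diabetes_evidence.values())
correct = charts_reviewed - incorrect

ppv_hypertension = (charts_reviewed - not_hypertensive) / charts_reviewed
ppv_algorithm = correct / charts_reviewed

print(f"incorrectly identified: {incorrect}, correctly identified: {correct}")
print(f"PPV, any hypertension code: {ppv_hypertension:.1%}")
print(f"PPV, full algorithm: {ppv_algorithm:.1%}")
```

The 5 subjects without confirmed hypertension plus the 8 with evidence of diabetes account for the 13 incorrectly identified charts, and 71/76 and 63/76 round to the reported 93% and 83%.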


Our results show that a single ICD-9-CM code (401.9) can correctly identify elderly veterans with hypertension. Our results also show that a combination of codes that include inclusion/exclusion criteria to target a specific population is a potentially useful recruitment tool. Our conclusions are supported by the fact that we used an extensive clinical definition of the gold standard as well as our low inter-rater variability.


First, because we used an elderly population that has frequent clinical encounters, the opportunities for having documentation of the diagnoses of interest during the window period studied were greater than in a younger cohort; this could have influenced the PPV. Nevertheless, our report is accurate for elderly subjects. Second, we did not validate the race present in the medical record since this was a retrospective review and we had no gold standard. However, others have validated the use of claims-based algorithms to identify racial minorities that could be invited to participate in clinical trials.9 Of note, during the time in which our site was participating in the multicenter trial using this recruitment algorithm, we successfully recruited 78 patients, of whom 87% were minorities. Third, this process and its results are generalizable mostly to facilities that have advanced electronic medical records and the capability to query their data, and to facilities that use ICD-9-CM codes instead of ICD-10-CM codes.


When evaluating the effectiveness of healthcare interventions, randomized controlled trials (RCTs) are seen as the gold standard research design. It is important that RCTs recruit their target number of participants in order to avoid being underpowered, particularly as a lack of statistical power may lead to the reporting of clinically important effects as statistically nonsignificant. Statistically nonsignificant findings can increase the risk of dismissing the use of interventions before their true value is established or delaying their use while additional costly trials are carried out. Many RCTs are abandoned or do not produce unequivocal evidence due to recruitment difficulties, which also means that the resources spent for setting up and running the RCT have not been put to their best use.10

Another recently reported implication of low-enrolling sites was the issue of costs. One-third of all studies terminated between 2005 and 2009 at Oregon Health Sciences University had low enrollment, and these low-enrolling studies cost the institution almost $1 million annually.11 The recruitment of research participants is critical to conducting clinical and translational research. Failure to recruit research participants has a negative financial impact, but more importantly, under-enrolled studies do not contribute to scientific or clinical knowledge. Academic health centers have identified recruitment into clinical trials as a priority, and more efforts and resources should be shifted to improve the recruitment of research participants.10,11

The validity of single specific codes has been well described5,6,12; however, most reports do not describe the combination of ICD-9-CM codes with other codes. This validation of single codes is a critical component of outcomes, pharmacovigilance signaling, and comparative effectiveness research. However, there is a lack of data reporting on the validity of combinations of ICD-9-CM codes or the combination of ICD-9-CM codes with other claims-based information to identify specific clinical presentations or populations of interest. These strategies have been used to report events in cohorts of specific populations, such as intestinal perforation among rheumatoid arthritis patients,13 or progression of liver disease.14 However, there has been no translation of this knowledge to identify the ideal potential clinical trial participant. As we showed in this report, it is possible to identify the subgroup of subjects who are likely to be eligible by creating algorithms of well-validated individual codes that represent the inclusion/exclusion criteria of the clinical trial.


Identifying a set of validated codes and creating algorithms that include the inclusion/exclusion criteria of a randomized study can potentially aid in the recruitment of the study using mailings. It will be necessary to evaluate the impact of this strategy after ICD-10-CM is implemented, as well as in the recruitment of minorities.

Author Affiliations: Department of Medicine, Miller School of Medicine, University of Miami (LT, AP, YS, GC), Miami, FL; Veterans Affairs Medical Center (LT, AP, JD, YS, GC), Miami, FL.

Source of Funding: None.

Author Disclosures: The authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.

Authorship Information: Concept and design (LT,AP); acquisition of data (LT, JD, GC); analysis and interpretation of data (LT, GC, YS); drafting of the manuscript (LT, GC, AP, YS); critical revision of the manuscript for important intellectual content (LT, AP, GC, YS); statistical analysis (LT, GC); provision of study materials or patients (LT, GC, JD); obtaining funding (LT, GC); administrative, technical, or logistic support (JD); supervision (GC, AP).

Address correspondence to: Leonardo Tamariz, MD, MPH, Miller School of Medicine, University of Miami, 1120 NW 14th St, Ste 971 (H- 201), Miami, FL 33136. E-mail: ltamariz@med.miami.edu.

1. Lloyd-Jones D, Adams R, Carnethon M, et al; American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics—2009 update: a report from the American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Circulation. 2009;119(3):e21-e181.

2. Kennedy BM, Kumanyika S, Ard JD, et al. Overall and minority-focused recruitment strategies in the PREMIER multicenter trial of lifestyle interventions for blood pressure control. Contemp Clin Trials. 2010;31(1):49-54.

3. West SL, Strom BL, Poole C. Validity of pharmacoepidemiologic drug and diagnosis data. In: Strom BL, ed. Pharmacoepidemiology, 4th ed. Chichester, England: John Wiley; 2005:709-766.

4. McCarthy EP, Iezzoni LI, Davis RB, et al. Does clinical evidence support ICD-9-CM diagnosis coding of complications? Med Care. 2000;38(8):868-876.

5. Birman-Deych E, Waterman AD, Yan Y, Nilasena DS, Radford MJ, Gage BF. Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors. Med Care. 2005;43(5):480-485.

6. Kokotailo RA, Hill MD. Coding of stroke and stroke risk factors using International Classification of Diseases, revisions 9 and 10. Stroke. 2005;36(8):1776-1781.

7. Sacks DB, Arnold M, Bakris GL, et al; National Academy of Clinical Biochemistry; Evidence-Based Laboratory Medicine Committee of the American Association for Clinical Chemistry. Guidelines and recommendations for laboratory analysis in the diagnosis and management of diabetes mellitus. Diabetes Care. 2011;34(6):e61-e99.

8. Maynard C, Chapko MK. Data resources in the Department of Veterans Affairs. Diabetes Care. 2004;27(suppl 2):B22-B26.

9. Palacio AM, Tamariz LJ, Uribe C, et al. Can claims-based data be used to recruit black and Hispanic subjects into clinical trials? Health Serv Res. 2012;47(2):770-782.

10. Nasser N, Grady D, Balke CW. Commentary: improving participant recruitment in clinical and translational research. Acad Med. 2011;86(11):1334-1335.

11. Kitterman DR, Cheng SK, Dilts DM, Orwoll ES. The prevalence and economic impact of low-enrolling clinical studies at an academic medical center. Acad Med. 2011;86(11):1360-1366.

12. de Burgos-Lunar C, Salinero-Fort MA, Cárdenas-Valladolid J, et al. Validation of diabetes mellitus and hypertension diagnosis in computerized medical records in primary health care. BMC Med Res Methodol. 2011;11:146.

13. Curtis JR, Chen SY, Werther W, John A, Johnson DA. Validation of ICD-9-CM codes to identify gastrointestinal perforation events in administrative claims data among hospitalized rheumatoid arthritis patients. Pharmacoepidemiol Drug Saf. 2011;20(11):1150-1158.

14. Lo Re V 3rd, Lim JK, Goetz MB, et al. Validity of diagnostic codes and liver-related laboratory abnormalities to identify hepatic decompensation events in the Veterans Aging Cohort Study. Pharmacoepidemiol Drug Saf. 2011;20(7):689-699.
