The authors discuss the design and evaluation of a health information technology platform that enables comprehensive, automated assessment of care quality in electronic medical records.
Objectives: To assess the performance of a health information technology platform that enables automated measurement of asthma care quality using comprehensive electronic medical record (EMR) data, including providers’ free-text notes.
Study Design: Retrospective data study of outpatient asthma care in Kaiser Permanente Northwest (KPNW), a midsized health maintenance organization (HMO), and OCHIN, Inc, a group of Federally Qualified Health Centers.
Methods: We created 22 automated quality measures addressing guideline-recommended outpatient asthma care. We included EMRs of asthma patients aged ≥12 years during a 3-year observation window and narrowed this group to those with persistent asthma (13,918 KPNW; 1825 OCHIN). We validated our automated quality measures using chart review for 818 randomly selected patients, stratified by age and sex for each health system. In both health systems, we compared the performance of these measures against chart review.
Results: Most measures performed well in the KPNW system, where accuracy averaged 88% (95% confidence interval [CI] 82%-93%). Mean sensitivity was 77% (95% CI 62%-92%) and mean specificity was 84% (95% CI 75%-93%). The automated analysis was less accurate at OCHIN, where mean accuracy was 80% (95% CI 72%-89%), with mean sensitivity and specificity of 52% (95% CI 35%-69%) and 82% (95% CI 69%-95%), respectively.
Conclusions: To achieve comprehensive quality measurement in many clinical domains, the capacity to analyze text clinical notes is required. The automated measures performed well in the HMO, where practice is more standardized. The measures need to be refined for health systems with more diversity in clinical practice, patient populations, and setting.
(Am J Manag Care. 2012;18(6):313-319)
Quality improvement requires comprehensive quality measurement, which in turn requires robust automation to be broadly applicable and reliable.
To guide quality improvement, we must comprehensively measure the quality of healthcare. Currently, this process is hindered by expensive, time-consuming, and sometimes inconsistent manual chart review.1 Electronic medical records (EMRs), which have become more prevalent as a result of the American Recovery and Reinvestment Act of 2009, promise to make routine and comprehensive quality measurement a reality.2 However, informatics challenges have hindered progress: care guidelines may not be specified to allow for automated measurement; the needed data are not standardized and are subject to variations in EMRs and clinical practice; and much of the data required are in the free-text notes that care providers use to document clinical encounters, which may not be accessible.
Our research team, funded by the Agency for Healthcare Research and Quality (AHRQ), designed and implemented an automated method to comprehensively assess outpatient asthma care. In doing so, we aimed to develop a platform that could automate care measurement for any condition.3
We conducted a retrospective data study of outpatient asthma care in 2 distinct healthcare systems: Kaiser Permanente Northwest (KPNW) and the Federally Qualified Health Centers (FQHCs) associated with OCHIN, Inc. We obtained institutional review board approval and executed Data Use Agreements between research organizations for this study.
OCHIN serves the data management needs of FQHCs and other community health centers that care for indigent, uninsured, and underinsured populations. OCHIN has licensed an integrated Practice Management and EMR data system from Epic Systems. We approached 8 FQHCs associated with OCHIN (caring for 173,640 patients at 44 locations through 2010), and all agreed to participate in this study.
Kaiser Permanente Northwest
KPNW is a nonprofit, group-model health maintenance organization (HMO) that provides comprehensive, prepaid healthcare to members. KPNW serves about 475,000 members in Oregon and Washington. All patient contacts are recorded in a single, comprehensive EMR, the HealthConnect system.
We included the electronic records of patients 12 years or older at the start of 2001 (KPNW) or 2006 (OCHIN) who had at least 1 diagnosis code for asthma (35,775 in KPNW and 6880 in OCHIN). We then narrowed this population to those defined as having persistent asthma to reach the target population of interest (13,918 KPNW; 1825 OCHIN).
Developing the Measure Set
To automate the assessment of asthma care quality, recommended care steps from credentialed guidelines needed to be converted into quantifiable measures. The development of each measure began with a concise proposition about the recommended care for specific patients, such as “patients seen for asthma exacerbation should have a chest exam.” We used an 8-stage iterative process to ensure that our quality measures would be comprehensive and current. This process included 4 vetting steps with local and national experts. We identified 25 measures from comprehensive, rigorous quality measure sets, primarily derived from RAND’s Quality Assessment system.1,4,5 We then revised the measures to reflect updated guidelines6 and restricted our attention to asthma care in the outpatient setting, resulting in a set of 22 measures that we labeled the Asthma Care Quality (ACQ) measure set.
Operationalizing the Measure Set
Each measure was specified as a ratio based on its applicability to individual patients (denominator) and evidence that the recommended care had been delivered (numerator). Performance on each measure can then be reported as the percentage of patients who received recommended care from among those for whom that care was indicated. For example, the national RAND study of McGlynn and colleagues demonstrated that across 30 clinical conditions, Americans received about 55% of recommended care.1 Ratios for each measure can be produced at the patient, provider, clinic, and health-system levels.
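As a minimal sketch of this ratio (not the study's implementation), the Python below computes the percentage for one measure from per-patient flags. The `PatientStatus` record, its field names, and the sample cohort are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PatientStatus:
    # Hypothetical record; field names are illustrative, not from the study.
    patient_id: str
    eligible: bool  # meets the measure's denominator criteria
    received: bool  # evidence of recommended care was found (numerator)


def measure_rate(statuses: Iterable[PatientStatus]) -> Optional[float]:
    """Percent of eligible patients who received the recommended care.

    Returns None when no patient is eligible, since the ratio is undefined.
    """
    eligible = [s for s in statuses if s.eligible]
    if not eligible:
        return None
    delivered = sum(1 for s in eligible if s.received)
    return 100.0 * delivered / len(eligible)


# Made-up cohort: three eligible patients, two of whom received the care.
cohort = [
    PatientStatus("A", eligible=True, received=True),
    PatientStatus("B", eligible=True, received=False),
    PatientStatus("C", eligible=False, received=False),
    PatientStatus("D", eligible=True, received=True),
]
rate = measure_rate(cohort)
```

Applying the same function to subsets of patients grouped by provider or clinic would yield the ratios at those levels.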
We investigated providers’ clinical practices related to each measure in the ACQ measure set and how care was documented and stored in the EMR. Each measure’s numerator requires a “measure interval,” which is the time window during which the care events must take place. The measure interval is oriented around an index date that is a property of denominator inclusion. For example, for the measure “patients seen for asthma exacerbation should have a chest exam,” the index date is the exacerbation encounter and the measure interval includes only that encounter. On the other hand, for the measure “patients with persistent asthma should have a flu vaccination annually,” the index date is the event that qualifies the patient as having persistent asthma and the measure interval is operationalized to include encounters 6 months before through 12 months after the index date.
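The flu vaccination interval described above can be sketched as a date window around the index date. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the endpoints are treated as inclusive, and "months" are taken to be calendar months with the day clamped to the end of shorter months.

```python
import calendar
from datetime import date
from typing import Tuple


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def flu_vaccine_interval(index_date: date) -> Tuple[date, date]:
    """Window from 6 months before through 12 months after the index date."""
    return add_months(index_date, -6), add_months(index_date, 12)


def in_interval(event_date: date, interval: Tuple[date, date]) -> bool:
    """True if the event falls inside the interval (endpoints inclusive)."""
    start, end = interval
    return start <= event_date <= end


# Illustrative index date; a vaccination in October 2007 falls in the window.
window = flu_vaccine_interval(date(2007, 3, 15))
vaccinated_in_window = in_interval(date(2007, 10, 1), window)
```

For a measure such as the chest exam, where the interval is a single encounter, the check would reduce to matching the encounter identifier rather than comparing dates.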
Applying the Measure Set
For each of the 22 quality measures, we first defined an observation period (in our case, 3 years of clinical events) and divided it into a period for denominator qualification (the selection period) followed by an evaluation period, during which, in most cases, the prescribed care events were identified. For this study, we used a 2-year selection period based on a modified version of the Healthcare Effectiveness Data and Information Set asthma criteria to identify patients with persistent asthma (used in all of our measures) or those presenting with an asthma exacerbation (used in 36% of the measures in our set). We identified patients as having persistent asthma if they met minimum criteria for asthma-related utilization or if this diagnosis could be specifically determined from the provider’s clinical notes (Table 1). Asthma exacerbation criteria were based on hospital discharge diagnosis or an outpatient visit associated with a glucocorticoid order/dispensing and a text note about exacerbation.
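The period split and the exacerbation rule can be sketched as follows. This is one hedged reading of the text, not the study's specification: the start date is illustrative, periods are treated as half-open, the `Encounter` flags are hypothetical, and the rule is read as "hospital discharge diagnosis, OR an outpatient visit with both a glucocorticoid order/dispensing and a text note about exacerbation." The persistent asthma criteria are given in Table 1 and are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

Period = Tuple[date, date]  # half-open: [start, end)


def split_observation(
    start: date, total_years: int = 3, selection_years: int = 2
) -> Tuple[Period, Period]:
    """Split an observation period into selection and evaluation periods.

    Assumes `start` is not February 29, so the year can simply be replaced.
    """
    selection_end = start.replace(year=start.year + selection_years)
    observation_end = start.replace(year=start.year + total_years)
    return (start, selection_end), (selection_end, observation_end)


@dataclass
class Encounter:
    # Hypothetical flags; in the study these come from coded data and text notes.
    hospital_discharge_asthma_dx: bool = False
    outpatient_visit: bool = False
    glucocorticoid_ordered_or_dispensed: bool = False
    text_note_mentions_exacerbation: bool = False


def is_exacerbation(enc: Encounter) -> bool:
    """Apply the exacerbation rule as read from the paragraph above."""
    outpatient_rule = (
        enc.outpatient_visit
        and enc.glucocorticoid_ordered_or_dispensed
        and enc.text_note_mentions_exacerbation
    )
    return enc.hospital_discharge_asthma_dx or outpatient_rule


# Illustrative start date: 2 years of selection, then 1 year of evaluation.
selection, evaluation = split_observation(date(2006, 1, 1))
```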
Automated System Design
We designed a quality measurement system as a “pipeline” of transformation and markup steps taken on encounter-level EMR data. The goal was to capture all of the clinical events required to assess care quality.
Data begin traveling through the pipeline when they are extracted from each EMR system’s data warehouse. These data extracts—produced by a component called the EMR Adapter—contain data aggregated into records at the encounter (visit) level for all patients. In our study, these records included the coded diagnoses, problems, and medical history updates; medications ordered, dispensed, and noted as current or discontinued; immunizations, allergies, and health maintenance topics addressed; and procedures ordered, progress notes, and patient instructions.
The data are then exported from the EMR data warehouse (typically, a relational database) into file-based eXtensible Markup Language (XML) documents according to a local specification. The first transformation step involves converting locally defined XML formats into a common, standard XML format conforming to the HL7 Clinical Document Architecture (CDA) specification.7
The CDA provides a canonical representation of encounter-level data that is used as an input to our medical record classification system, MediClass.8 MediClass uses natural language processing and rules defining logical combinations of marked-up and originally coded data to generate concepts that are then inserted into the CDA document. This system has been successfully used to assess guideline adherence for smoking cessation,9 to identify adverse events due to vaccines,10 and for other applications that require extracting clinical data from EMR text notes.
Up to this point, data processing is performed locally, within the secure data environments of each study site. The next step filters these data to identify only clinical events (including specific concepts identified in the text notes) that are part of the quality measures of the study. The result is a single file of measure set-specific clinical event data, in comma-delimited format, called the Events Data set. Each line in this file describes a patient, provider, and encounter, along with a single event (and attributes specific to that event) that is part of 1 or more measures in the set.
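To make the shape of such a file concrete, the sketch below reads a small comma-delimited sample with Python's standard `csv` module and groups the rows by patient. The article does not publish the file layout, so the column names, event labels, and sample rows here are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

# Hypothetical layout: one event per line, tied to patient, provider, encounter.
SAMPLE = """patient_id,provider_id,encounter_id,encounter_date,event,attributes
P001,D10,E100,2008-01-15,ASTHMA_EXACERBATION,
P001,D10,E100,2008-01-15,CHEST_EXAM,source=text_note
P002,D11,E200,2008-02-03,FLU_VACCINE,source=immunization
"""


def load_events(handle: TextIO) -> Dict[str, List[dict]]:
    """Read event rows and group them by patient identifier."""
    by_patient: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(handle):
        by_patient[row["patient_id"]].append(row)
    return dict(by_patient)


events_by_patient = load_events(io.StringIO(SAMPLE))
```

Keeping one event per line means a single encounter can contribute several rows, which is what lets one file serve every measure in the set.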
The distinct data pipelines from each health system converge into a single analysis environment at the data coordinating center, where quality measures are computed. The Events Data set files are transferred to a central location (KPNW) for final analysis and processing. Here, information contained in the aggregate events data set is processed to provide the clinical and time-window criteria for identifying patients who meet numerator and denominator criteria for each measure. Finally, the proportion of patients receiving recommended services is computed.
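The final computation can be illustrated for the chest exam measure. This is a simplified sketch under assumptions: the event tuple layout and event labels are hypothetical, and a patient is counted in the numerator if any of that patient's exacerbation encounters also carries a chest exam event, which is one possible patient-level rule rather than the study's documented one.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical layout: (patient_id, encounter_id, event_name)
Event = Tuple[str, str, str]


def chest_exam_rate(events: Iterable[Event]) -> Optional[float]:
    """Percent of patients with an exacerbation who had a same-encounter chest exam."""
    events = list(events)  # allow more than one pass over the input
    exacerbations = {(p, e) for p, e, name in events if name == "ASTHMA_EXACERBATION"}
    exams = {(p, e) for p, e, name in events if name == "CHEST_EXAM"}
    denominator = {p for p, _ in exacerbations}
    numerator = {p for p, e in exacerbations if (p, e) in exams}
    if not denominator:
        return None
    return 100.0 * len(numerator) / len(denominator)


sample_events = [
    ("P001", "E100", "ASTHMA_EXACERBATION"),
    ("P001", "E100", "CHEST_EXAM"),
    ("P002", "E200", "ASTHMA_EXACERBATION"),
    ("P002", "E201", "CHEST_EXAM"),   # different encounter, so it does not count
    ("P003", "E300", "FLU_VACCINE"),  # no exacerbation, so not in the denominator
]
chest_exam_percent = chest_exam_rate(sample_events)
```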
Of the 22 measures in the ACQ measure set, nearly 70% are enhanced by or require processing of the providers’ text clinical notes. In particular, 8 measures (36%) were determined to require processing providers’ text notes because the necessary numerator events only occur in the text clinical notes. An additional 7 measures (32%) were enhanced by this processing because the text notes provided an important alternative source for the necessary numerator clinical events, which we determined would significantly improve the measure’s sensitivity (Table 2). Furthermore, qualification for any measure in the ACQ measure set occurred by text-based assessment of persistent asthma in 26% of all patients. Of these, 30% qualified as having persistent asthma by text processing alone.
Of the 22 measures, we were able to implement 18 (Table 2). Measure 20 quantified asthma control while measure 4 addressed the appropriateness of step-up therapy with poor asthma control. During our project period, we were unable to implement these complex measures relying on assessments of control. Measure 13 required access to in-office spirometry results, and measure 14 required capture of repeat spirometry readings during the visit. In-office spirometry was only sporadically available in the data warehouse of 1 site and was entirely unavailable at the other site. We continue to seek ways to operationalize these measures. Below, we report on our implementations of the remaining 18 measures.
Chart Review Validation
To assess how well our implementation of the ACQ measure set performed, we carried out a validation process using chart review for 818 patients randomly selected from among those identified to have persistent asthma by our method, stratified by age and sex for each health system (443 at KPNW and 375 at OCHIN). Each stratum was populated with 3 to 10 distinct patients who had an exacerbation within the chart review time period. This allowed us to compare the overall accuracy, sensitivity, and specificity of the ACQ measures by site, relative to results obtained by manual chart abstraction performed by trained abstractors (the reference standard).
Most ACQ measures performed relatively well in the KPNW system (Table 2). Measure accuracy (agreement with chart review) ranged from 63% to 100% and averaged 88% across all measures (95% confidence interval [CI] 82%-93%). Mean sensitivity was 77% (95% CI 62%-92%) and was 60% or greater for 15 of the 18 measures (and 90% or greater for 9 of those measures). Similarly, mean specificity was 84% (95% CI 75%-93%), with 15 measures having specificity of 60% or greater (9 measures with specificity of 90% or greater). For 2 measures, specificity was more than 90% but sensitivity was poor. The measure attempting to ascertain whether a history or review of prior hospitalizations and emergency department visits had been obtained failed to identify any of the 5 patients noted by abstractors to have received this care. In addition, documentation of patient education in the case of newly prescribed inhaled therapy had sensitivity of just 12%, identifying only 3 of 25 patients noted by the abstractors to have received this care. There was only 1 patient on theophylline in the KPNW chart review sample, precluding estimation of the accuracy of this measure.
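For readers who want the definitions spelled out, the sketch below computes accuracy, sensitivity, and specificity for one measure, treating chart review as the reference standard. The function name and data layout are assumptions. The example reproduces the 3-of-25 sensitivity reported above for the patient education measure; the 75 chart-negative patients are invented solely to complete the example and do not come from the study.

```python
from typing import Dict, Optional, Sequence


def validation_metrics(
    automated: Sequence[bool], chart_review: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Compare automated results with chart review for the same patients."""
    pairs = list(zip(automated, chart_review))
    tp = sum(1 for a, c in pairs if a and c)          # both found the care
    tn = sum(1 for a, c in pairs if not a and not c)  # both found no care
    fp = sum(1 for a, c in pairs if a and not c)      # automated only
    fn = sum(1 for a, c in pairs if not a and c)      # chart review only

    def ratio(num: int, den: int) -> Optional[float]:
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# 25 patients with care documented per chart review, 3 detected automatically,
# plus 75 hypothetical patients without the care (all correctly negative).
chart = [True] * 25 + [False] * 75
auto = [True] * 3 + [False] * 22 + [False] * 75
metrics = validation_metrics(auto, chart)
```

In this made-up sample the accuracy is still 78% despite the 12% sensitivity, which shows why accuracy alone can mask a measure that misses most delivered care.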
The automated ACQ analysis was less accurate against the OCHIN system (Table 2). These differences came not from the level of quality found in the 2 systems but from differences in documentation, EMR implementation, and clinical practice that our method is not yet properly accommodating. Mean overall accuracy was 80% (95% CI 72%-89%) and ranged from 36% to 99% across all measures. Mean sensitivity and specificity were 52% (95% CI 35%-69%) and 82% (95% CI 69%-95%), respectively. Performance was better among the routine measures compared with the exacerbation-related measures. Among the 11 routine care measures, 8 had specificities higher than 80% and 5 had sensitivities higher than 80%. Of these measures, 3 had specificities of 50% or lower, while another 5 measures had sensitivities of 50% or lower. Of the 7 exacerbation-related measures, 5 were evaluable at OCHIN (assessment was not possible for 2 of the exacerbation measures: no patients on theophylline were identified, and since hospital discharge information was unavailable, the 4-week follow-up contact prescribed by measure 16 was not evaluable). Among the 5 evaluable measures for exacerbation care, overall accuracy ranged from 36% to 96%. Sensitivity tended to be low (5.3% to 58.1%), while specificities were generally high (95% or higher for 4 of the 5 measures).
Across the evaluable measures at each site, specificity was similar: 9 of 16 measures reached 90% or better. The largest difference between sites was seen in measure sensitivity. While most measures in KPNW reached 60% sensitivity (15 of 18 measures), only a minority (6 of 16 measures) met or exceeded 60% sensitivity in OCHIN. Of the 9 measures with sensitivity below 50% at OCHIN, 6 rely exclusively on text processing of clinical notes, indicating that our text-processing solution needs to be further refined to identify the relevant clinical events the abstractors observed in the OCHIN EMR. Potential explanations for this discrepancy in performance of the automated measures between our 2 study sites include (1) the possibility that the chart reviewer was observing events in comment fields of the OCHIN records that were not available to the automated program and (2) the possibility that there may be greater variability in how and where OCHIN providers document visits. Each of these reasons could explain the inability of our automated method to identify the necessary events in the OCHIN medical records. Additional modification of the specifications of the automated method will be needed to capture these differences across medical records. Another likely explanation is that the OCHIN EMR is quite new compared with KPNW’s, having been implemented in 2005. Hence, familiarity with the EMR and documentation support resources may have affected completeness and consistency of clinical data entry in some cases.
The automated method described here utilizes the clinically rich, heterogeneous data captured in transactional EMR systems to assess the quality of care delivered to patients with persistent asthma. One question that arises is whether administrative data alone could suffice to perform this task. Leaving aside the fidelity issues inherent in using billing data to compute clinical care process metrics, we found that at most 6 of the 22 ACQ measures (and 5 of the 18 measures we implemented) could be addressed partially or in whole with administrative data alone. In short, the use of administrative data alone would not meet the goals of assessing compliance with current guidelines for best practices in caring for patients with persistent asthma.
Comprehensive and routine quality assessment requires both state-of-the-art EMR implementation and an adaptable health information technology platform that enables automated measurement of complex clinical practices. We designed a system to respond to these challenges and implemented it in 2 diverse healthcare systems to assess outpatient asthma care. These automated measures generally performed well in the HMO setting, where clinical practice is more standardized; additional refinement is needed for health systems that encompass more diversity in clinical practice, patient population, and setting.
Our design overcomes many challenges created by text-based guidelines, nonstandard data elements, and text clinical notes. Although we have only been able to implement 18 of the 22 measures to date, and although chart review showed that some may require refinement, the automated approach promises to be more affordable than chart review. Future work will explore whether our design will accommodate all of the ACQ measures and whether our implementation can be enhanced to improve performance in more diverse healthcare systems.
Author Affiliations: From Center for Health Research (BH, MAM, RAM), Kaiser Permanente Northwest, Portland, OR; OCHIN Inc (JEP, SLC), Portland, OR.
Author Disclosures: Dr Hazlehurst reports employment with Kaiser Permanente and has received and has pending grants from the Agency for Healthcare Research and Quality, the funder of this study. Mr Puro reports receiving grants from Obesity Care Quality and the Comparative Effectiveness Research Hub. Ms Chauvie reports receiving grants from Obesity Care Quality. The other authors (MAM, RAM) report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.
Authorship Information: Concept and design (BH, RAM, JEP, SLC); acquisition of data (BH, JEP, SLC); analysis and interpretation of data (BH, MAM, RAM, JEP); drafting of the manuscript (BH, MAM, RAM, SLC); critical revision of the manuscript for important intellectual content (BH, MAM, RAM, JEP, SLC); statistical analysis (MAM, RAM, JEP); provision of study materials or patients (BH); obtaining funding (BH, RAM); administrative, technical, or logistic support (BH, RAM, JEP, SLC); and supervision (BH, JEP).
Funding Source: Agency for Healthcare Research and Quality, grant R18-HS17022.
Address correspondence to: Brian Hazlehurst, PhD, Center for Health Research, 3800 N Interstate Ave, Portland, OR 97227-1110. E-mail: Brian.Hazlehurst@kpchr.org.
1. McGlynn EA, Asch SM, Adams J, et al. The quality of health care delivered to adults in the United States. N Engl J Med. 2003;348(26):2635-2645.
2. Corrigan J, Donaldson MS, Kohn LT, eds. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academies Press; 2001.
3. Hazlehurst B, McBurnie M, Mularski R, Puro J, Chauvie S. Automating quality measurement: a system for scalable, comprehensive, and routine care quality assessment. AMIA Annu Symp Proc. 2009:229-233.
4. Mularski RA, Asch SM, Shrank WH, et al. The quality of obstructive lung disease care for adults in the United States as measured by adherence to recommended processes. Chest. 2006;130(6):1844-1850.
5. Kerr EA, Asch SM, Hamilton EG, McGlynn EA. Quality of Care for Cardiopulmonary Conditions: A Review of the Literature and Quality Indicators. Santa Monica, CA: RAND; 2000.
6. National Asthma Education and Prevention Program, Third Expert Panel on the Diagnosis and Management of Asthma. Expert Panel Report 3: Guidelines for the Diagnosis and Management of Asthma. Bethesda, MD: National Heart, Lung and Blood Institute; August 2007. http://www.ncbi.nlm.nih.gov/books/NBK7232/. Accessed April 17, 2012.
7. Dolin RH, Alschuler L, Boyer S, et al. HL7 Clinical Document Architecture, Release 2. J Am Med Inform Assoc. 2006;13(1):30-39.
8. Hazlehurst B, Frost HR, Sittig DF, Stevens VJ. MediClass: a system for detecting and classifying encounter-based clinical events in any electronic medical record. J Am Med Inform Assoc. 2005;12(5):517-529.
9. Hazlehurst B, Sittig DF, Stevens VJ, et al. Natural language processing in the electronic medical record: assessing clinician adherence to tobacco treatment guidelines. Am J Prev Med. 2005;29(5):434-439.
10. Hazlehurst B, Naleway A, Mullooly J. Detecting possible vaccine adverse events in clinical notes of the electronic medical record. Vaccine. 2009;27(14):2077-2083.