Objective: Health plans, medical groups, and commercial vendors are using administrative data to measure clinical performance at the plan or physician level. We compared results of using administrative claims data alone versus administrative data combined with chart review for selected Healthcare Effectiveness Data and Information Set (HEDIS) measures.
Study Design: Cross-sectional comparison of health plan performance rates using different methods of data collection.
Methods: We analyzed data reported by 283 commercial managed care plans in 2004 and 2006 for 15 HEDIS hybrid measures. Hybrid specifications included the use of administrative data supplemented with medical record review and required plans to report performance rates based on administrative data only and for administrative data supplemented with chart review. We calculated differences in rates and changes in quartile rankings of health plans between the 2 reported rates.
Results: Performance rates using administrative data alone were substantially lower than rates using combined data (average difference of 20.4 percentage points). On average, more than half of the plans had different quartile rankings based on administrative-only rates versus combined data rates. Measures relying on laboratory claims or laboratory results had the largest discrepancies.
Conclusions: Currently available health plan administrative data alone do not appear to provide sufficiently complete results for ranking health plans on HEDIS quality-of-care measures with hybrid specifications. The results suggest that reporting of clinical performance measures using administrative data alone should include prior testing and reporting on the completeness of data, relative rates, and changes in rankings compared with the use of combined administrative data and chart review.
(Am J Manag Care. 2007;13:553-558)
For HEDIS measures that are specified for reporting using a combination of administrative data and chart review data, using administrative data alone does not appear to be sufficient to estimate performance.
Currently available health plan administrative data sets, whether from individual plans or pooled across plans, are unlikely to support accurate comparisons of plans or physicians on many quality measures, most notably those measures that rely on laboratory claims or laboratory results.
The completeness and accuracy of measures relying exclusively on
are relatively expensive and difficult to administer and are not sufficient to address the technical quality of clinician performance.
A number of past studies6-9 have questioned the utility of administrative data for characterizing clinical quality. However, the studies focused on a single disease or measure, included only physician visit or hospital claims, and did not consider the inclusion of pharmacy claims, laboratory claims, or laboratory results. The future holds the promise of expanding electronic data to include not only laboratory results in an electronic format, but also data from electronic health records. Due to a variety of problems including interoperability, lack of standardized coding schemes, and the inability to retrieve some critical data electronically, the widespread availability of data useful for measurement from electronic health records appears to be some years away.10-13 Thus, at present, measurement of health plan or physician-level clinical performance relies on either manual data abstraction from medical records or the use of administrative data or some combination of these 2 strategies.
Data abstraction from paper medical records is an expensive and time-consuming process. Thus the use of administrative data is nearly always preferable given comparable levels of accuracy. The question then arises as to whether currently available administrative data consisting of various types of claims data, and in some cases laboratory results, are sufficient to evaluate the quality of clinical care. The report that follows compares performance results from 283 health plans based on their reporting of data on Healthcare Effectiveness Data and Information Set (HEDIS) measures with hybrid specifications. Hybrid specifications allow the use of administrative data supplemented with medical record review and require plans to report performance rates based on both administrative-only data and administrative data supplemented with chart review. The specific study questions were:
1. What is the difference between the health plan performance rates for HEDIS hybrid measures based on administrative-only data collection versus administrative data supplemented by medical record review?
2. Did differences between administrative-only data and combined data diminish between 2004 and 2006?
3. For a given measure, how much do plan rankings by quartile based on that measure change with the addition of medical record data?
4. Do results from the current study and a review of data sources used for HEDIS measures specified for administrative-only data collection suggest which current data sources are not sufficiently accurate to justify administrative-only data for HEDIS and other clinical performance data sets?
The National Committee for Quality Assurance (NCQA) has developed an extensive set of HEDIS performance measures. In 2006, 283 commercial health plans submitted HEDIS data and all but 11 of these plans' results were publicly reported in NCQA's annual “State of Health Care Quality”14 and, in collaboration with US News & World Report, in an annual “America's Best Health Plans” report.15
HEDIS Specifications for Administrative and Hybrid Measures
data as including visit, procedure, laboratory, and pharmacy claims, as well as laboratory results data, all of which must be available in an electronic format.
Generally, if in field tests performance rates based on administrative data alone vary by more than 5% from rates based on administrative plus medical record review, the measures are specified for hybrid data collection. Other field-test findings such as a high variance in means may also be considered by the NCQA measurement review panel (the Committee on Performance Measurement or CPM).
Like the administrative-only specification, the hybrid specification requires that administrative data be used to identify all members of the plan that meet the denominator requirements for the eligible population. Still using only administrative data, the plan then determines what proportion of the eligible patients meets the numerator requirements. If in comparing the rates to benchmarks, the administrative data are determined to provide an adequate rate, the plan may report the measure on the full population using administrative-only data. If it is determined that the rate is lower than anticipated, the plan can draw a random sample of 411 members from the denominator eligible population. The sample size of 411 specified by NCQA is based on a statistical estimation of providing an 85% chance of identifying a 5% difference between plans. The plans then conduct chart reviews on medical records of patients who are identified in the denominator but for whom administrative data do not indicate the numerator criteria were fulfilled. The HEDIS documentation gives specific instructions on the sampling method and conduct of chart reviews.16
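The hybrid calculation described above can be sketched as follows. This is a minimal illustration, not the NCQA specification: the function name `hybrid_rates` and the representation of each sampled member as a pair of flags are assumptions made for the example, and the counts are invented.

```python
def hybrid_rates(sample):
    """Return (administrative-only rate, combined rate) for a sampled denominator.

    `sample` is a list of (admin_hit, chart_hit) booleans, one pair per
    sampled member. This layout is hypothetical, chosen for illustration.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must contain at least one member")
    # Numerator hits found in administrative data alone.
    admin_hits = sum(1 for admin, _ in sample if admin)
    # Charts are reviewed only for members without an administrative hit.
    chart_hits = sum(1 for admin, chart in sample if not admin and chart)
    return admin_hits / n, (admin_hits + chart_hits) / n


# Invented example: 411 sampled members, 200 administrative hits,
# 100 further hits found on chart review, 111 with no evidence of the service.
example = [(True, False)] * 200 + [(False, True)] * 100 + [(False, False)] * 111
admin_rate, combined_rate = hybrid_rates(example)
```

By construction the combined rate can never fall below the administrative-only rate, because chart review only adds numerator hits for members already in the denominator.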
The hybrid specification method allows plans that have more complete and accurate administrative data systems to report using only administrative data, or at least to reduce the number of chart reviews required by achieving a higher proportion of numerator “hits” using administrative data alone. The NCQA reviews data on hybrid measures on an annual basis to determine whether measures can be moved from hybrid data collection to the administrative-only specification. For example, the breast cancer screening measure changed from hybrid to administrative-only collection in 2005 based on a review of data from prior years and subsequent action by the NCQA CPM.
The analyses reported in this paper are limited to 283 commercial health plans. A similar analysis was done for Medicare and Medicaid health plans, but because the results were similar, with somewhat lower overall means and slightly larger increases with use of chart review in Medicaid plans, only the results for commercial plans are included.
Table 2 displays the means of performance rates based on administrative-only versus combined data reporting. On all the measures across both years, the administrative mean rates were lower than the combined rates. The average magnitude of the differences between the combined and administrative-only rates was 20.4 percentage points in 2004 and 20.6 in 2006. Considering 2006 rates, only 4 measures (Cervical Cancer Screening, the 2 Well-Child Visits measures, and Adolescent Well Care) showed differences between administrative-only and combined data rates of less than 9 percentage points.
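For a single measure, the mean and standard deviation of plan-level differences might be computed along the following lines. This is a sketch under stated assumptions: the function name `difference_summary` is hypothetical, the rates are invented, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def difference_summary(admin_rates, combined_rates):
    """Mean and sample standard deviation of (combined - admin) per plan.

    Both inputs are rates in percent, one entry per plan, in the same order.
    """
    if len(admin_rates) != len(combined_rates):
        raise ValueError("both lists must cover the same plans")
    # Percentage point difference for each plan on this measure.
    diffs = [c - a for a, c in zip(admin_rates, combined_rates)]
    return mean(diffs), stdev(diffs)


# Invented rates for three plans on one measure.
mean_diff, sd_diff = difference_summary([50.0, 60.0, 70.0], [70.0, 85.0, 80.0])
```

A large standard deviation relative to the mean indicates that the gain from chart review varies from plan to plan.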
Converting the percentage point differences to percentage difference (ie, percentage [rather than percentage point] increase in rate afforded on average by using combined data) indicates that in 2006, only the 3 visit utilization measures (the 2 Well-Child Visits measures and Adolescent Well Care) were close to the 5% threshold used when changing from hybrid to administrative-only data specification (Appendix Table B, available at www.ajmc.com). Given that most plans already chose to report these 3 hybrid measures using administrative data alone (Appendix Table A), the differences in rates if all plans used and reported combined data would be expected to be even smaller.
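The distinction between a percentage point difference and a percentage difference can be illustrated with a short sketch. The function names are hypothetical, the rates are invented, and taking the administrative-only rate as the base of the percentage increase is an assumption drawn from the wording "increase in rate afforded ... by using combined data".

```python
def point_difference(admin_rate, combined_rate):
    """Absolute gap in percentage points (both rates given in percent)."""
    return combined_rate - admin_rate


def relative_increase(admin_rate, combined_rate):
    """Percentage increase over the administrative-only rate (assumed base)."""
    if admin_rate <= 0:
        raise ValueError("administrative-only rate must be positive")
    return 100.0 * (combined_rate - admin_rate) / admin_rate


# Invented example: an administrative-only rate of 40% and a combined rate of 60%.
points = point_difference(40.0, 60.0)      # 20.0 percentage points
relative = relative_increase(40.0, 60.0)   # 50.0 percent increase
```

The same 20-point gap corresponds to a larger percentage increase when the administrative-only rate is low, which is why the two views can rank measures differently.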
Impact on Plan Rankings
Even where the gaps between mean performance rates based on administrative and combined data are large, ranking plans based on administrative-only performance rates might be accurate if differences in the means using the 2 methods were consistent between plans. The standard deviation of the differences (Table 2) suggests that this is not the case. Our examination of changes in quartile ranking of plans (Appendix Table B) confirms this inference. For most hybrid measures, a large proportion of plans change their quartile rank when one moves from rates using only administrative data to combined data. For measures such as glycosylated hemoglobin (HbA1c) testing or screening for cholesterol, the proportion of plans that changed quartiles exceeded 60%, indicating a highly unstable classification when moving from administrative data to combined data.
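The proportion of plans changing quartile could be computed roughly as follows. This is a sketch rather than the procedure used in the study: the function names are hypothetical, the rates are invented, and assigning quartiles by rank order (with ties broken by input order) is an assumption.

```python
def quartiles(rates):
    """Assign each plan a quartile from 1 (lowest rates) to 4 (highest) by rank."""
    n = len(rates)
    # Plan indices ordered from lowest to highest rate; ties keep input order.
    order = sorted(range(n), key=lambda i: rates[i])
    result = [0] * n
    for rank, plan in enumerate(order):
        result[plan] = rank * 4 // n + 1
    return result


def share_changing_quartile(admin_rates, combined_rates):
    """Fraction of plans whose quartile differs between the two rate sets."""
    q_admin = quartiles(admin_rates)
    q_combined = quartiles(combined_rates)
    changed = sum(1 for a, c in zip(q_admin, q_combined) if a != c)
    return changed / len(q_admin)


# Invented rates for eight plans on one measure.
admin = [10, 20, 30, 40, 50, 60, 70, 80]
combined = [55, 60, 90, 70, 75, 80, 85, 95]
share = share_changing_quartile(admin, combined)  # 3 of 8 plans change quartile
```

In this invented example every plan's rate rises with chart review, yet 3 of the 8 plans still move to a different quartile because the size of the increase differs between plans.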
Type of Data Required and Validity of Administrative Data
Although the proportion of plans doing so is small for most measures, and therefore unlikely to have substantially changed the results, we point out (Appendix Table A) that some plans chose to report HEDIS hybrid measures with administrative-only data. The effect of this is unknown, but the plans that reported administrative-only data for hybrid measures score, on average, below those that reported combined data. There is also an assumption that health plan databases reflect the same range and availability of data as other potential data sources. As noted, some large medical group databases may have much fuller capture of laboratory claims or results. Conversely, other databases, such as the current CMS Medicare database, do not yet include any laboratory results, and the accuracy and completeness of pharmacy claims in Medicare are as yet unknown. This study was done with data from plans that report HEDIS data to NCQA on their commercial populations. Our analysis was limited to a subset of HEDIS measures. In addition, the parameters for determining which measures are specified as hybrid are set by the NCQA CPM. While the CPM includes, among others, practicing physicians, physician organizations, health plan medical directors, and technical experts in measurement, the parameters may not reflect full consensus related to this issue. Finally, we are aware of plans and some medical groups in which either data are widely available from interoperable electronic medical records or there are nearly complete electronic data on laboratory results. It is possible that these plans or groups could use these electronic data as the sole basis for a much broader array of clinical performance measures.
These results should be only a starting point for determining which measures may be reported using administrative-only data from a specific data source. Clearly, there is enough uncertainty for each measure that the source of data for that measure should be tested and, where possible, published before use for accountability at any level of the healthcare system. The growing effort to try to include all physician practices and achieve an adequate sample size of patients by combining administrative data from multiple plans does not address the problem of data accuracy raised in this study. Indeed, such pooling of data may actually reduce the accuracy of the administrative data by including data sources that are less complete. Likewise, addressing the inherent reliability and validity of the measures themselves, or issues such as attribution, while important, does not mitigate the need to explore the completeness and accuracy of a specific source of administrative data.
Attempts are being made to expand the range of administrative data through the use of new claims-related codes (termed CPT-II or “G” codes). However, we are unaware of studies of the accuracy, reliability, and validity of using CPT-II or G codes in clinical quality measures. Finally, a relatively small but increasing proportion of physician practices have the ability to augment existing electronic claims data with data abstracted from electronic medical records. However, due to a variety of problems including compatibility and lack of standardization, the widespread availability of data from electronic medical records that could be used in measurement appears some years away. Thus, there is a critical and urgent need to both identify and carefully evaluate new data approaches and methods to improve the availability, completeness, and accuracy of data for quality monitoring and reporting.
Author Affiliations: From the National Committee for Quality Assurance, Washington, DC (LGP, SHS); and the American College of Physicians, Philadelphia, Pa (AP).
Author Disclosure: Dr Pawlson is an employee of the National Committee for Quality Assurance, which collects and reports on Healthcare Effectiveness Data and Information Set measures. The authors (SHS, AP) report no conflicts of interest.
Authorship Information: Concept and design (LGP, SHS, AP); acquisition of data (LGP); analysis and interpretation of data (LGP, SHS, AP); drafting of the manuscript (LGP, SHS, AP); critical revision of the manuscript for important intellectual content (LGP, SHS); statistical analysis (AP); supervision (LGP).
Address correspondence to: L. Gregory Pawlson, MD, 2000 L Street, NW,
2. Rosenthal MB, Dudley RA. Pay-for-performance. JAMA. 2007;297:740-744.
6. Iezzoni LI. Assessing quality using administrative data. Ann Intern Med. 1997;127:666-674.
8. Steinwachs DM, Stuart MR, Scholle S, Starfield B, Fox MH, Weiner JP. A comparison of ambulatory Medicaid claims to medical records: a reliability assessment. Am J Med Qual. 1998;13:63-69.
10. Keating NL, Landrum MB, Landon BE, Ayanian JZ, Borbas C, Guadagnoli E. Measuring the quality of diabetes care using administrative data: is there bias? Health Serv Res. 2003;38:1529-1545.
health records to assess quality of care for outpatients with heart failure. Ann Intern Med. 2007;146:270-277.
13. Kramer TL, Owen RR, Cannon D, et al. How well do automated performance
14. National Committee for Quality Assurance. The State of Health Care Quality 2006. Washington, DC: National Committee for Quality Assurance; 2006. Available at: http://www.ncqa.org/communications/sohc2006/sohc_2006.pdf. Accessed September 11, 2007.
plans 2006. Available at: http://health.usnews.com/usnews/health/best-health-insurance/topplans.htm. Accessed July 31, 2007.
Specifications. Washington, DC: National Committee for Quality Assurance; 2006.