The Performance of Performance Measures

The American Journal of Managed Care, October 2007, Volume 13, Issue 10

Andrew Auerbach, MD, MPH, and colleagues recently argued that interventions (emphasis added) to improve the quality of healthcare should meet the same standards of evidence that are applied to the adoption of new medical technologies.1 In this issue of The American Journal of Managed Care, Pawlson et al similarly propose that each new measure of quality–and each designated data source for a measure–should be tested, and the results of testing published in peer-reviewed journals.2 In other words, what is the “performance” of performance measures? With the increasing volume of available performance measures–more than 86 measures have been endorsed by the National Quality Forum (NQF) for ambulatory care alone3–and the wide-ranging use of performance measures, it is not enough to test measures before implementation; the results of those tests must also be reported in a standardized and meaningful manner to enable selection of the most appropriate measures and data sources for each implementation program.

Pawlson et al make a strong contribution to this field by expanding on previous work to evaluate the use of administrative data for performance measures and by beginning to identify which data elements may be more accurate and complete. The authors' presentation of results is a starting place for evolving standards for reporting measure-testing results.

Evaluating Administrative Data: New Wine in an Old Bottle

Although Pawlson et al do not provide a rigorous decomposition of reliability for specific data elements by data source, their article draws important attention to the heterogeneity in sources for data elements that go into the calculation of performance measures. Administrative data, medical charts, and hybrid data may each offer different data collection advantages and costs, depending on the data element. For example, documentation that a screening test was performed should be reliably captured by claims from laboratories, even if the clinical laboratory results may be less reliably captured across different laboratories. Efforts should be undertaken to match data sources to measures so that the greatest accuracy can be realized at the lowest data collection cost.

Another approach to improving the reliability and validity of administrative data for quality measurement is to expand and enhance standard coding conventions to capture physician actions and decision making that are indicative of quality. This strategy has been undertaken by the Current Procedural Terminology (CPT®) Editorial Panel, which has developed CPT Category II codes as a means to report claims data in the greater detail required for performance measure reporting than is possible through the CPT codes (Category I or Category III) used for payment of services.8 The success of this approach will, of course, need to be tested.

Electronic Health Record Systems (EHRS)

Other genres of medical literature, such as clinical trials,12 cost-effectiveness analyses,13 and diagnostic accuracy studies,14 have evolved standards for reporting in recognition of the role that their literature plays in major decision-making within the healthcare sector. The best data and the best tests cannot produce useful judgments about the quality of healthcare without transparency in the communication of research design and results.

The development or adoption of similar standards could also benefit the emerging field of performance measurement. The NQF intends to promote standards for testing of performance measures by delineating “time-limited” endorsement, whereby measure developers have 2 years in which to produce and provide evidence of testing. The American Medical Association-convened Physician Consortium for Performance Improvement® (Consortium) will vote at its October 2007 meeting on a protocol to guide and define its measure testing activities. Although these initiatives provide a framework to structure measure testing activities, no one has yet broached the problem of ensuring accurate and appropriate reporting of measure testing results. The results described by Pawlson and colleagues provide a few specific illustrations of why the field of quality measurement may benefit from a more uniform approach to reporting measurement studies.

For example, the authors present the means of each performance measure derived from each type of data but omit measures of variability for the 2 measurement distributions, although one may well be interested in whether the variability of measurements differs by data source. Another area where a recommendation could be useful is the reporting of rank order changes that would result under alternative data collection strategies. Inadequate description of the distributions within which observations are ranked can obscure the meaning of rank order changes. Reports of the number and percent of observations that change quartile rank under the alternative data collection strategies would be more informative if accompanied by percentile values and/or interquartile ranges. A shift from one quartile to an adjacent quartile has quite different implications depending on the width of the interquartile range.
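The reporting suggestions above can be illustrated with a minimal sketch. It is not the analysis performed by Pawlson et al; the plan-level rates, the function names, and the choice of the "inclusive" quantile method are all assumptions made for illustration. For each data source it reports the mean, the standard deviation, the quartile cut points, and the interquartile range, and it counts how many observations change quartile when the data source changes.

```python
from statistics import mean, stdev, quantiles


def summarize(rates):
    """Mean, SD, quartile cut points, and IQR for one data source."""
    q1, q2, q3 = quantiles(rates, n=4, method="inclusive")
    return {
        "mean": mean(rates),
        "sd": stdev(rates),
        "cuts": (q1, q2, q3),
        "iqr": q3 - q1,
    }


def quartile_of(value, cuts):
    """Quartile (1 to 4) of a value, given the three cut points."""
    return 1 + sum(value > cut for cut in cuts)


def quartile_shifts(rates_a, rates_b):
    """Number and percent of paired observations whose quartile differs
    between two data sources (each ranked within its own distribution)."""
    cuts_a = summarize(rates_a)["cuts"]
    cuts_b = summarize(rates_b)["cuts"]
    moved = sum(
        quartile_of(a, cuts_a) != quartile_of(b, cuts_b)
        for a, b in zip(rates_a, rates_b)
    )
    return moved, 100.0 * moved / len(rates_a)


# Hypothetical measure rates (percent) for the same eight health plans,
# computed from administrative data only and from hybrid data.
admin = [52.0, 61.0, 48.0, 70.0, 66.0, 58.0, 45.0, 73.0]
hybrid = [60.0, 64.0, 63.0, 75.0, 69.0, 62.0, 59.0, 77.0]

admin_summary = summarize(admin)
hybrid_summary = summarize(hybrid)
moved, pct_moved = quartile_shifts(admin, hybrid)
```

With these made-up numbers, 2 of the 8 plans (25%) change quartile, while the interquartile range narrows from 16.0 to 9.0 percentage points. Reporting the shift count alone would hide the fact that a one-quartile move spans a much smaller difference in the hybrid distribution.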

Reporting standards also may be valuable in communicating and presenting statistical comparisons. Pawlson et al present means of quality indicators calculated from 2 alternative data collection strategies as well as the differences in means, but do not present the results of formal statistical tests of significance for the differences. Other authors have adapted sensitivity, specificity, and predictive values for use in evaluating the reliability of performance measures across different data collection strategies15 or reported κ statistics to summarize reliability of measurements across different data sources.16 While differences in research protocols may favor one form of statistical comparison over another, general reporting standards could help ensure appropriateness and sufficient rigor to inform sound decision-making about the reliability of performance measures.
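As a rough illustration of the agreement statistics mentioned above, the sketch below computes sensitivity, specificity, predictive values, and Cohen's κ from a 2×2 table. It assumes, for illustration only, that chart review is treated as the reference standard and administrative data as the source under test; the counts and the function name are hypothetical and do not come from any of the cited studies.

```python
def agreement_stats(tp, fp, fn, tn):
    """Agreement between a test data source and a reference source.

    tp: both sources say the measure was met
    fp: test source says met, reference says not met
    fn: test source says not met, reference says met
    tn: both sources say the measure was not met
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for 100 patients: administrative data (test source)
# compared with chart review (assumed reference standard).
stats = agreement_stats(tp=40, fp=10, fn=20, tn=30)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity depend on which source is designated the reference, whereas κ treats the two sources symmetrically, which is one reason a reporting standard would need to say which comparison a study is making.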

Administrative data will continue to be the “go-to” source for many implementers given the desire for readily available data from a large population of providers with minimal data collection burdens. Given the concerns raised by Pawlson et al and other researchers, where do we go from here?

First, all stakeholders must understand and acknowledge the limitations of current quality measurement involving administrative data; some measures based strictly on administrative data should not be used in particular programs. Second, more testing of quality indicators such as that contributed by Pawlson et al is necessary to better address the usefulness of performance measurement utilizing administrative data as well as other data sources. An effort to standardize testing and reports of test results would also be beneficial. Third, technology must also be leveraged to enhance quality measurement. EHRS hold great promise for performance measurement by virtue of expanding the amount of clinical data available, facilitating reporting, and providing feedback mechanisms for quality measurement. Measure developers have begun to work more closely with EHRS vendors, and the NQF is seeking to assess the reliability of data elements within EHRS for performance measures. These efforts are complementary and good places to start.

Authors’ Affiliation: From the American Medical Association, Chicago, IL (KSK, JC, SS).

Author Disclosure: KSK reports being the principal investigator for a grant funded by the Agency for Healthcare Research and Quality (Greg Pawlson, MD, is among the key personnel for that grant). KSK and SS report collaborating with the NCQA on a measurement development grant funded by the Centers for Medicare & Medicaid Services (SAS is the AMA project manager; KSK is the co-author of the proposal).

Authorship Information: Concept and design; drafting of the manuscript; and critical revision of the manuscript for important intellectual content (KSK, JC, SS).

Address correspondence to: Karen S. Kmetik, PhD, American Medical Association, 515 N State St, Chicago, IL 60611. E-mail: karen.kmetik@ama-assn.org

1. Auerbach AD, Landefeld CS, Shojania KG. The tension between needing to improve care and knowing how to do it. N Engl J Med. 2007;357:608-613.

administrative plus chart review data for reporting HEDIS® hybrid measures. Am J Manag Care. 2007;13:553-558.

Available at: http://www.qualityforum.org/projects/ongoing/ambulatory/index.asp. Accessed September 17, 2007.

666-674.

6. Hofer TP, Hayward RA, Greenfield S, et al. The unreliability of individual physician “report cards” for assessing the costs and quality of care of a chronic disease. JAMA. 1999;281:2098-2105.

8. Current Procedural Terminology® Editorial Panel. CPT Process—How a Code

Accessed September 17, 2007.

10. Baker DW, Persell SD, Thompson JA, et al. Automated review of electronic health records to assess quality of care for outpatients with heart failure. Ann Intern Med. 2007;146:270-277.

record. Arch Intern Med. 2006;166:2272-2277.

13. Gold MR, Siegel JE, Russell LB, et al. Cost-Effectiveness in Health and Medicine. New York, NY: Oxford University Press; 1996.

15. Benin AL, Vitkauskas G, Thornquist E, et al. Validity of using an electronic medical record for assessing quality of care in an outpatient setting. Med Care. 2005;43:691-698.

record, and hybrid data sources for diabetes quality measures. Jt Comm J Qual Improv. 2002;28:555-565.