The Past as Prologue: Future Directions in Clinical Performance Measurement in Ambulatory Care

November 1, 2007
L. Gregory Pawlson, MD, MPH

Volume 13, Issue 11

Physicians and other caregivers have articulated concerns about the completeness and focus of the current set of ambulatory care quality measures. In this commentary, we review some of the reasons why our current measures are not as useful, reliable, and accurate as they should be and examine the 2 major barriers to improvement: lack of adequate funding for formulation, development, and testing; and gaps in the completeness and adequacy of the current data sources. We explore some promising directions, including (1) measures of misuse and overuse, (2) measures of appropriateness and quality of procedures, (3) measures that are more directly actionable, (4) measures that are more nuanced and patient centered, and finally (5) measures that make full use of the added information in fully interoperable electronic health records. With modest private and public funding for comparative studies and consensus guideline development, formulation, and testing, and with the relatively rapid transformation taking place in current data systems, it is possible to move well beyond the rather limited, but still important, measures we currently have.

(Am J Manag Care. 2007;13:594-596)

A number of recent papers and editorials have expressed frustration with current clinical performance measures, especially those being used in the ambulatory care environment.1-3 Although this commentary provides a context and a rationale for the currently available ambulatory care measures, it also describes promising directions for the future of performance measurement.

The majority of the current clinical measures ascertain whether or not some clinical process has taken place or (less frequently) the proportion of patients who achieve a desirable physiologic or functional outcome. Although all of the HEDIS (Healthcare Effectiveness Data and Information Set, previously the Health Plan Employer Data and Information Set) health plan-level and physician-level measures have been extensively tested in terms of feasibility and accuracy, substantial literature exists only for the health plan measures (a recent PubMed search found 315 HEDIS health plan-related papers). At present, only a handful of physician office practice-level measures have been pilot-tested for actual use; fewer still have been subjected to an in-depth, large-scale analysis for accuracy and reliability. Moreover, problems abound in implementation of physician-level measurements, including accuracy of data, attribution (the degree to which an individual clinician or office practice influences a given measure's results), and the adequacy of the sample size (either actual or possible).

Concerns also have been raised that using currently available clinical measures distracts us from paying attention to key issues such as access and the so-called “art” of medicine.1 In spite of notable gaps, current measures do cover most major diseases and clinical areas, including preventive care, cancer screening, diabetes, asthma and chronic obstructive pulmonary disease, and cardiovascular disease. Although insufficient for improvement, performance measurement and feedback appear to be essential.4-6 Increasingly, sophisticated patient experience-of-care survey measures arguably capture some of what has been termed the art of medicine. Finally, requirements promulgated by the American Board of Medical Specialties for maintenance of certification programs, including engagement in quality improvement activities and passing a secure exam that tests clinical knowledge and reasoning, are important adjuncts to clinical performance measurement. Our inability to measure everything that is of reasonable importance is not a sufficient reason for failing to measure, report, and improve the substantial areas within clinical medicine for which we do have reasonably reliable and valid measures.

On the other hand, frustration about the current state of ambulatory care measurement is high, especially among the clinical, research, and other experts who serve on the measure development and review panels of the National Quality Forum, the National Committee for Quality Assurance (NCQA), and the Physician Consortium for Performance Improvement. Every physician-level clinical performance measure represents a compromise between a more ideal measure (eg, the use of true outcomes like health status or mortality) and what is feasible given currently available data and the financial and human resources available to compile the required data.

Barriers to Improvement

However, there are several promising new approaches to performance measurement. If these approaches are coupled with public funding and closer collaboration and cooperation among the health services research community, those who develop measures, and those who implement and use measures (notably clinical practices and electronic medical record vendors), the results would be transformational. These new approaches are described below.

Measures of Misuse and Overuse. Development and implementation of measures of misuse and overuse have been hampered by a lack of studies on the appropriate use of procedures or tests. In addition, it is difficult to define the point at which a sometimes-useful intervention ceases to be helpful.

Measures of Appropriateness and Quality of Procedures. Thirty years after the groundbreaking work of Brook and his colleagues at RAND, it is distressing to note that we still have only a few broadly implemented measures of appropriateness.9 Our inability to determine appropriateness has, in turn, retarded the creation of measures of the quality of procedures. If you cannot determine whether a procedure is appropriate, it is difficult to estimate relative quality. For example, if one practice does a large amount of inappropriate coronary artery bypass graft surgery on low-risk patients but with high technical proficiency, it becomes challenging to compare that practice's quality with the quality of another practice that does more appropriate surgery with somewhat lower overall technical proficiency. Although there are some notable recent efforts,10 we are far from where we need to be in this area, while spending nearly $2 trillion annually for healthcare.

Clustering and Weighting of Measures. Measures can be grouped into clinically and statistically related groups, and given weights based on health impact. Carefully done, grouping of measures provides broader and more reliable estimates of performance. Weighting of measures allows those with greater clinical impact to receive more attention in reporting or payment-related use. NCQA currently makes widespread use of clustering and weighting approaches in its ranking of health plans for accreditation, in report cards, and in the physician office practice recognition programs (diabetes, heart/stroke, and back pain). However, much remains to be done on both defining and testing these approaches.
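As a rough illustration of the clustering and weighting idea described above, the following Python sketch groups measures into clusters and computes a weight-averaged rate for each. The measure names, rates, weights, and the `Measure` and `cluster_scores` names are all hypothetical and chosen only for illustration; this is not NCQA's actual scoring method.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    cluster: str   # clinically related group, eg "diabetes"
    rate: float    # proportion of eligible patients meeting the measure (0 to 1)
    weight: float  # hypothetical relative health-impact weight (must be > 0)


def cluster_scores(measures):
    """Return the weight-averaged rate for each cluster of measures."""
    totals = {}
    for m in measures:
        weighted_sum, weight_sum = totals.get(m.cluster, (0.0, 0.0))
        totals[m.cluster] = (weighted_sum + m.rate * m.weight,
                             weight_sum + m.weight)
    return {cluster: weighted_sum / weight_sum
            for cluster, (weighted_sum, weight_sum) in totals.items()}


# Illustrative values only; an outcome measure is given more weight
# than the corresponding process measure.
measures = [
    Measure("HbA1c tested", "diabetes", 0.90, 1.0),
    Measure("HbA1c controlled", "diabetes", 0.60, 3.0),
    Measure("Blood pressure controlled", "cardiovascular", 0.70, 2.0),
    Measure("LDL cholesterol tested", "cardiovascular", 0.80, 1.0),
]
scores = cluster_scores(measures)
```

In this sketch the heavily weighted outcome measure pulls the diabetes composite toward its lower rate, which is the intended effect of weighting by clinical impact.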

More Directly Actionable Measures. Most current performance measures do not directly inform the practice or individual clinician as to what needs to be done to improve performance. Measures that might be more directly actionable are suggested by studies that look at how frequently clinicians made a change in regimen when confronted with a patient who was above a given threshold (eg, a patient whose blood pressure was 140/90 mm Hg).11,12
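One way to picture such a measure is as the share of above-threshold encounters at which the clinician changed the regimen. The Python sketch below assumes a simple hypothetical `Visit` record and treats a reading at or above 140/90 mm Hg as elevated; the cutoff logic and field names are assumptions for illustration, not the definitions used in the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    systolic: int          # mm Hg
    diastolic: int         # mm Hg
    regimen_changed: bool  # was therapy adjusted at this visit?


def is_elevated(visit, systolic_cutoff=140, diastolic_cutoff=90):
    """Assumed rule: either reading at or above its cutoff counts as elevated."""
    return (visit.systolic >= systolic_cutoff
            or visit.diastolic >= diastolic_cutoff)


def intensification_rate(visits):
    """Proportion of elevated visits with a regimen change, or None if none are elevated."""
    elevated = [v for v in visits if is_elevated(v)]
    if not elevated:
        return None
    return sum(v.regimen_changed for v in elevated) / len(elevated)


# Illustrative visits only.
visits = [
    Visit(150, 95, True),
    Visit(142, 85, False),
    Visit(128, 80, False),   # not elevated, so excluded from the denominator
    Visit(160, 100, False),
]
rate = intensification_rate(visits)
```

A low rate on a measure of this kind points to a specific action (reassessing therapy at elevated visits) rather than only reporting that control was not achieved.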

More Nuanced and Patient-centered Measures. Measures can be developed that take into account patient preferences, competing needs, and complex circumstances. Although the overuse of exclusions and exceptions that are not standardized and verifiable can make measures uninterpretable, most current measures do not adequately deal with patient-level variation. Risk adjustment of data as a means of overcoming issues related to noncontrollable heterogeneity also is far from ideal. Enhancement of how and what data are recorded and encoding data that can be directly analyzed would enable construction of measures related to documented and informed patient preferences or priorities, or to key information about comorbidities or other important factors.

Creation of Measures That Make Full Use of the Potential of Electronic Medical Records. Although important to do, simply encoding current clinical performance measures into electronic data environments will not suffice. We are far from understanding all that might be accomplished in a rich electronic data environment. However, a new generation of measures already looks at change over time, actual laboratory test values (rather than threshold values), and actual time between critical tests (like mammography), as opposed to all-or-nothing thresholds.
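The contrast between all-or-nothing thresholds and measures built on actual values can be sketched in a few lines of Python. The laboratory values, test dates, the 8.0 cutoff, and the function names below are hypothetical and serve only to show the difference in what each style of measure reports.

```python
from datetime import date


def met_threshold(values, threshold):
    """All-or-nothing view: is the most recent value below the threshold?"""
    return values[-1] < threshold


def change_over_time(values):
    """Continuous view: difference between the latest and earliest values."""
    return values[-1] - values[0]


def intervals_in_days(test_dates):
    """Actual number of days between consecutive tests."""
    ordered = sorted(test_dates)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]


# Illustrative data only.
hba1c_values = [9.4, 8.6, 8.1]  # percent, oldest first
mammogram_dates = [date(2004, 3, 1), date(2005, 3, 1), date(2007, 3, 1)]

passed = met_threshold(hba1c_values, threshold=8.0)
improvement = change_over_time(hba1c_values)
gaps = intervals_in_days(mammogram_dates)
```

Here the threshold view records a failure even though the values show steady improvement, and the interval view distinguishes a one-year gap between mammograms from a two-year gap, which a yes-or-no screening measure would not.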


As in many other areas of endeavor, one can view the glass as half empty or half full. Most individuals and organizations currently involved in the development and deployment of measures understand the shortfalls and recognize that critiquing current measurement systems can be a spur to progress. That said, there appears to be far more agreement than controversy about what needs to be done to move forward.

2. Pogach LM, Tiwari A, Maney M, Rajan M, Miller DR, Aron D. Should mitigating comorbidities be included in assessing healthcare plan performance in achieving optimal glycemic control? Am J Manag Care. 2007;13:133-140.

4. Kiefe CI, Allison JJ, Williams OD, Person SD, Weaver MT, Weissman NW. Improving quality improvement using achievable benchmarks for physician feedback: a randomized controlled trial. JAMA. 2001;285:2871-2879.

6. Levin-Scherz J, DeVita N, Timbie J. Impact of pay-for-performance contracts and network registry on diabetes and asthma HEDIS measures in an integrated delivery network. Med Care Res Rev. 2006;63(1 suppl):14S-28S.

8. Tang PC, Ralston M, Arrigotti MF, Qureshi L, Graham J. Comparison of methodologies for calculating quality measures based on administrative data versus clinical data from an electronic health record system: implications for performance measures. J Am Med Inform Assoc. 2007;14:10-15. Epub 2006 Oct 26.

10. American College of Radiology; Society of Cardiovascular Computed Tomography; Society for Cardiovascular Magnetic Resonance; American Society of Nuclear Cardiology; North American Society for Cardiac Imaging; Society for Cardiovascular Angiography and Interventions; Society of Interventional Radiology. ACCF/ACR/SCCT/SCMR/ASNC/NASCI/SIR 2006 appropriateness criteria for cardiac computed tomography and cardiac magnetic resonance imaging. A report of the American College of Cardiology Foundation Quality Strategic Directions Committee Appropriateness Criteria Working Group. J Am Coll Radiol. 2006;3:751-771.

12. Berlowitz DR, Ash AS, Glickman M, et al. Developing a quality measure for clinical inertia in diabetes care. Health Serv Res. 2005;40:1836-1853.