Volume 14: 833-838     December 2008     Number 12
Benchmarking Physician Performance: Reliability of Individual and Composite Measures
Sarah Hudson Scholle, MPH, DrPH; Joachim Roski, PhD, MPH; John L. Adams, PhD; Daniel L. Dunn, PhD; Eve A. Kerr, MD, MPH; Donna Pillittere Dugan, MS; and Roxanne E. Jensen, BA

Objective: To examine the reliability of quality measures to assess physician performance, which are increasingly used as the basis for quality improvement efforts, contracting decisions, and financial incentives, despite concerns about the methodological challenges.

Study Design: Evaluation of health plan administrative claims and enrollment data.

Methods: The study used administrative data from 9 health plans representing more than 11 million patients. The number of quality events (patients eligible for a quality measure), mean performance, and reliability estimates were calculated for 27 quality measures. Composite scores for preventive, chronic, acute, and overall care were calculated as the weighted mean of the standardized scores. Reliability was estimated by calculating the physician-to-physician variance divided by the sum of the physician-to-physician variance plus the measurement variance, and 0.70 was considered adequate.

Results: Ten quality measures had reliability estimates above 0.70 at a minimum of 50 quality events. For other quality measures, reliability was low even when physicians had 50 quality events. The largest proportion of physicians who could be reliably evaluated on a single quality measure was 8% for colorectal cancer screening and 2% for nephropathy screening among patients with diabetes mellitus. More physicians could be reliably evaluated using composite scores (≤17% for preventive care, >7% for chronic care, and 15%-20% for an overall composite).

Conclusions: In typical health plan administrative data, most physicians do not have adequate numbers of quality events to support reliable quality measurement. The reliability of quality measures should be taken into account when quality information is used for public reporting and accountability. Efforts to improve data available for physician profiling are also needed.

(Am J Manag Care. 2008;14(12):833-838)

Measuring physician performance is becoming commonplace as health plans and purchasers look for ways to drive quality improvement and to increase physicians’ accountability and rewards for achieving quality goals. A recent study1 reported that, among 89% of health maintenance organization plans using physician-oriented pay-for-performance programs, more than one-third measured and rewarded quality at the individual physician level. In addition, public and private purchasers are demanding more information about America’s physicians and hospitals to aid in value-based purchasing and selection of health plans and providers.2

However, concerns remain regarding the validity and reliability of such physician performance profiles. Several factors are needed to support fair and accurate comparisons among physicians. These include evidence-based quality measures, complete and accurate data sources, and standardized methods of data collection. Physician-level reliability of a quality measure is another key consideration in this measurement. Physician-level reliability refers to the ability of a quality measure to distinguish an individual physician’s performance from the performance of physicians overall. Good physician-level reliability requires the following 2 factors: (1) a sufficient number of patients eligible for a given quality measure and (2) performance variation across physicians on that quality measure.3-5 The greater the number of a physician’s patients who are eligible for a quality measure, the more precise the estimate of the physician’s performance. When performance variation for a given quality measure across physicians is limited, the likelihood that a physician’s performance is statistically significantly different from that of his or her peers is also decreased. Hofer and colleagues6 showed that not controlling for a quality measure’s physician-level reliability significantly misrepresented performance differences across physicians. However, adjusting performance profiles in such a manner is not commonplace across the healthcare industry.
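The two factors above can be illustrated with a minimal sketch of the reliability ratio given in the abstract (physician-to-physician variance divided by the sum of that variance and the measurement variance). The binomial form of the measurement variance, the function names, and the example inputs (a mean rate of 0.60 and a between-physician standard deviation of 0.10) are assumptions made for illustration, not values or methods taken from the study.

```python
def reliability(between_var: float, measurement_var: float) -> float:
    """Physician-to-physician variance over (between + measurement) variance."""
    total = between_var + measurement_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


def binomial_measurement_var(rate: float, n_events: int) -> float:
    """Assumed sampling variance of a pass rate observed over n_events patients."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return rate * (1.0 - rate) / n_events


# Hypothetical inputs: mean rate 0.60, between-physician SD 0.10 (variance 0.01).
between = 0.10 ** 2
for n in (10, 50, 200):
    r = reliability(between, binomial_measurement_var(0.60, n))
    print(n, round(r, 2))
```

Under these assumed inputs, reliability rises as the number of quality events grows, and it falls toward zero as the between-physician variance shrinks, which mirrors the two requirements described above.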

Ensuring that measurement results are valid and reliable is important when purchasers and plans (and potentially consumers) use the data to make decisions about which physicians get financial rewards or other benefits. The stakes are particularly high when profiling results are used for public reporting or eligibility for participation in a health plan network. Paying attention to the validity and reliability of data will help to ensure that these decisions are based on real differences in performance among physicians rather than any shortcomings of the measurement.

Although performance results based on limited sample sizes could be adjusted for the reliability of individual measures,7-9 the creation of composite scores may also be a useful way to increase the reliability of physicians’ performance scores.10 Little is known about the extent to which constructing composite scores mitigates the limitations of sample size and reliability, while continuing to provide useful and understandable information.11
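The abstract describes composite scores as the weighted mean of standardized scores. A minimal sketch of that calculation follows. The choice of weights is not specified in this excerpt, so the weights are left as a caller-supplied parameter, and the function names and example numbers are hypothetical.

```python
from statistics import mean, pstdev


def standardize(value: float, peer_values: list[float]) -> float:
    """z-score of one physician's rate relative to all physicians on a measure."""
    sd = pstdev(peer_values)
    if sd == 0:
        raise ValueError("no variation across physicians on this measure")
    return (value - mean(peer_values)) / sd


def composite(z_scores: list[float], weights: list[float]) -> float:
    """Weighted mean of standardized scores; the weights are an assumption."""
    if len(z_scores) != len(weights):
        raise ValueError("z_scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(z * w for z, w in zip(z_scores, weights)) / total


# Hypothetical example: one physician scored on two measures.
z1 = standardize(0.70, [0.50, 0.60, 0.70])
z2 = standardize(0.40, [0.40, 0.50, 0.60])
print(composite([z1, z2], weights=[30, 10]))
```

Pooling measures in this way draws on more quality events per physician than any single measure, which is the mechanism by which a composite might be more reliable.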

To date, there have been few reports regarding the reliability of physician-level performance scores associated with commonly used practices and methods in the healthcare industry. To begin to address this deficiency, this study relied on a large data set that combined patient-level administrative data from 9 large health plans to compute performance for primary care physicians (PCPs) using 27 commonly measured quality indicators. This data set is typical of data sources often used by individual health plans to profile physician performance. Specifically, we examined for each quality measure and composite score the proportion of PCPs who could be evaluated given different minimum sample size criteria and the physician-level reliability under those minimum sample size criteria. Our primary research questions were the following: (1) What is the physician-level reliability of commonly used performance measures calculated exclusively based on administrative data? (2) Can more physicians be reliably evaluated using a composite score?

METHODS

Data Sources

This study used administrative data from the Ingenix Impact Pro database.12 Deidentified claims and enrollment data for individuals enrolled in 9 health plans from 9 separate geographic regions for 2003 and 2004 were available for this study. Each of these plans had at least 250,000 members and accounted for 15% to 50% of managed care enrollees in their markets (Table 1). In all, these plans covered more than 11 million unique members and many physicians and employer groups. The members included in these organizations were primarily enrolled in commercial health maintenance organization, preferred provider organization, and point-of-service health plan product designs, with fewer individuals enrolled in Medicare risk products. Pharmacy benefit status, an indicator of the general availability of pharmacy data to support measurement, ranged from 51% to 80% of the enrolled populations for each plan. Although the study population was drawn from multiple geographic census regions, most individuals were located in the northeast United States. The data were deidentified to protect patient, physician, and organization confidentiality. This study was reviewed and determined to be exempt by Chesapeake Research Review, Inc (Columbia, MD).

Because the Impact Pro database may not include complete data on all services (eg, pharmacy, laboratory, or mental health services) needed for calculating some performance measures, we conducted specific analyses to assess the completeness of the data available for the study. Using only administrative data sources, we compared performance rates based on Impact Pro data with performance data reported to the National Committee for Quality Assurance (NCQA) through the Healthcare Effectiveness Data and Information Set (HEDIS) reporting. If we found more than a 5–percentage point difference between the plan’s reported rate to the NCQA and the rate in the Impact Pro database, the data were excluded for that quality measure.
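The exclusion rule described above can be sketched as a simple comparison of two rates. The function name, measure names, and rates below are hypothetical. The sketch assumes rates are expressed in percent and that a gap of exactly 5 percentage points is retained, because the text excludes only differences of more than 5 points.

```python
def keep_measure(impact_pro_rate: float, hedis_rate: float,
                 max_gap_points: float = 5.0) -> bool:
    """True when two rates (in percent) differ by no more than max_gap_points."""
    return abs(impact_pro_rate - hedis_rate) <= max_gap_points


# Hypothetical rates in percent for one plan: measure -> (Impact Pro, HEDIS).
rates = {
    "measure_a": (71.0, 74.0),  # 3-point gap: retained
    "measure_b": (48.0, 60.0),  # 12-point gap: excluded
}
retained = [name for name, (ip, hedis) in rates.items() if keep_measure(ip, hedis)]
print(retained)
```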

Selection of Quality Measures

