Practical Design and Implementation Considerations in Pay-for-performance Programs

The American Journal of Managed Care, February 2006, Volume 12, Issue 2

Pay-for-performance programs have grown substantially during the past 10 years. The number of programs has increased from just a few a decade ago to more than a hundred today.1 To date, the operating components of these programs, including quality measures, data collection, reporting, and the size and methodology of incentive payments, show broad variation. Most programs have been implemented in single communities or markets. However, a few notable exceptions–California, Hawaii, and Massachusetts–operate statewide.

There has been a proliferation of standards, principles, and statements articulating recommendations for the design and composition of pay-for-performance programs. Physician organizations, such as the American Medical Association,2 American Academy of Family Physicians,3 and the American College of Cardiology4 have been active in this regard. Their efforts have been constructive, although not consistent, in advocating for the constituency most immediately affected by these programs: the physician.

As one might expect, conflicts in approaches to pay-for-performance among different stakeholders are not uncommon and reflect a variety of methodological and philosophical differences. This article deals with the product of a recent consensus-based process involving a cross section of physicians and medical managers. The result of this process, a thoughtful set of physician-oriented "perspectives," is sufficiently broad to offer easy comparison to existing programs, yet substantial enough to highlight the practical conflicts and realities experienced by those "in the trenches" who are designing and operating pay-for-performance programs.

This paper provides a subjective, potentially biased analysis of these physician perspectives on the California statewide pay-for-performance program sponsored by the Integrated Healthcare Association (IHA), which began operation in 2003. Since this program is the largest in the country, it offers a reasonable source for comparison.5 Following this analysis is a brief commentary on the practical difficulties of putting these perspectives into operation that highlights the essence of the challenge: designing and implementing performance measurement programs that physicians deem valid and reliable.

Consensus-based Perspectives and Practical Realities of Program Administration

The article by Forrest and colleagues6 that prompted this response outlines 6 components of pay-for-performance program design. The authors provide a brief context for each component and make specific recommendations for implementation. As a means of comparison, Table 1 provides an overview of these recommendations and a subjective ranking (low, moderate, or high) of how well the IHA-sponsored California pay-for-performance program complies with each recommendation. The purpose of this exercise is to gain a sense of how physician concerns intersect with the actual implementation of pay-for-performance programs.

Overall, the California program aligns substantially with physician preferences, particularly as they relate to payment, transparency, metrics, and evaluation. This may reflect the membership of the consensus panel. Nonetheless, there were some conflicts. Most notable were the absence in the IHA-sponsored program of a mechanism to ensure that patient preferences were integrated into the design, and the lack of a comprehensive risk-adjustment methodology.

Practical Difficulties in Program Implementation

Can pay-for-performance programs be designed to accommodate physician preferences, or do practical barriers make them unrealistic? Table 2 provides an assessment of the difficulties faced by program administrators attempting to incorporate physician preferences into pay-for-performance programs across a broad cross section of communities and markets. It includes a subjective assessment of the level of difficulty involved in implementing each of the consensus recommendations.

This assessment highlights the relative difficulties involved. It also demonstrates the reality that the California program was not created by fiat, but through a prolonged effort by a large number of dedicated stakeholders. Employers, health plans, physician organizations, and other stakeholders have succeeded in coordinating efforts to develop a common set of recommended measures and procedures. However, this was a monumental task that required strong leadership and the subordination of organizational self-interests to succeed. The bottom line: Collaboration is difficult and time-consuming, but it can be achieved.

Some aspects of program design appear to involve policy decisions only, but in fact cannot be implemented without making a multitude of decisions and resolving a series of technical issues. Public disclosure is a good case in point. The initial decision to disclose results simply opens the door to a cascade of important questions about the level of reporting (individual or organizational), minimum sample size, relative or absolute rating (ie, number of stars vs quantitative scores), and a myriad of other considerations. The development of successful solutions in turn requires both analytical and communication expertise that should not be underestimated.

Variation in community dynamics presents its own unique challenges. How does a community with primarily individual practitioners report at an organizational level? How does a community without a history of interorganizational collaboration develop a common set of measures? These challenges highlight the need for new approaches and creativity in program design to accommodate the unique characteristics of communities.

Increasing the Validity of Performance Measurement

The consensus recommendations strongly advocate for the development of measures and methodologies that properly validate conclusions about how physicians and physician organizations are performing. From a practical standpoint, this can only occur if those being measured are intimately involved in the selection and design of measures and measure specifications, regardless of whether measurement occurs at an individual or group level. If the physicians who are being measured do not believe the conclusions are valid, are they? Perhaps, but the exercise is doomed without the proper engagement of those being measured.

This principle may seem self-evident, but there are potential barriers to its execution. Measuring across a community or market demands physician or physician group input across a spectrum of beliefs and values. Reaching consensus is a time-consuming and often arduous process. This challenge underscores the practical wisdom of implementing a limited number of measures at a program's onset.

Increasing the Reliability of Performance Measurement

The consensus recommendations discussed in the Forrest et al article6 argue that one advantage to reporting at the physician organizational level is to increase the likelihood of obtaining a sample size that can produce statistically meaningful results. In the case of the IHA-sponsored program, reporting at the physician group level, in combination with aggregating data for common measures across competing health plans, has dramatically enhanced the ability to report across multiple measures and physician organizations. The patient population in the IHA-sponsored program includes more than 6 million HMO members enrolled in 7 participating health plans. The results are collected by plans and physician organizations, subjected to an audit, and submitted to the National Committee for Quality Assurance, which aggregates the information into a single dataset. This dataset is then used for public reporting by the California Office of the Patient Advocate, a state agency, and by the individual health plans for calculating incentive payments.

The power of aggregating data across health plans is a significant and important component of the IHA-sponsored program design. To highlight this point, consider that 3 of the participating health plans have fewer than 500 000 members each (small plans) and 4 of the participating health plans have more than a million members each (large plans). On average, a small participating health plan using its own data would be able to report against all clinical measures in the program for only 16% of its network physician groups; using the aggregated dataset, however, it can report against all of the same measures for 70% of its network physician groups. Likewise, on average, a large participating health plan using its own data would be able to report against all clinical measures in the program for only 30% of its network physician groups, but using the aggregated dataset it can report against all of the program's clinical measures for 65% of its network physician groups.
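The effect of pooling can be illustrated with a minimal sketch. It is not the IHA or National Committee for Quality Assurance methodology: the plan, group, and measure names, the patient counts, and the minimum denominator of 30 are all hypothetical assumptions chosen only to show why a physician group that falls below a reporting threshold within any single plan can clear it once eligible patients are combined across plans.

```python
from collections import defaultdict

# Hypothetical reporting threshold; the source does not state the program's actual minimum.
MIN_DENOMINATOR = 30

# (plan, physician_group, measure) -> eligible patients. Illustrative numbers only.
ELIGIBLE = {
    ("plan_a", "group_1", "measure_x"): 12,
    ("plan_b", "group_1", "measure_x"): 25,
    ("plan_a", "group_1", "measure_y"): 40,
    ("plan_b", "group_1", "measure_y"): 10,
    ("plan_a", "group_2", "measure_x"): 8,
    ("plan_b", "group_2", "measure_x"): 9,
    ("plan_a", "group_2", "measure_y"): 35,
    ("plan_b", "group_2", "measure_y"): 31,
}
MEASURES = ["measure_x", "measure_y"]


def counts_for_plan(eligible, plan):
    """Keep only one plan's own data, keyed by (group, measure)."""
    return {(g, m): n for (p, g, m), n in eligible.items() if p == plan}


def pooled_counts(eligible):
    """Sum eligible patients across all plans for each (group, measure)."""
    pooled = defaultdict(int)
    for (_plan, group, measure), n in eligible.items():
        pooled[(group, measure)] += n
    return dict(pooled)


def reportable_groups(counts, measures, minimum=MIN_DENOMINATOR):
    """Return groups whose denominator meets the minimum on every measure."""
    groups = {group for group, _measure in counts}
    return {
        g for g in groups
        if all(counts.get((g, m), 0) >= minimum for m in measures)
    }


if __name__ == "__main__":
    for plan in ("plan_a", "plan_b"):
        own = reportable_groups(counts_for_plan(ELIGIBLE, plan), MEASURES)
        print(plan, "alone can report on all measures for:", sorted(own))
    pooled = reportable_groups(pooled_counts(ELIGIBLE), MEASURES)
    print("pooled data can report on all measures for:", sorted(pooled))
```

In this toy dataset neither plan alone can report on every measure for any group, whereas the pooled data supports full reporting for one of the two groups. The same mechanism, at much larger scale, is what the percentages above describe.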


A comparison of the consensus recommendations with the IHA-sponsored program demonstrates a close alignment between physician preferences and program design. To some degree this alignment may have been aided by the inclusion of medical managers in addition to physicians in the consensus process.

An element not considered in the consensus recommendations was the amount of pay (as a percentage of total pay or in absolute amounts) that should be included in the performance equation. The opportunity to earn meaningful incentives is an issue that will dominate the discussion in most communities considering pay for performance. Future recommendations should consider what amount of incentive is appropriate in the context of a relatively fixed budget.

The use of data aggregation will be another important component in pay-for-performance programs. Although many markets may not be characterized by large physician organizations or existing collaborations to collect data, these challenges can and should be overcome. Physician performance can be measured by geography or along lines of affiliation, such as by hospital. Efforts to collect data across payers solely for the purpose of quality measurement do exist and are feasible with the assistance of existing quality improvement organizations and other credible, neutral nonprofit organizations.

Nationwide, managed care experience in the 1990s demonstrated that without adequate physician (and consumer) input into the design of utilization practices and incentives that affect physicians, program sustainability is short-lived.7 Pay for performance offers great promise to catalyze quality improvements and better align financial incentives; accordingly, an open dialogue with physicians about the best design for these programs is essential. However, this must be done with consideration for the current state of measurement science and its practical challenges. These challenges should not be used as a shield to prevent progress; rather, they should be understood as obstacles that can be overcome through collaboration, technological advances, and improved measurement science.


Thanks to Lauren Lempert, JD, and Dolores Yanagihara, MPH, staff members of the Integrated Healthcare Association, for editorial and technical assistance.

From the Integrated Healthcare Association, Oakland, Calif.

Address correspondence to: Thomas R. Williams, MBA, MPH, Executive Director, Integrated Healthcare Association, 344 Thomas L. Berkeley Way, Suite 350, Oakland, CA 94612. E-mail: