Physician-level P4P—DOA? Can Quality-based Payment Be Resuscitated?

May 1, 2007
Laurence F. McMahon Jr, MD, MPH

Timothy P. Hofer, MD

Rodney A. Hayward, MD

Volume 13, Issue 5

Unlike many areas of the economy where value is relatively easy to measure and reward, healthcare is "messy." Patients bring both clinical heterogeneity and illness-severity complexities to the interchange with their physician. The measurable outcomes or process measures are as likely to be due to patient characteristics as they are to be due to the actions (or inactions) of the patient's provider. Moreover, data suggest that the simplest fix for providers with bad metrics is to "dump" their sickest patients. Perhaps the most pernicious consequence of physician-level pay-for-performance (P4P) systems is how these systems can affect the neediest patients and their providers. As patient characteristics (eg, illness severity, preferences, resources) are more likely to be an issue in our poorer and minority communities, these patients' physicians will be at a financial disadvantage in a P4P system. It is likely that the widespread adoption of P4P systems will further limit these necessary resources.

(Am J Manag Care. 2007;13:233-236)

The current enthusiasm for pay for performance (P4P) is not informed by knowledge of the complexities of clinical practice or by the experience of other physician-level "report cards."

  • Much of what is attributed to clinicians in report cards has more to do with the complexity of their patients than the quality of their practice. As such, the likely consequence of aggressive P4P schemes will be for physicians to minimize their mix of complex patients.
  • As underserved areas have more patients with medically and socially complex care needs, their physicians are likely to score less well in P4P schemes and receive less payment than other physicians with less challenging patients.

The current enthusiasm for physician-level pay for performance (P4P) is misplaced. Most proposed schemes are not based on sound business principles, and the accumulated scientific evidence suggests that these efforts, as currently constituted, may well result in more harm than benefit. Why is such a seemingly simple concept—the linking of provider-level financial incentives to patient outcomes—so wrongheaded in application? The answer lies not in the concept but in its implementation. Proponents fail to recognize the complexities and overlapping responsibilities in the interactions of individual patients and their physicians, the deficiencies of the proposed measurement strategies, the current lack of infrastructure support in healthcare to implement more constructive measurement strategies, and the failure to use the lessons already learned about profiling and performance measurement to inform proposed P4P systems.

The seductiveness of the P4P paradigm stems both from its simplicity and from the market-based construct on which it is based. It appeals to a basic, widely held principle in our society that one should be rewarded for doing better than others—this precept is operative everywhere from sports to Wall Street. Because healthcare traditionally has had fewer elements of this market paradigm than other segments of our society, and because healthcare has highly publicized issues of quality, accessibility, and cost, it is natural to propose that a little "market discipline" would move the healthcare system in the right direction. Proponents of P4P argue that physicians who function at a higher level on the healthcare "quality gradient" should be rewarded more. After all, higher-quality products—be they stereos, automobiles, detergents, or toothpaste—often warrant a higher price. Why not apply this quality gradient to healthcare?

Although market-based approaches may well have an important role to play in the healthcare system, significant practical issues render current application of physician-level P4P both wasteful and hazardous. A market-based approach works best when we can reasonably measure the relative value of "products" (in this case, the improvement in outcomes attributable to the actions of individual physicians) and when the rewards for better performance are fair and motivate continued improvement.

The great irony is that currently proposed strategies attempt to promote high-quality, efficient, patient-centered care by using performance measures that account for a trivial proportion of what's important in the exchange between doctors and patients. Available studies suggest that currently proposed schemes are likely to primarily measure patient illness severity (not the quality of physician care), and rarely to consider the importance of achieving the quality standard or how often the costs and patient burden needed to achieve the quality standard are prohibitive. Although it is sometimes possible to obtain reasonable information on true "value," a fragmented healthcare delivery system obstructs access to information on quality, and we thus far have been unwilling to invest in the infrastructure or pay the data collection costs that would be needed to produce a viable or rational P4P strategy.

Proponents counter that the perfect is the enemy of the good and that we must start somewhere. However, it is well appreciated in behavioral economic theory that limited information on value, even if accurate, can lead to worse decision-making.1,2 And in this case, the harm is not solely monetary; the public can be physically harmed as well. The clinical heterogeneity among those who seek healthcare is a critical feature that distinguishes the complexity of healthcare from the relative simplicity of factories. In healthcare, the "inputs and outputs" are not static and reproducible items like the "widgets" of classical economics texts. The inputs and outputs are ill people, unique individuals, each of whom possesses unique socioeconomic means, dynamically changing and interacting diseases, and individual health beliefs and preferences that interact with the knowledge and skill of their physician to yield the observed behaviors and outcomes that serve as the metrics in the physician-level P4P schemes.

Consequently, before endorsing widespread adoption of P4P strategies, several challenging questions must be asked and answered. First, how well do the proposed quality measures assess clinically meaningful quality of care? Is it possible to determine from these measures whether physicians respond appropriately to complex clinical situations? To what degree do these measures give us information about the quality of care provided, as opposed to clinically appropriate deviations from ideal practice? (Such deviations in practice could be based on the specific clinical situation, patient social problems that place achieving the goal beyond the physician's ability, or well-informed patient preferences that make achieving the goal inappropriate.) Finally, we are obligated to find out whether profiling using inaccurate or low-importance information can produce unintended consequences that may actually lower overall quality of care.

The unintended consequences can take several forms. Hofer et al demonstrated that patient characteristics were determining factors in a physician's "report card" for diabetes. They note that "use of individual physician profiles may foster an environment in which physicians can most easily avoid being penalized by avoiding or deselecting patients with higher prior costs, poor adherence or response to treatments."3 But would physicians "deselect" or avoid patients whose disease is difficult to control?

An answer to this question has been obtained in the context of New York State, where the New York State Department of Health issues mortality reports, at the provider level, for coronary artery disease interventions. In a recent study, Narins and colleagues note that "the vast majority (79%) of interventional cardiologists agreed or strongly agreed that the publication of mortality statistics has, in certain instances, influenced their decisions regarding whether to perform angioplasty on individual patients. Physicians expressed an increased reluctance to intervene in critically ill patients with higher expected mortality rates."4 New York Magazine recently published a similar note of concern in an article titled "Heartless. To Manipulate Their Crucial Personal-Fatality Ratings, New York Heart Surgeons Are Turning Away Needy Patients."5 But given that New York State reports profiles that are adjusted for comorbidity and disease severity, shouldn't such adjustments remove the incentives to avoid sicker patients?

In fact, the Holy Grail of health services research over the past 30 years has been a severity adjustment system that would adequately account for the clinical differences among patients and could be used to adjust payment, quality, resource allocation, strategic planning, and so forth. Unfortunately, this system remains a work in progress. In perhaps the most controlled setting, medical intensive-care units, making use of all the diagnostic and clinical data available (much more data than could conceivably be available on a routine basis for outpatient or hospitalized patients), Rosenberg et al found that the origin of the patient (transferred in or not) remained a critical variable in forecasting the patient's resource use and mortality after all the clinical severity adjustments had been made.6 In other words, the severity-adjustment system was unable to account for critical clinical differences among patients. Physicians in New York seem to agree with this finding, as 85% of the interventional cardiologists "believed that the risk adjustment model used in the percutaneous coronary interventions (PCI) in New York State 1998-2000 report is not sufficient to avoid punishing physicians who perform higher risk interventions."4

Although this trail of evidence raises concern about physician-level P4P causing physicians to change who they see or treat, it may have even wider implications for access to care—not just for individuals but for entire populations. On a population basis, we know that individuals with differing socioeconomic statuses, races, classes, and disease burdens are not randomly distributed in a region, state, or city. We know that those with lower socioeconomic status, for example, are likely to live in specific locations, and are accordingly likely to be treated by specific hospitals and physicians. They also are more likely to carry a heavier disease burden with associated lower average outcomes. In a P4P scenario where bonuses are paid to higher performers and, accordingly, less is paid to others (no P4P plan in this country is designed to increase total expenditures to pay for these bonuses), what happens to those providers with a disproportionate number of high-risk patients?7 They can "dump" their patients, they can get paid less, or they can move. Physicians in indigent communities already face the burden of caring for sicker patients with poorer insurance coverage. Now, government and private insurers are proposing to create a market that would further penalize those who care for the most vulnerable subset of our population, adding to the major difficulties that these communities already face in recruiting and retaining high-quality physicians.

Proponents of P4P further suggest that performance information would give patients an incentive to seek high-quality providers, resulting in improved outcomes. This suggestion ignores 2 facts. First, other providers would be reluctant to take on the risk burden (as well as the financial burden) of these patients. Second, and perhaps more importantly, many of these patients lack the resources to move to another provider. One has only to revisit the Lower 9th Ward of New Orleans to realize that simple solutions like "move to higher ground," when confronted with the reality of the resources available, do nothing to avert tragedy. If people are unable to respond to real life-and-death situations, how can they be expected to respond to more abstract health risks? However, one group of people has the knowledge and means to respond: the physicians. Physicians with income at risk would move to maintain their income, further limiting access in already underserved communities. Again, exploring the objective lessons from the New York coronary artery bypass graft (CABG) report card, Werner et al note that "the release of the CABG report cards in New York was associated with a widening of the disparity on CABG use between white versus black and Hispanic patients."8

Given this pessimistic assessment of the current state of evidence for physician-level P4P schemes, should we abandon the movement toward more value-based healthcare, supported by quality-based payment schemes? Of course not. We need, however, to reject the quick fix and look to segments of the healthcare system with demonstrated success in applying quality-based systems and payment structures. As others have demonstrated, the most successful example of this transformation to quality-based healthcare delivery comes not from leading private or academic centers of excellence, but from the federal government: the Department of Veterans Affairs (VA) healthcare system.9

What are the critical success factors in the remarkable transformation of the VA healthcare system? First, the VA, as an organization, adopted value management. It developed quality-based information that could be used to redirect and focus clinical and organizational effort to improve patient care processes and outcomes. The VA has not "targeted" individual physicians, recognizing, as we highlighted above, that at the level of the individual provider, the variability in observed practice due to small patient panels with heterogeneous patient-level characteristics is usually sufficient to overwhelm the impact of the individual provider on measured processes and outcomes. As one "rolls up" patients into an institutional practice profile, measures become more stable. Perhaps more importantly, the management focus at this higher level is also more clinically meaningful, because as the Institute of Medicine notes in Crossing the Quality Chasm, most quality problems are mainly system problems, not deficiencies of individual providers.10
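The stabilizing effect of "rolling up" patients can be illustrated with the standard Spearman-Brown relationship, which gives the reliability of an average score as a function of how many patients contribute to it. The sketch below is only illustrative: the function name and the intraclass correlation of 0.04 (the share of score variance attributed to the provider rather than to patients) are hypothetical values chosen for the example, not figures taken from this article, and the calculation assumes that the same share applies at every level of aggregation.

```python
def profile_reliability(icc: float, panel_size: int) -> float:
    """Spearman-Brown reliability of a mean score over `panel_size` patients.

    icc: share of variance in a single patient's score attributable to the
         provider (or institution) being profiled, between 0 and 1.
    panel_size: number of patients averaged into the profile.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    if panel_size < 1:
        raise ValueError("panel_size must be at least 1")
    return (panel_size * icc) / (1.0 + (panel_size - 1) * icc)


# Hypothetical assumption: 4% of score variance is due to the provider.
ASSUMED_ICC = 0.04

# Compare small physician-sized panels with larger rolled-up panels.
for n in (25, 50, 500, 5000):
    r = profile_reliability(ASSUMED_ICC, n)
    print(f"panel of {n:>5} patients: reliability = {r:.2f}")
```

Under these assumptions, a profile built on a few dozen patients is dominated by patient-level noise, whereas a profile built on thousands of patients is far more stable, which is the pattern the paragraph above describes.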

The second key success factor is the availability of clinical information that can inform, remind, and assist providers in the care of their patients. This information accessibility and reminder system are even more critical for those patients who are most sick. A third critical success factor in the VA is recursive feedback of the quality performance information (based on a fairly extensive set of quality measures) to both providers and managers. The joint use of the quality information for physicians and managers forces all partners in the clinical care process to work in concert to address any deficiencies, and minimizes the "we/they" dynamic.

Finally, and perhaps most important, the VA invests in a clinically detailed, multifaceted performance evaluation system with accountability being placed on those managers who have the resources necessary for system change. This is not to suggest that physician-level profiling should never be attempted, but only to suggest that we currently lack measures that adequately reflect the physician impact on quality of care. Furthermore, the costs and impact of such profiling may make it a much less pressing task than increasing the accuracy and detail of quality measurement at the health plan level nationwide, where measures are currently superficial and narrow compared with those obtained within the VA healthcare system. Unfortunately, even the VA is now succumbing to the public pressure of proponents who argue, against all evidence and common sense, that P4P must be pushed down to the level of the physician.

In summary, the physician-level P4P scheme is promoted in the face of the more cautious approach suggested by both business and economic theory and extensive empirical evidence produced over the past couple of decades. We suggest that the lessons learned from the VA healthcare system can be exploited to bring more value to the non-VA sector. That will require a shift from individual to group incentives, it will require the alignment of incentives for both managers and clinicians, and finally it will require an investment in developing better information systems for managing and monitoring patient care, and the willingness to invest more resources in more clinically detailed and more clinically relevant quality measurement. We understand the impatience of payers and policy makers who note that we have major problems with cost and quality today, but leaders in business understand that strategic investment and organizational reorganization or consolidation often are essential steps in developing a successful business plan. It is necessary to resist the temptation of the quick-fix, next-quarter mentality and instead to create and support systems that lead to sustainable and important improvements in the quality and value of care.


We thank Eve Kerr, MD, MPH, for her helpful comments on earlier drafts of this manuscript.

Author Affiliations: From the Division of General Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor (LFM, TPH, RAH) and the Ann Arbor Veterans Affairs Health Research & Development Service, Ann Arbor, Mich (TPH, RAH).

Correspondence Author: Laurence F. McMahon, Jr, MD, MPH, Division of General Medicine, 300 North Ingalls, Rm NI7C27, Box 0429, Ann Arbor, MI 48109-0429. E-mail:

Author Disclosure: The authors (LFM, TPH, RAH) report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter discussed in this manuscript.

Authorship Information: Concept and design (LFM, TPH, RAH); analysis and interpretation of data (LFM, RAH); drafting of the manuscript (LFM, TPH); critical revision of the manuscript for important intellectual content (LFM, TPH, RAH); obtaining funding (LFM); administrative, technical, or logistic support and supervision (LFM).

1. Kahneman D, Slovic P, Tversky A, eds. Judgment under Uncertainty: Heuristics and Biases. Cambridge, UK: Cambridge University Press; 1982.

2. Green M, Low D. The psychology of warnings: Warning! Safety signs may be ineffective! Occupational Health and Safety Canada. October/November 2001;30-38.

3. Hofer TP, Hayward RA, Greenfield S, Wagner EH, Kaplan SH, Manning WG. The unreliability of individual physician "report cards" for assessing cost and quality of care of a chronic disease. JAMA. 1999;281:2098-2105.

4. Narins CR, Dozier AM, Ling FS, Zareba W. The influence of public reporting of outcome data on medical decision making by physicians. Arch Intern Med. 2005;165:83-87.

5. Kolker R. Heartless. To manipulate their crucial personal-fatality ratings, New York heart surgeons are turning away needy patients. New York Magazine. October 24, 2005.

6. Rosenberg AL, Hofer TP, Strachan C, Watts CM, Hayward RA. Accepting critically ill transfer patients: adverse effect on a referral center's outcome and benchmark measures. Ann Intern Med. 2003;138:882-890.

7. Epstein AM. Paying for performance in the United States and abroad. N Engl J Med. 2006;355:406-408.

8. Werner RM, Asch DA, Polsky D. Racial profiling. The unintended consequences of coronary artery bypass graft report cards. Circulation. 2005;111:1257-1263.

9. Perlin JB, Kolodner RM, Roswell RH. The Veterans Health Administration: quality, value, accountability, and information as transforming strategies for patient-centered care. Am J Manag Care. 2004;10(part 2):828-836.

10. Committee on Quality of Health Care in America, Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: Institute of Medicine; 2001.