Producing Public Reports of Physician Quality at the Community Level: The Aligning Forces for Quality Initiative Experience
Jon B. Christianson, PhD; Karen M. Volmar, JD, MPH; Bethany W. Shaw, MHA; and Dennis P. Scanlon, PhD
The public reporting of physician performance has been supported by the Bush and Obama administrations as an important element of healthcare reform.1,2 Since 2006, the Robert Wood Johnson Foundation (RWJF) has encouraged public reporting efforts by the community healthcare coalitions chosen to participate in its Aligning Forces for Quality (AF4Q) initiative3 (ie, alliances, the multi-stakeholder partnerships receiving funding through the initiative) as one way to address deficiencies in quality of care.4 The value of locally produced versus national-level physician performance reports is based on several premises: (1) local efforts are more effective in reporting physician performance measures salient to the community and add to the information available to community residents when selecting providers; (2) physicians view the results as more credible because they have played a role in report development, leading them to engage in more effective, targeted quality improvement efforts; and (3) locally developed reports receive more local media attention, enhancing visibility and credibility with consumers and increasing the likelihood that they will use the information in making healthcare decisions.
Nevertheless, there is reason to be skeptical that coalitions of diverse stakeholders can produce credible physician performance reports, as previous local, voluntary collaborative efforts to implement health system change strategies have had mixed success.5-7 Also, because data submission by providers or health plans is voluntary, the process of producing credible comparative performance reports is fraught with political and technical challenges for even well-funded, historically effective community healthcare coalitions.
With the exception of recent work by Young,8 relatively little has been written about the community-based reporting process and whether resulting reports add significantly to the amount and relevance of physician performance information available to consumers. This article contrasts the approaches that different AF4Q alliances have taken in producing community-based reports with clinical quality and patient experience measures, and assesses the contribution of these reports to existing physician performance information.
The AF4Q initiative provides guidance, technical assistance, and funding to community coalitions, in the hope that this will accelerate their provision of physician performance information, which will change consumer and provider behaviors, thereby improving healthcare quality (see eAppendix A at www.ajmc.com, AF4Q Initiative: Public Reporting Logic Model). When the AF4Q initiative began, only 3 of the initial 14 AF4Q alliances were publicly reporting ambulatory quality data. Nationally, there were relatively few local or state-level efforts to measure and report physician performance. Upon joining the AF4Q initiative, alliances were given a goal to publicly report ambulatory quality measures for over 50% of primary care providers in their communities within 3 years; it was implied that funding could be discontinued if this target was not met.9 The AF4Q initiative provided technical assistance, including webinars on the selection and construction of ambulatory quality of care measures. Later, the AF4Q initiative required the alliances to add patient experience measures to their physician performance data, along with expanding the scope of the reports to include hospital quality and provider efficiency measures. But, due to the urgency attached to the early reporting of physician performance data, and the fact that alliances had more experience in this area than in hospital reporting, this article focuses only on reporting of physician performance measures by the 14 original alliances (data from additional alliances that joined in 2009-2010 are not included because of their limited experience as AF4Q participants).
Some alliance leaders believed that the AF4Q initiative placed disproportionate emphasis on achieving its early public reporting target relative to the attention given to the program’s other core areas (quality improvement and consumer engagement) and that its reporting timeline was overly ambitious. Representing this viewpoint, one alliance leader said, “…the real push is this public reporting piece which is the endgame for AF4Q.” Additionally, some alliance leaders did not share the AF4Q initiative’s view of the potential value of physician performance reports. Their skepticism was expressed in comments such as: “You want to engage consumers, but…with the quality data, it’s interesting and it’s sexy but ‘what the hell are they [consumers] supposed to do with it?’”; and, “…I think the original belief was that all you need to do is report, and things will magically get better …” On the other hand, many alliance leaders felt that the AF4Q initiative’s ambitious reporting target helped move stakeholders from general support for the alliance and its mission to specific actions. In the end, all but 1 alliance succeeded in disseminating a public report covering at least 50% of primary care physicians in the community by the 3-year target date (Figure); the reporting efforts of the sole unsuccessful alliance were delayed by state-level legal issues related to uses of health plan data. The sections that follow assess how alliances produced their reports and the contributions of the reports to the existing physician performance information in their communities.
The analyses are based on 2 data sources: semi-structured in-person and telephone interviews that provided information about various aspects of physician performance reporting, such as goals, strategies, and processes for measure construction,10 and the ongoing tracking of contents of alliance reports and the reports produced by other organizations in alliance communities. Interview responses were transcribed from audio recordings, and text files were read and coded. Codes related to performance measurement and public reporting included topics such as challenges/barriers to measurement and reporting; clinical quality and patient experience measurement; data aggregation; and data collection. The coded text was entered into Atlas.ti, a software package for qualitative analysis. These data were used primarily in assessing the processes used by AF4Q alliances to construct reports of physician performance.
Data were collected on the presence and content of public reports in areas served by the 14 original alliances. Without knowledge of their public reporting history, 7 areas that were similar to AF4Q communities in location, population size, and demographics also were selected. Each year, beginning in 2007, we reviewed the websites of hospitals and medical associations, healthcare coalitions, quality improvement organizations, state departments of health, and the AF4Q alliances to document public reporting activities. In addition, websites for the 5 largest commercial health plans, which included national plans operating in the AF4Q communities, were examined. In communities where there were fewer than 5 significant plans, websites for plans with membership that together constituted approximately 75% or more of the total private sector health plan enrollment in the area were reviewed. In all (AF4Q and other) communities, we collected information from organizations that sponsor public reports to verify the search findings and gather further details regarding measure sources and construction. We used this information to determine if alliance public reporting efforts contributed to the type and amount of physician performance information available to consumers in AF4Q communities, how the availability of this information compared with other communities, and if alliance reporting changed over time in ways consistent with AF4Q initiative expectations.
Results
Producing Public Reports
Alliance report production occurred in several stages, including initiation, measure selection, measure construction, and dissemination (see eAppendix B, Alliance Public Reporting Process). In this article, we examine the AF4Q initiative efforts in selecting and constructing physician performance measures (dissemination activities are addressed by Mittler et al in this supplement11). For convenience, we discuss these 2 stages separately, but they overlap at points. For instance, while alliances chose measures in areas where care deficiencies have been documented by national studies, the selection of specific measures was guided by early judgments about what types of data were likely to be available for measure construction. In the Table, we summarize the physician performance measures selected by the alliances and the results of 2 key decisions made in constructing those measures.
Prior to the AF4Q initiative, 8 alliances did not have communitywide processes in place to select and construct ambulatory care quality measures. After joining the AF4Q initiative, all 14 original alliances developed these processes, and all met AF4Q initiative expectations that their reports include nationally endorsed measures. For alliances that were not previously reporting physician performance, AF4Q initiative participation was seen as critical to measure development. One respondent observed that the AF4Q initiative gave the alliance “…the ability to create the infrastructure to bring the physicians together, to create the agreement upon the measures, and to actually get them up there [measures reported]. That would never have happened without Aligning Forces.”
All alliances considered obtaining physician “buy-in” to reporting an essential first step and a necessary precursor to measure selection. The alliances went to great lengths to garner physician support. One alliance leader observed that “…we have been to every venue we could possibly think of in the last 6 months; talking to physician groups…trying to engage them about the measures and do these make sense, do they not and trying to explain to them about the rationale about picking the measures…” The alliances typically established physician-led work groups charged with recommending measures in different areas. The measure recommendations were then reviewed by steering committees that had broader representation. Prior to making final selections, some alliances distributed proposed measure specifications communitywide for feedback, and then made further modifications if necessary.
The AF4Q initiative’s initial focus was on reporting ambulatory quality measures for the treatment of chronic illnesses; these measures, especially on diabetes-related care, dominated early reports (Table). Alliances readily accepted the AF4Q initiative’s direction to use nationally endorsed measures; it was easier to muster physician support for them and, given the reporting target date, alliances did not have the time or resources to develop new measures. Consequently, most alliances relied on National Quality Forum–endorsed chronic care measures and/or those produced by the National Committee for Quality Assurance. Securing stakeholder agreement around patient experience measures was more problematic. While hospital patient experience measures have been in use for some time, ambulatory care patient experience measures were less familiar to stakeholders. As one alliance leader observed, “…patient experience is a huge, vast gray area for us,” while another noted that these measures were “…politically a very difficult sell for physicians” compared with clinical quality measures. A further complication was that national health plans had their own measures of patient experience and were often not willing to engage with, or provide support for, an alliance process that could result in selection of different measures, usually coming from specific surveys. Alliances that did find a way to report patient experience used nationally endorsed clinician and group Consumer Assessment of Healthcare Providers and Systems measures. Two alliances participated in a patient experience pilot program with Consumers’ Checkbook, and all alliances attended a meeting in which options for patient experience measurement were discussed. For many alliances, the cost of collecting patient experience data proved to be a significant barrier.
For alliances new to reporting physician performance, the measure selection process typically took longer than expected, in part because it was enmeshed with early alliance efforts to build credibility and support in their communities. One alliance leader reported that the alliance “…had to be very deliberate in our selection of what our methodology was going to be and it had to be data that the physician could not just believe in but it had to be a program that the physicians could drive and own,” which meant developing guiding principles for measure selection and “a methodology that is explicit and open to scrutiny.” In summary, the measure selection and specification process often was the first consequential act undertaken by alliances under the auspices of the AF4Q initiative; they approached it cautiously, expecting that it could establish or destroy their credibility with community stakeholders.
The main decision regarding construction of physician performance measures was whether to use administrative (ie, claims) data or data from medical records (Table). Initially, despite physician distrust of the accuracy and completeness of claims data, most alliances chose to use these data to construct their measures. AF4Q initiative funds typically were used to produce claims-based measures. Alliances believed this would be the quickest path to public reporting, as the data were already available and being used by commercial health plans to produce performance measures that were available to their members. (Four alliances using claims data also were successful in incorporating data for patients covered by Medicaid.) To construct claims-based measures, alliances contracted with data aggregators. These firms obtained the claims data from participating health plans, corrected and standardized the data, attributed patients to individual physicians or physician practices across the merged data set, and constructed measures according to alliance specifications. Typically, the first time measures were constructed using this process, measure values were reviewed by physician practices and then revised based on physician feedback. In subsequent reports, physicians were given a time period in which to review results (sometimes mandated by state law) before they were released to the public. While alliances anticipated that using claims data would accelerate the public reporting process, for most alliances, this proved not to be the case. Typically, it took time to convince plans to participate, and not all obliged. Once plans agreed, drafting legal agreements for data sharing and confidentiality also proved to be time consuming. In addition, plans submitted data in various ways, and the data often did not meet measure production standards. Finally, after receiving plan data, some aggregators took longer than expected to construct the measures.
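The aggregator steps described above (merging plan claims, attributing patients to practices, and constructing a measure) can be sketched roughly as follows. This is an illustrative sketch only: the record fields, the plurality-of-claims attribution rule, and the example measure (share of attributed patients with an HbA1c test claim) are assumptions made for illustration, not the specifications used by any alliance or aggregator.

```python
from collections import Counter, defaultdict

# Hypothetical, already-standardized claims merged across plans.
# Each record: (patient_id, practice_id, service_code). All values are invented.
claims = [
    ("p1", "practice_A", "office_visit"),
    ("p1", "practice_A", "hba1c_test"),
    ("p1", "practice_B", "office_visit"),
    ("p2", "practice_A", "office_visit"),
    ("p3", "practice_B", "office_visit"),
    ("p3", "practice_B", "hba1c_test"),
]

def attribute_patients(claims):
    """Assign each patient to the practice with the most claims (assumed rule)."""
    counts = defaultdict(Counter)
    for patient, practice, _ in claims:
        counts[patient][practice] += 1
    # Ties are broken alphabetically by practice id so results are deterministic.
    return {
        patient: min(c, key=lambda pr: (-c[pr], pr))
        for patient, c in counts.items()
    }

def measure_rates(claims, attribution, numerator_code):
    """Per practice: share of attributed patients with the numerator service."""
    met = {patient for patient, _, code in claims if code == numerator_code}
    denom, numer = Counter(), Counter()
    for patient, practice in attribution.items():
        denom[practice] += 1
        if patient in met:
            numer[practice] += 1
    return {practice: numer[practice] / denom[practice] for practice in denom}

attribution = attribute_patients(claims)
rates = measure_rates(claims, attribution, "hba1c_test")
```

Even in this toy form, the attribution rule determines each practice's denominator, which may help explain why physicians reviewed and contested early claims-based measure values.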
Another measure construction option was to use data from paper or electronic medical records; 2 alliances used variants of this process prior to joining the AF4Q initiative. Physician practices provided clinical data from a random sample of medical charts or for a population of appropriately identified patients drawn from a patient registry (often computer-based). Some alliances expressed concern that physicians would reject either approach as too burdensome. In practice, although it did impose costs on practices, physicians agreed to clinical data submission, believing measures would better reflect the quality of care in their practices because they would be constructed using data from the entire patient population. Using data from patient records rather than claims minimized technical issues around attribution of patients to physicians; allowed reporting of biologic markers, such as low-density lipoprotein cholesterol levels, not possible using claims data; and facilitated reporting at a physician practice level, as opposed to a larger medical group level, due to a greater number of observations available for measure construction. For many alliances, this seemed like the appropriate level of reporting, as it coincided with the level at which quality improvement efforts were likely to be implemented. However, it was not necessarily faster, initially, than producing reports using claims data. The alliances using this approach did not attempt to construct measures at the individual physician level, in part because physicians opposed doing so.
As with claims-based measure construction, building the infrastructure to support measure construction using clinical data was arduous. It required careful specification of procedures for sampling patients and identifying eligible patients based on measure guidelines. Alliances established portals to receive physician data and visited physician practices to audit submissions. Alliances that had never constructed clinical performance measures typically adopted the policies of other alliances. One experienced alliance even configured its portal to accept physician data submissions from practices of another alliance.
Irrespective of the approach, when several alliances realized that they would not produce their first physician performance report by the AF4Q initiative target date, they turned to the Centers for Medicare & Medicaid Services’ GEM (Generating Medicare Physician Quality Performance Measurement Results) data to construct a small number of measures. CMS contracted with Massachusetts’ quality improvement organization to generate physician practice–level performance measures using Medicare administrative claims data only, resulting in 12 summary measures for each practice.12,13 One respondent called using GEM data “checking the box” to meet the AF4Q initiative public reporting requirements and felt it damaged the alliances’ credibility with local physicians who were working toward constructing medical records–based measures.
Contribution to Publicly Available Physician Performance Information
One of the AF4Q initiative’s goals is for alliance public reports to increase the amount of credible physician performance information available to consumers in alliance communities. The average number of highly credible reports (defined as reports using data from multiple payer or provider sources, produced by a neutral community-based organization, and available to the general public) increased in the 14 original AF4Q communities from 0.43 to 2.07, versus an increase from 0.43 to 0.57 in the 7 comparison communities. During this period, only 1 comparison community added a physician report sponsored by a community organization (Figure). The addition of these types of reports is significant, as the physician performance information they contain is available to all community residents, not just health plan enrollees.
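The size of this difference can be read directly from the averages reported above. The short calculation below only restates those four figures; the variable names are illustrative and are not taken from the study.

```python
# Average number of highly credible reports per community, as reported in the text.
af4q_before, af4q_after = 0.43, 2.07              # 14 original AF4Q communities
comparison_before, comparison_after = 0.43, 0.57  # 7 comparison communities

# Change within each group, rounded to the precision of the reported averages.
af4q_change = round(af4q_after - af4q_before, 2)
comparison_change = round(comparison_after - comparison_before, 2)

# Gap between the two changes (a simple descriptive contrast, not a statistical test).
difference_in_change = round(af4q_change - comparison_change, 2)
```

On these figures, AF4Q communities gained about 1.64 highly credible reports on average, compared with 0.14 in the comparison communities.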
Almost all reports included preventive care measures and measures of adherence to treatment guidelines for people with chronic illnesses, or biologic markers of chronic illness, irrespective of report sponsor. Preventive measures were specific to gender or age, while chronic illness measures were relevant for subgroups of community residents. Alliance reports in some communities expanded the amount of information about the treatment of chronic illnesses (eg, diabetes) by reporting measures based on medical records data that were not available in health plan reports. Also, by combining data from multiple sources, alliances were able to publish performance measures at the physician practice level, in contrast to measures constructed at a medical group level (Table). In addition, alliances were able to report actual measure values, in contrast to many health plan reports that classified physicians into groups of high and low performers because there were not enough observations for each network physician or group practice to report actual measure values that were reliable. The addition of patient experience measures expanded the information in public reports beyond clinical measures or measures relevant only to certain subpopulations based on diseases or recommended preventive care guidelines. At baseline, measures of patient experience with physicians were available (in any report) for only 3 AF4Q communities and 1 comparison community. By 2011, consumers in 10 of 14 AF4Q communities had access to publicly reported patient experience measures (across reports from all sponsors) versus consumers in 2 of 7 comparison communities. This increase was due to the addition of patient experience measures in alliance reports.
Another perspective on the contribution of alliances to the availability of publicly reported physician quality information can be gained from a comparison of the reporting activity of alliances and chartered value exchanges (CVEs). Beginning in 2008, community organizations could apply to the federal government for CVE designation, which was awarded to multistakeholder coalitions. Among other requirements, the coalitions had to state their commitment to publish provider quality information. The CVEs were given access to summary Medicare provider performance data and received technical assistance through a peer-learning network; in contrast to the AF4Q initiative, however, they did not receive direct funding or a target reporting date.14 At present, 11 of the initial 14 AF4Q alliances are among the 24 organizations that have received CVE designation. Among CVEs that are not AF4Q alliances, only 3 report physician quality measures: California, Colorado (both in the comparison group; see Figure), and Virginia. Clearly, CVEs that are not AF4Q initiative participants are much less likely to provide physician performance information to community residents than AF4Q alliance CVEs.
The AF4Q initiative experience suggests public reporting of physician performance can be established successfully at the community level in a relatively short time by healthcare coalitions that receive financial support and technical assistance and are held to performance benchmarks. Communities that did not have such reports prior to the AF4Q initiative now have additional physician performance information available to local consumers. Community coalitions that were reporting physician performance prior to joining the AF4Q initiative extended the scope of their reports consistent with the AF4Q initiative guidance. Interview respondents credited participation in the AF4Q initiative, supported by other factors, as critical to their progress in reporting physician performance. It should be noted, however, that most AF4Q alliances were chosen in part because healthcare stakeholders in their communities had some experience in collaborative efforts; it is not clear whether communities with no such history could have achieved the same results.
The success of alliances in accomplishing their public reporting goals contrasts with the decidedly mixed results of some past attempts to influence local healthcare systems through voluntary collective actions.15 Among the many factors that likely contribute to this success, 3 seem especially important. First, prior reporting of physician performance by 3 of the 14 original AF4Q grantees proved that it could be done. Second, the likely winners and losers from physician performance reporting were not clear at the onset, reducing the potential for organized resistance to form early in the process. Third, most physicians had experience with health plan efforts to measure and report their performance. While this experience was not always satisfactory, it led many physicians in alliance communities to believe that performance reporting was inevitable. According to some respondents, participation in community-based reporting efforts was welcomed by physicians as an opportunity to shape the content of reports and influence decisions regarding the data used for performance measurement.
At present, and consistent with Young,8 interview respondents noted 2 potential obstacles to the expansion of community-based public reporting of physician performance. The first is lack of ongoing financial support. One alliance leader observed that, “If Robert Wood Johnson money went away, it would be extremely challenging for this community to pick up a half a million dollars a year to do public reporting.” Yet, this obstacle clearly has been surmounted in some communities where alliances have instituted dues for stakeholder participation, while other alliances have received funding from health plans or state governments.
A second potential obstacle is that a credible national public reporting effort could emerge, particularly in light of federal funding for the adoption of electronic medical records (EMRs) in physician practices. Properly configured EMRs can facilitate the collection of data from all patients irrespective of insurance status, and reporting of clinical measures can allay physician concerns about use of claims-based measures. EMRs could also create efficiencies in data collection and measure construction, lowering the costs for nationally produced reports. If a credible national effort emerges, community stakeholders might be less inclined to devote their time and resources to the development of locally produced physician performance reports.
Limitations of Analysis
The criteria employed by the RWJF to select community coalitions to participate in the AF4Q initiative suggest caution in extrapolating from the experience of AF4Q alliances. First, it is possible that the findings reflect primarily the selection of experienced and highly motivated community coalitions.10 However, not all community organizations chosen to be AF4Q alliances had experience with, or enthusiasm for, public reporting of physician performance. As noted, many were initially skeptical of the value of such reports relative to the required resource investment, and only 3 were reporting measurements prior to the AF4Q initiative. Also, some newly formed AF4Q alliances had limited histories of stakeholder collaboration.
Second, because several alliances have only recently begun to report physician performance, we have limited ability to conclude whether reporting has stimulated the long-term responses on the part of providers, purchasers, and consumers that the AF4Q initiative anticipates (see eAppendix A, AF4Q Initiative: Public Reporting Logic Model). However, a number of interview respondents cited specific quality improvement activities implemented by local physician groups that they believed were in response to alliance quality reports. A recently published study relating the experience of 1 alliance supports these observations.16 As part of the overall AF4Q initiative evaluation, data are being collected from longitudinal consumer and provider surveys, ongoing interviewing, and secondary sources that will be used to assess the long-term impact of public reporting initiatives in AF4Q communities.10