Dennis P. Scanlon, PhD; Jeffrey A. Alexander, PhD; Jeff Beich, PhD; Jon B. Christianson, PhD; Romana Hasnain-Wynia, PhD; Megan C. McHugh, PhD; Jessica N. Mittler, PhD; Yunfeng Shi, PhD; and Laura J. Bodenschatz, MSW
The Aligning Forces for Quality (AF4Q) initiative, funded by the Robert Wood Johnson Foundation (RWJF), is a multi-site, multi-year initiative with the overarching goals of improving the quality of healthcare and reducing health disparities in its 16 participant communities, and providing models for national reform.1
Launched in 2006, the AF4Q initiative comprises multiple interventions and goals, which are being developed and revised throughout the program’s 10-year course.2
In addition to being a complex and ambitious initiative in its own right, the AF4Q initiative is being implemented at a time of rapid change in the healthcare arena, when there is a growing awareness of the multiple determinants of health and healthcare quality, and significant national change in healthcare policy.3,4
Along with sponsoring the AF4Q initiative, arguably the largest healthcare improvement demonstration project on a community level in the United States to date, the RWJF also dedicates funding to support an independent impartial scientific evaluation of the program. This article provides a description of the research design, data, and limitations of the independent evaluation of the AF4Q initiative. The core purposes of the AF4Q evaluation are to contribute to basic knowledge in 5 main programmatic areas, and answer key questions about its effectiveness in each of these areas: (1) measurement and public reporting; (2) quality improvement; (3) consumer engagement; (4) equity/disparities reduction; and (5) payment reform. Additionally, the evaluation team will answer questions about the effect of aligning the individual programmatic areas. The evaluation team systematically collects evidence related to the facilitators, barriers, and successes of community-based health reform activities and presents it to policy makers, program funders, and communities that are undertaking or contemplating similarly complex initiatives. The AF4Q evaluation also contributes to the health services research literature.
The AF4Q Initiative: A Complex and Emergent Program
In their seminal work on realistic program evaluation, Pawson and Tilley wrote, “Programs work (have successful ‘outcomes’) only in so far as they introduce the appropriate ideas and opportunities (‘mechanisms’) to groups in the appropriate social and cultural conditions (‘contexts’).”5
As shown in the evaluation team’s logic model (Figure), the mechanisms of the AF4Q initiative are a combination of multiple, evolving, targeted interventions; funding; goals for the targeted areas; and various opportunities for technical assistance. Details of this primary logic model, as well as models for the specific program areas, are described in the article by Scanlon et al in this supplement.2
In general, as the Figure illustrates, the theory of change underlying the AF4Q initiative assumes that alliances (the generic term for the multi-stakeholder partnership in each AF4Q community) will coalesce around designing and implementing programmatic interventions. These interventions are hypothesized to influence intermediate outcomes on the way to improving targeted long-term outcomes. It is expected that it will take some time for the multi-stakeholder initiatives to develop and implement meaningful population-level interventions, and that the timelines will vary for alliances in different communities based on local context and prior experience. As the logic model illustrates, the alliances operate in an environment with many relevant external influences, including changes to state and federal healthcare policy. Complexity in the program is inherent since the RWJF’s goal was to establish a dynamic program that would evolve as the work and learning in the participant communities progressed.
The multi-stakeholder alliances charged with implementing the AF4Q interventions operate within a diverse set of communities with distinct social, demographic, and cultural characteristics. These contextual differences suggest that the program interventions may not be homogeneous across participants. The 10-year lifespan of the initiative and other factors related to timing add to the complexity of the evaluation. For example, the initial formation dates of the alliances vary considerably. Not only were communities added during different phases of the AF4Q initiative, some alliances were in existence long before its launch, whereas others were formed or expanded in response to the initiative. Additionally, implementation of the interventions may occur at a different pace in each of the communities for a variety of reasons, including whether any related work was under way in a particular area prior to receiving the AF4Q grant, the level of community engagement around each intervention, and the amount of time each alliance dedicated to achieving community agreement on individual decisions.
The Evaluation Research Questions
The AF4Q logic model serves as the focal point from which the research questions, research design, data collection, and data analysis plans for the evaluation are derived. The evaluation team’s research goals were purposefully designed to identify both intermediate and long-term effects of the AF4Q initiative, and to balance the study of progress on these outcomes with an understanding of how and why progress does or does not occur. Thus, the evaluation was designed to include both a summative component, focused on the degree to which expected outcomes are achieved as a result of the program, and a formative component, tasked with developing an ongoing understanding of how the AF4Q initiative unfolds, with the expectation that the evaluation team would share information and lessons learned throughout the life of its study. The formative component of the AF4Q evaluation includes frequent and ongoing sharing of information with internal audiences including the RWJF, the AF4Q National Program Office, AF4Q technical assistance providers, and the AF4Q alliances. Formative observations are also shared with external audiences, in the form of research summaries and presentations, and are targeted toward policy makers, healthcare funders, or those in other communities interested in work to improve local healthcare systems.
The AF4Q evaluation research questions are broad and comprehensive. The summative questions focus on the long-term outcomes of the logic model, and the more granular questions focus formatively on the processes used to develop and implement interventions. Some of the research questions also focus on variations in outcomes and processes across alliances, and factors that explain such variations. While the full set of detailed research questions is too voluminous to include in this article, a sample of AF4Q evaluation research questions is included in the Table. The Table organizes research questions according to the relevant component(s) of the logic model and indicates whether a research question is primarily summative or formative in nature.
The AF4Q Evaluation Research Design
Because of the AF4Q initiative’s complex, changing, and voluntary nature, the evaluation team recognized from the outset that a standard experimental design was not appropriate or useful, and that a flexible, rather than fixed, research design was needed. Thus, the interdisciplinary AF4Q evaluation team employs a multiphase research design to study this complex multi-site program. A multiphase design, sometimes referred to as a methodological triangulated design, is one in which there are 2 or more subdesigns. Each subdesign, or “subproject,” has independent methodological integrity and complements the other(s) to attain the goals of the overall research design.6,7
Because of the underlying complexity of the program, there are multiple subprojects used in the AF4Q initiative to study the various programmatic areas and their alignment.
As an illustration of how a subproject relates to the overall design, the evaluation team recognized that much of the early work for grantee alliances involved establishing infrastructure, assembling stakeholders, and agreeing on vision and goals. Because of this, the evaluation team focuses not only on the study of the interventions and their effects, but also on the approaches to governance and organization employed in each grantee community. In accordance with its multiphase design, the evaluation team plans to link its findings about programmatic outcomes to the important local governance, organization, and contextual factors identified through its governance-focused work. By bringing these complementary pieces of the overall research design together, the evaluation team is able to assess factors related to the variation in the progress and effect of the program across AF4Q communities and the programmatic areas.
There are several key research challenges that influenced the choices of the team and were common to all or most portions of the evaluation of the AF4Q initiative. Those challenges are discussed below.
Changing and Emergent Program
Despite the high-level clarity that the RWJF had about the types of interventions and people who needed to come together to design and implement them, at the start of the program, detail about how the particular interventions would be designed and implemented was left open for discussion. This approach was important to the RWJF, in that it allowed for in vivo learning and decision processes, and the flexibility to provide more definition in each area when the time was right programmatically. Thus, the evaluation research questions are reviewed periodically and revised or expanded, as needed, to accommodate the evolving program interventions and the rapidly changing local and national healthcare context.
Developing Conceptual Frameworks
One of the main reasons why many of the AF4Q initiative’s programmatic areas were not precisely defined by the RWJF at the start of the program was that they were innovative and lacked a strong evidence base. For example, designing and measuring communitywide efforts targeted at engaging consumers to become more active in their health and healthcare is relatively uncharted territory.8
Because there was no existing model, the evaluation team had to develop a conceptual framework for how consumer engagement might work at the community level to systematically assess the depth and breadth of the work being conducted for consumer engagement. As for other programmatic areas, evaluators were first required to formulate programmatic area–specific logic models and make inferences from empirical data collection about the quality of the processes initiated by communities and the effect of these processes on intermediate and long-term outcomes.2
More detail on these programmatic area logic models can be found in the article by Scanlon et al in this supplement.2
Challenges to Measuring Intensity and Scope
The evaluation team recognized from the outset that because of the size and the multi-faceted and changing nature of the program, it would be difficult to precisely measure the “dose” (or intensity) of each targeted AF4Q intervention, and the initiative at large. The evaluation team also recognized that it would be challenging to systematically measure the relative dose across AF4Q communities, especially since the program was designed by the RWJF to allow each community to have some amount of leeway in how it approached the implementation of each of the interventions. While the evaluation team employs various approaches to tackling this measurement issue, ranging from general and external (eg, the possibility that AF4Q communities characteristically have been more active in implementing healthcare interventions to date than non-AF4Q communities, and thus could be characterized as such via a binary variable) to more specific and internal to the program (eg, counting and comparing the range of community-level quality improvement activities across AF4Q communities), it is undoubtedly the case that the measurement contains some error, which may create challenges when attempting to link processes to specific program outcomes.
Caution Is Needed When Attributing Observed Effects to the AF4Q Initiative
Because of contextual differences, temporal change, and the complex nature of the program, it is difficult to be definitive when attributing observed outcomes to the AF4Q initiative. To mitigate this concern, the evaluation team uses a variety of data collection and analysis approaches to assess, to the extent possible, the effect of the AF4Q in each programmatic area. When possible, this includes the specification of a control strategy. As discussed in the Appendix, the control group for 2 of our 3 surveys includes a sample of respondents from non-AF4Q areas of the country. For other types of analyses, we attempt to select comparison communities based on population and demographics. Still, any control strategy is imperfect due to the nature of how program participants are selected.
This limitation is common in many program evaluations, and it is important to understand that there is some degree of uncertainty with statements regarding program effects. As reflected in our program logic model (Figure), an additional challenge to attribution is that the types of health improvement work taking place as part of the AF4Q initiative are also taking place, to some degree, in other non-AF4Q communities. For example, there is a national trend toward increased public reporting and transparency of quality measures. While the AF4Q initiative clearly provides resources and a specific structure for this work, these activities are not unique to AF4Q communities. Similarly, implementation of healthcare reform has resulted in many efforts that overlap with the goals of the AF4Q initiative. While evaluators need to be aware of, and account for, temporal trends in their evaluation designs, it is impossible to perfectly control for them.
Measuring “Alignment”
A premise of the initiative is that the absence of synergy, or “alignment” in AF4Q program terms, among key stakeholders and across key programmatic areas (eg, consumer engagement, public reporting) has historically inhibited progress on healthcare quality improvement. From a research perspective, any type of synergy is challenging to define, observe, and measure, because, by definition, it is the interaction of elements that, when combined, produces a total effect that is greater than the contributions of the individual elements.
The evaluation team focuses attention on the measurement of alignment and linking alignment measure(s) to program outcomes. The evaluation team hypothesized, however, that much of the early AF4Q programmatic activity would be focused on individual silos (eg, public reporting, quality improvement, consumer engagement) rather than their alignment, and that this foundational and synergistic component of the AF4Q initiative would not be expected to materialize until later in the program. Because the building of stakeholder alignment is an element of governance and organization, the evaluation team focused early attention on that dimension of synergy. In addition, the evaluation team is employing multiple data collection and measurement strategies to assess the degree to which programmatic alignment is occurring in the overall initiative.
Participant Selection and Generalizability
Another important consideration in study design decisions was that the grantee organizations were not randomly selected. The RWJF chose grantees based upon its own theories about which characteristics of the community and organization are appropriate for desired outcomes.9
Consistent with other healthcare programs or community health interventions, the voluntary nature of AF4Q participant community selection can threaten internal and external validity. In many cases, communities were already on their way to implementing key AF4Q-type interventions prior to joining the initiative. Absence of the counterfactual (ie, knowing what would have occurred without the AF4Q initiative) makes it difficult to isolate the true effect of the AF4Q initiative or, at least, the likely effect of randomly selected communities versus those selected competitively.
Acknowledging these threats, the evaluation team strives to clearly communicate the relevant caveats related to both internal validity and generalizability when presenting its findings.
Formative and Summative Findings
Because the AF4Q initiative was designed by the RWJF to serve as a demonstration for health improvement programs that addressed complex, real-world issues on a community level, the evaluation team committed from the outset to provide real-time (formative) feedback to the RWJF, its partners, and the participating communities throughout the course of the program. These formative products include presentations about high-level observations, charts and tables that outline grantee approaches or strategies in particular programmatic areas, detailed reports on specific topics or data sets, and results from case studies or from analyses in which 1 or more data sets are summarized and interpreted. To create a reasonable balance between providing formative (ie, throughout the program) and summative (ie, overarching, final) products, the evaluation team continually assesses emergent needs from the RWJF, the communities, and others.
The Evaluation Data
A variety of different sources, including primary and secondary data, are collected and used to answer the research questions identified for the evaluation in the context of the multiphase design. Importantly, a research approach that combines qualitative and quantitative methods is essential for understanding the effects of the initiative and the processes that comprise the initiative. These sources were designed to be used on a stand-alone basis in some instances, but more often to be purposefully used in combination with other sources to provide contrast and depth to individual analyses, consistent with a methodological triangulated design. The AF4Q evaluation relies on data collected from 3 longitudinal surveys and multiple rounds of interviews with key AF4Q stakeholders, data derived from AF4Q program documentation, and existing observational data collected outside of the AF4Q evaluation. A description of each of the main data sources used in the AF4Q evaluation is available in the Appendix, which also includes details about the purpose and use of each data source, the target population and sampling strategy (where relevant), and other important information.
Analytic Approaches Used in Evaluation
In this section, we describe how data are used to answer the research questions developed for our evaluation of the AF4Q initiative. Our analyses include a variety of single-method quantitative and qualitative approaches; quantitative dominant mixed-method approaches; and qualitative dominant mixed-method approaches. We briefly discuss the approaches in turn, highlighting salient issues.
Quantitative Approaches
Our quantitative analyses generally rely on survey data collected specifically for the AF4Q initiative (ie, the AF4Q consumer, physician, and alliance surveys) and analysis of secondary data collected by others outside of the AF4Q initiative but relevant for answering specific AF4Q research questions (eg, Dartmouth Atlas of Health Care quality measures). For many of our quantitative analyses, we use a difference-in-difference approach to compare the change in outcomes within the AF4Q community or population of interest (eg, consumer attitudes and opinions) relative to the change for a comparable control group in non-AF4Q communities. The advantage of the difference-in-difference approach is that it removes the influence of unobserved confounders that can be considered time invariant. For example, if a specific community has a fixed level of “social capital” that might be important for achieving success on important AF4Q outcomes, cross-sectional estimates might be biased by that confounder, whereas difference-in-difference estimates are not, because the longitudinal design differences out any factor that stays constant over time.
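The reasoning behind the difference-in-difference approach can be illustrated with a minimal sketch. All numbers below are hypothetical and chosen only for illustration; they are not AF4Q data, and the function and variable names are assumptions rather than part of the evaluation team’s actual analysis code. The sketch assumes a fixed “social capital” advantage in the treated community, a time trend shared by both groups, and a known program effect, and then compares a post-period cross-sectional contrast with the difference-in-difference estimate.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-difference: change in treated group minus change in control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical quantities (illustrative assumptions, not AF4Q results).
FIXED_SOCIAL_CAPITAL = 5.0  # time-invariant advantage of the treated community
TIME_TREND = 2.0            # change over time shared by both groups
TRUE_EFFECT = 3.0           # program effect we would like to recover

baseline = [50.0, 52.0, 48.0, 50.0]  # hypothetical outcome scores

control_pre = baseline
control_post = [y + TIME_TREND for y in baseline]
treated_pre = [y + FIXED_SOCIAL_CAPITAL for y in baseline]
treated_post = [y + FIXED_SOCIAL_CAPITAL + TIME_TREND + TRUE_EFFECT for y in baseline]

# Cross-sectional comparison after the program: mixes the program effect
# with the fixed social-capital advantage.
naive = mean(treated_post) - mean(control_post)

# Difference-in-difference: the fixed advantage and the shared trend cancel.
did = did_estimate(treated_pre, treated_post, control_pre, control_post)

print(f"cross-sectional estimate: {naive}")
print(f"difference-in-difference estimate: {did}")
```

In this constructed example the cross-sectional contrast equals the program effect plus the fixed advantage, while the difference-in-difference estimate equals the program effect alone. The cancellation depends on the assumption that both groups share the same time trend; a confounder that changes over time differently in the two groups would not be removed.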
We are not always able to identify a comparable control group, however, and must use different analytic techniques to make inferences. For example, our AF4Q alliance survey asks participants in the multi-stakeholder partnership questions about the leadership, governance, and effect of the AF4Q work. Obviously, we do not observe similar data for non-participants since by definition the relevant questions would not apply to those not participating in an alliance. In this case, we use a longitudinal strategy to examine changes over time among survey respondents, both for the sample of alliance participants at large and the subset of the sample (ie, panel respondents) that provide survey responses at multiple time points. Other analyses undertaken as part of our quantitative research employ cross-sectional analyses to examine associations between key variables and specific outcomes. In addition, we employ descriptive statistics to characterize the distribution of certain variables and highlight the degree of variation within and across the AF4Q communities.
Qualitative Approaches
The AF4Q logic model and many of the specific research questions developed by the evaluation team focus on understanding how alliances are organized and governed, how the alliances choose to design and implement the various AF4Q programmatic interventions, which factors inform their choices, and what challenges and opportunities alliance stakeholders associate with their participation in the AF4Q initiative. To gain an in-depth understanding of these topics, many of which are focused on processes rather than outcomes of the initiative, the evaluation team collects and analyzes qualitative data in the form of interviews with multiple types of key AF4Q stakeholders and AF4Q program documentation.
Although qualitative data do not lend themselves to generalization and are time-consuming to synthesize and analyze, they are vitally important to the work because they provide comprehensive and detailed information on important processes and meanings that underlie the program. The use of qualitative data also helps the evaluation team paint a more concrete and realistic picture of the evolving AF4Q initiative.
The evaluation team follows a systematic process for all types of key informant interviews that it conducts and identifies multiple respondents to discuss each topic it explores. By gathering data from multiple points of observation, the evaluation team is able to compare and contrast interviewee viewpoints to gain a dynamic view of issues and processes related to the AF4Q initiative. Interviews are digitally recorded and transcribed in full. The resultant interview transcripts are tagged or “coded” by evaluation team members using deductive high-level (global) categories corresponding to the AF4Q initiative’s main programmatic areas and major concepts that are relevant across all alliances (eg, alliance participation, resources, and structure). These global codes are then entered into a qualitative data analysis software package (Atlas.ti), which allows for large amounts of data in text form to be stored, sorted, and systematically queried. In consultation with the evaluation team’s qualitative data manager, the interview data are pulled from Atlas.ti and analyzed using inductive approaches by evaluation team members working to address specific research questions. Analysis processes conducted by the evaluation team using qualitative data are done using investigator triangulation to minimize bias and ensure systematic analysis of data.
The particular portion of interview data and the analysis processes used in the development of evaluation team products vary depending on several factors: (1) the topic(s) addressed by the research question; (2) the goal of each particular analysis (eg, description, explanation); (3) whether the question can be best addressed by using data from 1 or multiple alliances and/or multiple time points; and (4) the use of other types of data (eg, survey data, documentation) in the analysis and the designated role of each type of data.
Mixed-Methods Approaches
Some of the questions that the evaluation team seeks to answer can be informed by a combination of the team’s quantitative and qualitative data. While not appropriate for every subdesign within the evaluation, the team has identified questions for which a quantitative or qualitative approach alone is not adequate, such as: