Through literature review and collaborative design, we propose the Focus, Activity, Statistic, Scale type, and Reference (FASStR) framework to provide a systematic approach to health care operation metric definition and use.
ABSTRACT

Objectives: Poorly defined measurement impairs interinstitutional comparison, interpretation of results, and process improvement in health care operations. We sought to develop a unifying framework that could be used by administrators, practitioners, and investigators to help define and document operational performance measures that are comparable and reproducible.
Study Design: Retrospective analysis.
Methods: Health care operations and clinical investigators used an iterative process consisting of (1) literature review, (2) expert assessment and collaborative design, and (3) end-user feedback. We sampled the literature from the medical, health systems research, and health care operations (business and engineering) disciplines to assemble a representative sample of studies in which outpatient health care performance metrics were used to describe the primary or secondary outcome of the research.
Results: We identified 2 primary deficiencies in outpatient performance metric definitions: incompletion and inconsistency. From our review of performance metrics, we propose the FASStR framework for the Focus, Activity, Statistic, Scale type, and Reference dimensions of a performance metric. The FASStR framework is a method by which performance metrics can be developed and examined from a multidimensional perspective to evaluate their comprehensiveness and clarity. The framework was tested and revised in an iterative process with both practitioners and investigators.
Conclusions: The FASStR framework can guide the design, development, and implementation of operational metrics in outpatient health care settings. Further, this framework can assist investigators in the evaluation of the metrics that they are using. Overall, the FASStR framework can result in clearer, more consistent use and evaluation of outpatient performance metrics.
Am J Manag Care. 2020;26(6):e172-e178
When measurements are dissimilar or inadequately defined, comparison of operational performance and interpretation of research is impaired. We propose the Focus, Activity, Statistic, Scale type, and Reference (FASStR) framework. The FASStR framework can be used to provide an objective, systematic approach to ensure that metrics are defined and documented in a clear, consistent manner.
Performance measurement is fundamental to health care, from clinical operations to research. The Agency for Healthcare Research and Quality recognizes this in its mission statement, which is to “develop and spread evidence and tools to measure and enhance the efficiency of health systems—the capacity to produce better quality and outcomes.”1
Measurement of operational performance in health care is ubiquitous. We measure patient waiting, clinic durations, staff overtime, costs, patient satisfaction, no-shows, and numerous other metrics. Measures of operational performance provide a basis for evaluating and comparing the performance of different health care institutions, operating practices, and policies. Although they are seemingly easy to create, developing usable and comprehensible metrics is quite challenging. Inadequately designed and documented operational metrics create ambiguity and confusion and can lead to incorrect managerial decisions that are potentially at odds with the objectives of high-quality and safe patient care.2
To be maximally useful, a metric needs several important characteristics. First, it should be reliable and valid, measuring its true target consistently across time and contexts.3 Second, it needs to be consistent with the goals, standards, and practices of the organization within which it is applied. Third, a metric must be clear and unambiguous in its definition and use. Fourth, it should be as generalizable as possible without reducing its utility or specificity. Finally, it must be relevant to practice and aid in managerial decision-making; metrics that do not directly contribute to the management of the organization or are not sufficiently sensitive to detect meaningful operational changes are, at best, potential distractions from more critical information.
Inadequately defined and documented metrics can degrade the consistency of metric application from one instance to the next (eg, facility, time period). This is the essence of test-retest reliability. A lack of reliability undermines the sharing and acceptance of critical information, slowing the spread of vital data for efforts such as interventions and improvement collaboratives. Without clear and thorough definitions, we are unable to adequately assess the relevance or utility of a particular metric to any specific environment or situation, thus diminishing the metric’s generalizability. For example, if hospital A defines patient waiting differently than hospital B does while each assumes the definitions are identical, the two can (wrongly) arrive at quite different conclusions about the efficacy of the same policy. This can manifest even within an organization because of definitional “drift” as employees turn over, rendering historical comparisons meaningless. Finally, organizations may not understand or be able to communicate what is being measured and/or why it changed, thereby potentially reducing buy-in from staff and stakeholders.
Some health care organizations have begun to standardize measurement of key performance metrics.4 The National Quality Forum (NQF) has established a system of metric evaluation and stewardship.5 However, even widely adopted metrics may use a variety of definitions and may then be collected and reported along a spectrum of interpretations. Inadequacies in 1 or more of the characteristics listed above, such as consistency or generalizability, can undermine even well-supported national efforts at improving and standardizing operational metrics in health care. NQF’s Measure Evaluation Criteria evaluate the suitability of measures on importance, scientific acceptability of measure properties, feasibility, usability, and related/competing measures. We sought to develop a unifying framework focused on the second criterion (scientific acceptability of measure properties) to improve the development, deployment, and spread of consistent, well-defined metrics and accelerate translation across organizations and research. Adoption and use of such a framework by investigators and organizations could make study results more comparable, repeatable, and, most importantly, more applicable to practice. This creates a “virtuous cycle,” enabling faster and more complete operational improvement.
There is a need for a framework that helps investigators and practitioners define their metrics of interest more clearly. To develop such a measurement framework, we used an iterative process consisting of (1) examination of the literature and (2) expert assessment and collaborative design. We brought together a team of international experts in health care operations and clinical research with MD and/or PhD degrees in operations management and decades of experience in interdisciplinary research to examine a metrics database and collaboratively design an organizing framework.
First, we sampled the literature from the medical, health systems research, and health care operations (business and engineering) disciplines to assemble a representative sample of papers in which performance metrics were the focus of the research. We narrowed our search to outpatient clinical operations because, with nearly 1 billion ambulatory care visits in 2015, this setting serves the largest proportion of patients in the United States.6
The number of studies involving outpatient operations has grown substantially over the past 3 decades (Figure 1). In the interest of parsimony, we further focused on patient flow as our illustrative example of an operational context because of both its ubiquity and importance across a wide range of clinical settings.
Our initial sampling of the patient-flow literature identified 268 papers, of which 126 met our inclusion criteria (published during the inspection period; at least 1 operational metric was mentioned and quantified in some way). For each article, we identified and documented the operational performance metrics and their definitions (if provided by the authors). We found more than 200 different outpatient performance metrics discussed in these 126 papers. We then organized the metrics into an online database of quotes, metric definitions, and other relevant data.
We observed many opportunities for improvement in the literature regarding how thoroughly and clearly performance metrics were defined and documented, with nearly all papers exhibiting at least 1 deficiency. We organized these deficiencies into 2 qualitative clusters: inconsistent and incomplete definitions. We reviewed and processed more than 200 operational metrics, such as clinic overtime (ie, the time that clinic operates beyond a scheduled closing time), provider idle time, and utilization. In this paper, we use the patient waiting time metric to provide illustrative examples.
Inconsistent definitions. We found at least 10 different definitions for patient waiting time. Definitions included “time patient is waiting for the doctor,”7 “number of time slots between arrival and service,”8 and “waits and delays.”9-11 The most common definition was “mean time between patient’s arrival at the clinic and when first seen by the physician.”12-20 Some studies considered waiting in the waiting room, whereas others considered waiting in the exam room.21 However, subtle variations existed; for example, consider the definition of “arrival.” Does it mean the time that a patient entered the clinic, was registered, or self-registered in a kiosk? Discrepancies in definitions of seemingly identical metrics diminish the generalizability of these metrics and their utility in practice.
Incomplete definitions. Many metric definitions lacked essential details necessary for use. Some patient waiting metric definitions failed to define the period of time being measured. For instance, consider the patient waiting time definition of “waits and delays”9; the reader has no way of knowing when a wait or a delay starts or ends in that study. Surprisingly, 27 of 68 papers (39.7%) we reviewed that claimed to measure patient waiting time did not provide any definition of the metric, suggesting that incomplete definitions are common. Table 110,12-14,22-28 shows sample definitional deficits.
To illustrate how the results of a study may be affected, consider the following. Envision a hypothetical, single-physician clinic in which the manager uses an algorithm that simultaneously identifies the best patient schedule to minimize both patient waiting time and physician idle time (ie, time that the physician waits between patients). Several of these algorithms have been developed.7,29-31 The algorithm recommends a variable-interval schedule, meaning that appointment slots could be any duration—the slots may not match how long it takes the providers to deliver the care. How the components of patient waiting and physician idle time are defined will influence what happens in practice. For instance, if total waiting for all patients during a clinic session is used, the metric is weighted more toward the patients and the resulting schedule will have appointment times spread out more, causing less waiting for patients but more idle time for the physician. Alternatively, if average waiting time per patient is used, the resulting schedule will set appointment times closer together (ie, shorter appointment slots), thus favoring the physician with less idle time and resulting in longer waits for the patients. As seen in Figure 2, the first schedule could result if total patient waiting time is used and the second if average patient waiting time is used. Note that neither metric definition is inherently better than the other; the point is that if metrics are not carefully and thoughtfully defined, this ambiguity could cause unintended consequences, reduce the ability to compare the results with those of other analyses and generalize to other clinics, and impair optimal managerial decision-making.
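The trade-off described above can be made concrete with a small, purely hypothetical sketch (deterministic service times and punctual patients are illustrative assumptions, not features of the cited algorithms). It evaluates two candidate schedules under both waiting-time statistics and shows that an objective combining the waiting statistic with physician idle time selects a different schedule depending on which statistic is chosen:

```python
def evaluate_schedule(appointments, service_times):
    """Simulate a single-physician clinic in which patients arrive
    exactly at their appointment times (deterministic illustration)."""
    doctor_free = 0.0
    waits, idle = [], 0.0
    for appt, svc in zip(appointments, service_times):
        idle += max(0.0, appt - doctor_free)   # physician waits for patient
        start = max(appt, doctor_free)
        waits.append(start - appt)             # patient waits for physician
        doctor_free = start + svc
    return {"total_wait": sum(waits),
            "avg_wait": sum(waits) / len(waits),
            "idle": idle}

service = [20, 20, 20, 20]                            # minutes (assumed)
spread = evaluate_schedule([0, 25, 50, 75], service)  # wider slots
tight = evaluate_schedule([0, 15, 30, 45], service)   # shorter slots

# The same composite objective, "waiting statistic + physician idle time,"
# favors the spread-out schedule when total waiting is used but the
# tighter schedule when average waiting per patient is used.
by_total = min([spread, tight], key=lambda m: m["total_wait"] + m["idle"])
by_avg = min([spread, tight], key=lambda m: m["avg_wait"] + m["idle"])
```

Because total waiting grows with every additional patient while idle time does not, the total-waiting objective implicitly weights patients more heavily, which is exactly the definitional ambiguity the example in the text describes.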
The FASStR Framework
The framework development process involved several iterative rounds, during which experts examined the literature data set and proposed and discussed additions or changes to the nascent framework. After each round of revision, pairs of members of the investigator team, including both investigators and practitioners, tested the draft framework by applying it to metrics in an attempt to identify cases in which the framework lacked organizational value or insight. In each round, the framework was revised based on this feedback. The team communicated online and held face-to-face meetings at national conferences over a period of 26 months.
The resulting framework draft was presented at 3 academic conferences to solicit feedback from both investigators and practitioners and was further refined after each presentation. Feedback from conference attendees was iteratively assessed against the framework and reviewed and discussed by the research team until consensus was reached.
We refined the organization of our findings into 5 thematic domains that gave the framework its name: Focus, Activity, Statistic, Scale type, and Reference (FASStR). Described here, these dimensions cover the subject being measured, the activity being measured, the calculation and units of measurement, and the comparator to which the measurement is related.
Focus. Included in the majority of papers we reviewed, the Focus of a metric is its subject: the person, entity, or object of interest of the metric. Also called the unit of analysis, the Focus could be a patient, a provider, an exam room, a clinic, an operating room, a division, an entire hospital, or a piece of equipment.
Activity. Activity involves what the metric’s Focus is doing. In other words, what is the action, event, or status that the metric is measuring? If the Focus is the noun of a metric, Activity is the verb. As examples, patients (Focus) could be waiting (Activity); an operating room (Focus) could be occupied with a procedure (Activity) or being cleaned (Activity). The Activity’s definition should be specific enough so that there is no confusion or room for misinterpretation as to when the Activity starts and when it concludes.
Statistic. Statistic is how the metric is arithmetically calculated. Common statistics include pure counts (ie, sums), central tendency (mean, median), percentiles, variation (SD), the minimum or maximum, and proportions. For metrics expressed as ratios, such as minutes per clinic session or interactions per patient, the denominator (following the “per”) should be clearly defined. The time frame in which data are collected (eg, over an hour, clinic session) should be specified as part of its Statistic. For example, a provider’s idle time during a clinic session is not the same Statistic as a provider’s idle time per patient during a clinic session.
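The per-session versus per-patient distinction can be shown with a tiny numerical sketch (the idle gaps are made-up values for illustration):

```python
# Hypothetical idle gaps (minutes) before each of 4 patients in one session
idle_gaps = [5, 0, 10, 5]

idle_per_session = sum(idle_gaps)                    # one value per session
idle_per_patient = sum(idle_gaps) / len(idle_gaps)   # normalized by patient count
```

The two Statistics share a Focus (provider) and Activity (being idle) yet yield different values with different denominators, so comparisons across clinics are only valid when the denominator and time frame are documented.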
Scale type. Scale type represents the units or amounts in which the metric is expressed or, in some cases, the format of the measurement instrument. The Scale type is often inextricably tied to the metric’s Focus or Activity. Using “average number of patients in the waiting room” as an example, the Focus of the metric is patients and the Scale type (its units) is also patients. The Scale type can often be straightforward to determine for time-based scales (eg, clinic duration), where minutes or hours are typically used. Other common Scale types include Likert scales (eg, satisfaction) and categories (eg, yes/no). Some metrics may have multiple units combined in their Scale type definition, such as when the Statistic is a ratio like “patients per hour.” In that example, both “patients” and “hours” are essential elements of the Scale type for the metric.
Reference. This refers to the predefined reference to which the value is being compared (if any). Metrics are typically used to guide decision-making, so they benefit from having a reference (eg, benchmark, industry standard). Even if no objective standard for comparison exists, a previous period’s value is commonly used to assess change. The Reference should carry a clear directional implication so that it is easy to understand whether higher or lower values are desirable.
During our literature review, we found multiple examples of metrics for which a lack of clarity existed in all 5 framework dimensions. Thus, it is important to consider all 5 when defining a metric. However, sometimes 1 of the dimensions may not be relevant to the metric’s intended use. For example, there could be an instance where something (ie, the Focus) is being counted but not engaging in a specific Activity (eg, number of unread radiology exams; “being unread” could conceivably be the Activity). However, in general, most metrics would benefit greatly from being documented in a way that attempts to engage all 5 dimensions of FASStR, and every metric’s development effort should consider all 5 dimensions to determine whether they are potentially relevant.
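For organizations that document their metric definitions programmatically, one possible representation is a record with one field per FASStR dimension. This is a hypothetical sketch, not a prescribed schema; the field contents are illustrative, and Reference is optional to accommodate the caveat above that a dimension may occasionally not apply:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricDefinition:
    """Documents a metric along the 5 FASStR dimensions (illustrative)."""
    focus: str        # person, entity, or object of interest
    activity: str     # what the Focus is doing, incl. start/end events
    statistic: str    # calculation, denominator, and time frame
    scale_type: str   # units or format of the measurement
    reference: Optional[str] = None  # comparator, with desired direction

patient_wait = MetricDefinition(
    focus="Patient",
    activity="Waiting: from check-in at front desk to first physician contact",
    statistic="Mean per patient, per clinic session",
    scale_type="Minutes",
    reference="Prior quarter's mean; lower is better",
)
```

Capturing all 5 dimensions in a single shared record makes an incomplete definition visible at a glance, which is the deficiency most often observed in the literature sample.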
The examples shown under each of FASStR’s 5 dimensions in Figure 3 were largely derived from the literature sample we reviewed. These examples offer guidance in the use of each dimension but may not be comprehensive. If a metric’s definition requires expanding these lists, investigators and practitioners should do so and then clearly document its alignment with the FASStR framework.
Table 2 offers illustrative examples of how FASStR can be applied to common metrics to clarify their operationalization and intent. Although Table 2 is illustrative, an organization might prefer to define a metric that sounds similar to one of these in a markedly different way. In many cases, an organization will need to add significantly more detail, such as when an organization-specific resource, metric, or the like is a necessary element of 1 or more of the FASStR dimensions.
Through an iterative process we propose FASStR, a new framework to help guide those who are designing, developing, and implementing operational metrics in health care settings. The framework’s 5 dimensions attempt to cover all aspects of a metric’s definitional requirements. Developed by an international, multidisciplinary team of subject-matter experts from academia and practice and based on a thorough literature review, the FASStR framework provides a method by which every metric can be examined to ensure that it is thoroughly developed and completely documented. Editors and reviewers could use this framework to evaluate existing and future metrics in submitted papers, similar to how research guidelines are used to enhance the quality of reports resulting from medical research.32
In addition, practitioners and those who manage health care delivery systems should consider using FASStR in their organizations to help ensure that metrics are defined and documented in a clear, consistent manner. This could improve employee comprehension and help to retain organizational memory in the face of staff turnover. We propose that studies of clinical operations in the medical literature could use the FASStR framework to ensure representation of the relevant dimensions. This would be similar to a checklist for observational cohort clinical research studies.33 Both research and practice benefit from better metric definitions when large-scale improvement efforts take place. Collaboratives, for example, can be powerful, but if metrics are interpreted and implemented differently across the various institutions participating in the effort, the prospects for undesirable delays and diminished outcomes are increased. If those who design and define metrics consider all 5 dimensions for each metric, their work will be less ambiguous, more applicable, more generalizable, more reproducible, and, ultimately, more valuable.
Like any framework, FASStR will improve as it is more widely used and applied in increasingly diverse contexts. Through this improvement, ambiguities can be resolved and best practices will emerge. One clear potential benefit is for the development of a “standard menu” of well-defined metrics appropriate for various health care delivery settings (eg, outpatient clinics, inpatient wards, operating rooms, emergency departments). As health care organizations gravitate toward using a standard set of metrics defined and implemented in the same way, the opportunity for meaningful comparison should increase and improvement accelerate. FASStR fills an important standardization void that has so far vexed the health care industry and limited many of its well-intentioned improvement efforts, but only through adoption in both the application and research domains can its potential benefits be fully realized.
Health care operational metrics are plagued by inconsistency and incompleteness. Through an iterative process of literature review, multidisciplinary expert assessment, and end-user feedback, we propose the FASStR framework to address these deficiencies in operational metrics. The FASStR framework fills a gap in standardization necessary to enhance process improvement and research.
The authors would like to acknowledge Linda LaGanga, PhD, and Stephen Lawrence, PhD, for their expert input and valuable feedback.

Author Affiliations: Department of Computer Information Systems and Business Analytics, College of Business, James Madison University (ET), Harrisonburg, VA; Faculty of Business, Özyeğin University (TC), Istanbul, Turkey; Department of Operations, Business Analytics, and Information Systems, Carl H. Lindner College of Business, University of Cincinnati (CMF, MM, DLW), Cincinnati, OH; Department of Emergency Medicine, College of Medicine, University of Cincinnati (CMF), Cincinnati, OH; Department of Finance, Operations & Information Systems, Goodman School of Business, Brock University (KJK), St Catharines, Ontario, Canada; James M. Anderson Center for Health Systems Excellence, Cincinnati Children’s Hospital Medical Center (DLW), Cincinnati, OH; Department of Emergency Medicine, Vanderbilt University Medical Center (MJW), Nashville, TN.
Source of Funding: This research was partially funded by the College of Business at James Madison University. Dr Ward was funded by an award from the National Heart, Lung, and Blood Institute (K23HL127130).
Author Disclosures: Dr Ward has grants pending from the Department of Veterans Affairs and has received grants from the National Institutes of Health and the Department of Veterans Affairs (no conflicts anticipated). The remaining authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.
Authorship Information: Concept and design (ET, CMF, KJK, MM, DLW); acquisition of data (TC, CMF, DLW); analysis and interpretation of data (ET, TC, CMF, KJK, DLW); drafting of the manuscript (ET, CMF, KJK, DLW, MJW); critical revision of the manuscript for important intellectual content (ET, TC, CMF, KJK, MM, DLW, MJW); provision of patients or study materials (ET); obtaining funding (MJW); administrative, technical, or logistic support (CMF, MM, MJW); and supervision (CMF, MJW).
Address Correspondence to: Michael J. Ward, MD, PhD, MBA, Department of Emergency Medicine, Vanderbilt University Medical Center, 1313 21st Ave S, 703 Oxford House, Nashville, TN 37232. Email: firstname.lastname@example.org.

REFERENCES
1. AHRQ announces interest in research on healthcare delivery system affordability, efficiency, and quality. Agency for Healthcare Research and Quality. December 18, 2013. Accessed April 30, 2020. https://grants.nih.gov/grants/guide/notice-files/NOT-HS-14-005.html
2. IHI Triple Aim Initiative. Institute for Healthcare Improvement. Accessed April 30, 2020. http://www.ihi.org/Engage/Initiatives/TripleAim/Pages/default.aspx
3. Pedhazur EJ, Schmelkin LP. Measurement, Design, and Analysis: An Integrated Approach. Psychology Press; 1991.
4. AAFP outlines quality measurement strategy for primary care. American Academy of Family Physicians. January 14, 2019. Accessed May 30, 2020. https://www.aafp.org/news/practice-professional-issues/20190114measurespaper.html
5. Measure evaluation criteria. National Quality Forum. Accessed April 30, 2020. http://www.qualityforum.org/Show_Content.aspx?id=83157
6. Ambulatory care use and physician office visits. CDC. January 19, 2017. Accessed April 30, 2020. https://www.cdc.gov/nchs/fastats/physician-visits.htm
7. Robinson LW, Chen RR. Scheduling doctors’ appointments: optimal and empirically-based heuristic policies. IIE Trans. 2003;35(3):295-307. doi:10.1080/07408170304367
8. Chakraborty S, Muthuraman K, Lawley M. Sequential clinical scheduling with patient no-shows and general service time distributions. IIE Trans. 2010;42(5):354-366. doi:10.1080/07408170903396459
9. Haraden C, Resar R. Patient flow in hospitals: understanding and controlling it better. Front Health Serv Manage. 2004;20(4):3-15.
10. LaGanga LR, Lawrence SR. Clinic overbooking to improve patient access and increase provider productivity. Decis Sci. 2007;38(2):251-276. doi:10.1111/j.1540-5915.2007.00158.x
11. LaGanga LR, Lawrence SR. Appointment scheduling with overbooking to mitigate productivity loss from no-shows. Paper presented at: Decision Sciences Institute Annual Conference; November 17, 2007; Phoenix, AZ.
12. Wijewickrama A, Takakuwa S. Simulation analysis of appointment scheduling in an outpatient department of internal medicine. In: Proceedings of the Winter Simulation Conference. IEEE; 2005:2264-2273. doi:10.1109/WSC.2005.1574515
13. Wijewickrama AKA. Simulation analysis for reducing queues in mixed-patients’ outpatient department. Int J Simul Model. 2006;5(2):56-68. doi:10.2507/IJSIMM05(2)2.055
14. Dexter F. Design of appointment systems for preanesthesia evaluation clinics to minimize patient waiting times: a review of computer simulation and patient survey studies. Anesth Analg. 1999;89(4):925-931. doi:10.1097/00000539-199910000-00020
15. Benson R, Harp N. Using systems thinking to extend continuous quality improvement. Qual Lett Healthc Lead. 1994;6(6):17-24.
16. Babes M, Sarma GV. Out-patient queues at the Ibn-Rochd health centre. J Oper Res Soc. 1991;42(10):845-855. doi:10.2307/2583412
17. Benussi G, Matthews LH, Daris F, Crevatin E, Nedoclan G. Improving patient flow in ambulatory care through computerized evaluation techniques. Rev Epidemiol Sante Publique. 1990;38(3):221-226.
18. Vissers J, Wijngaard J. The outpatient appointment system: design of a simulation study. Eur J Oper Res. 1979;3(6):459-463. doi:10.1016/0377-2217(79)90245-5
19. Vissers J. Selecting a suitable appointment system in an outpatient setting. Med Care. 1979;17(12):1207-1220. doi:10.1097/00005650-197912000-00004
20. Blanco White MJ, Pike MC. Appointment systems in out-patients’ clinics and the effect of patients’ unpunctuality. Med Care. 1964;2(3):133-145.
21. White DL, Froehle CM, Klassen KJ. The effect of integrated scheduling and capacity policies on clinical efficiency. Prod Oper Manag. 2011;20(3):442-455. doi:10.1111/j.1937-5956.2011.01220.x
22. Chan K, Li W, Medlam G, et al. Investigating patient wait times for daily outpatient radiotherapy appointments (a single-centre study). J Med Imaging Radiat Sci. 2010;41(3):145-151. doi:10.1016/j.jmir.2010.06.001
23. Partridge JW. Consultation time, workload, and problems for audit in outpatient clinics. Arch Dis Child. 1992;67(2):206-210. doi:10.1136/adc.67.2.206
24. Klassen KJ, Yoogalingam R. Strategies for appointment policy design with patient unpunctuality. Decis Sci. 2014;45(5):881-911. doi:10.1111/deci.12091
25. Cayirli T, Veral E, Rosen H. Assessment of patient classification in appointment system design. Prod Oper Manag. 2008;17(3):338-353. doi:10.3401/poms.1080.0031
26. Belter D, Halsey J, Severtson H, et al. Evaluation of outpatient oncology services using lean methodology. Oncol Nurs Forum. 2012;39(2):136-140. doi:10.1188/12.ONF.136-140
27. Lenin RB, Lowery CL, Hitt WC, Manning NA, Lowery P, Eswaran H. Optimizing appointment template and number of staff of an OB/GYN clinic—micro and macro simulation analyses. BMC Health Serv Res. 2015;15(1):387. doi:10.1186/s12913-015-1007-9
28. Borgman NJ, Vliegen IMH, Boucherie RJ, Hans EW. Appointment scheduling with unscheduled arrivals and reprioritization. Flex Serv Manuf J. 2018;30(1-2):30-53. doi:10.1007/s10696-016-9268-0
29. Kaandorp GC, Koole G. Optimal outpatient appointment scheduling. Health Care Manag Sci. 2007;10(3):217-229. doi:10.1007/s10729-007-9015-x
30. Klassen KJ, Yoogalingam R. Improving performance in outpatient appointment services with a simulation optimization approach. Prod Oper Manag. 2009;18(4):447-458. doi:10.1111/j.1937-5956.2009.01021.x
31. Denton B, Gupta D. A sequential bounding approach for optimal appointment scheduling. IIE Trans. 2003;35(11):1003-1016. doi:10.1080/07408170304395
32. Johansen M, Thomsen SF. Guidelines for reporting medical research: a critical appraisal. Int Sch Res Notices. Published online March 22, 2016. doi:10.1155/2016/1346026
33. STROBE checklists, version 4. STROBE Statement. October/November 2007. Accessed December 19, 2019. https://www.strobe-statement.org/index.php?id=available-checklists