Benchmarking Health-Related Quality-of-Life Data From a Clinical Setting
Janel Hanmer, MD, PhD; Rachel Hess, MD, MS; Sarah Sullivan, BS; Lan Yu, PhD; Winifred Teuteberg, MD; Jeffrey Teuteberg, MD; and Dio Kavalieratos, PhD
Healthcare in the United States is evaluated by a wide variety of organizations and measures, which often include conventional health indicators (eg, mortality rates, complication rates, health service use) and measures of patient satisfaction. These measures are important, but do not include one of the ultimate goals of healthcare: health-related quality of life (HRQoL).1 Conventional health indicators are frequently used instead of HRQoL measures because they are more easily quantified, and there are often comparative data available for interpretation of the results.2 As the field of HRQoL measurement has improved, there is increasing interest in including patient-reported outcomes (PROs), such as HRQoL, as an outcome of clinical care.3-5
HRQoL measures are either disease-specific or generic. Disease-specific measures provide a high level of detail about disease-specific symptoms and experiences; conversely, generic measures allow for comparison across disease groups.6 Generic HRQoL measures can broadly be categorized into 2 groups: health status measures and health preference measures. Health status measures (or profile measures) describe multiple domains of health, such as physical functioning, mental health, and pain,7 and provide multiple scores, 1 for each domain of health measured. In contrast, health preference measures combine multiple domains into a single score, commonly referred to as a “utility score.” This score is constructed using preferences for different descriptions of health elicited from the general population.8
Generic HRQoL measures provide a unique opportunity to compare outcomes across clinical practices. Unfortunately, these measures are not collected and published routinely enough for comparisons across clinics, health systems, or regions. However, many of these generic measures have been included in large, nationally representative datasets, providing a rich resource for comparisons across smaller population and patient groups.6,9-11 Catalogs of age and sex normative values for generic HRQoL measures6,9,10 have also been published from these datasets, but the values from these catalogs do not address specific patient groups.
Comparing HRQoL results from a clinical sample to HRQoL outcomes for the general population is of limited value because we assume that a clinical sample has more health conditions and worse scores than the general population. Clinicians often worry about the interpretation of any outcome measure because of differing case mixes across clinicians and practices.12 Those who collect clinical HRQoL data would benefit from a point of comparison or benchmark13 for HRQoL results. Benchmarks exist for a wide variety of other clinical outcomes, ranging from physiological markers (eg, glycated hemoglobin and blood pressure control in patients with diabetes) to patient experience (eg, pain control measures, quietness of the hospital environment) to guideline adherence (eg, coronary artery disease, heart failure, atrial fibrillation).14,15
In this report, we construct an HRQoL benchmark for a clinical subspecialty using a nationally representative dataset, the Medical Expenditure Panel Survey (MEPS). MEPS includes HRQoL measures, which allow for the construction of overall population scores.9 The MEPS dataset also includes information about which subspecialists the respondents have seen in the past year, allowing for the construction of subspecialty-specific HRQoL scores. Cardiology is the clinical subspecialty illustrated in this report; however, the technique we present herein can be used with other subspecialties included in MEPS. We compared the national cardiology benchmark with data collected in cardiology clinics from a large health system.
METHODS
Data: National Sample
MEPS is a nationally representative survey of healthcare utilization and expenditures for the US noninstitutionalized civilian population. It is a 2-year panel survey with an overlapping cohort design; each year, a new cohort is initiated and followed longitudinally. Cross-sectional analyses combine information from 2 MEPS cohorts. MEPS data, including the scores analyzed in this report, are freely available online through the Agency for Healthcare Research and Quality (AHRQ) website. We used the most recently released MEPS data for 2011.16
MEPS includes a self-administered questionnaire, which is distributed to all adults 18 years or older in eligible households participating in MEPS. The 2011 MEPS included version 2 of the 12-Item Short Form Health Survey (SF-12v2), a generic HRQoL measure that is widely used in clinical practice and research.17 We analyzed both the entire MEPS sample (referred to hereafter as “All MEPS”) and the subset of patients who reported seeing a cardiologist in 2011 (referred to hereafter as “Cardiology MEPS”).
Data: Clinical Sample
All patients presenting to any outpatient cardiology practice within our large academic and community health system were asked to complete a survey, including version 1 of the 36-Item Short Form Survey (SF-36v1), a generic HRQoL measure,18 through the Patient-Reported Information Clinical Intake System (PRIcis). PRIcis is an institutional initiative to electronically collect PROs using a secure patient portal or tablet computers in the clinic. These PROs are securely transmitted to and stored within our electronic health record (EHR) system for use in clinical care. Data used in this analysis were collected from March 2012 to April 2013. We included information from a patient’s first visit to a cardiology clinic within this time period; we excluded all other visits and all visits for outpatient heart catheterization. PRIcis was linked to the EHR for demographic and comorbidity information. These data are hereafter referred to as “Cardiology Local.”
The primary outcome of interest was the Short Form 6-Dimension questionnaire (SF-6D) health preference score.19 This score can be calculated from both the SF-12v2 in MEPS and the SF-36v1 from PRIcis. The questions in these measures were used to construct health scenarios that were evaluated using the standard gamble20 technique in a representative sample of the UK population. Regression analysis was then used to model the preferences assigned to each health state. With the resulting scoring algorithm, a preference-based score can be assigned to each health state with “dead” anchored at 0 and “full health” anchored at 1.0. This scoring algorithm is published in the peer-reviewed literature.19
Secondary outcomes of interest were the Mental Component Summary (MCS) score and Physical Component Summary (PCS) score. The MCS and PCS were developed from a reduction of the 8 dimensions in the SF-36 (physical functioning, physical role limitations, emotional role limitations, pain, general health, vitality, social functioning, and mental health) to 2 dimensions by factor analysis. The SF-12 is an abridged version of the SF-36, which was constructed so that MCS and PCS scores from either form would be equivalent.21,22
The MCS and PCS scores calculated from the SF-36v1 were based on 1990 US normative data using the publicly available scoring algorithms for version 1.23,24 Scores calculated from the SF-12v2 were based on 1998 US normative data held by QualityMetric, Inc,22 which is now part of Optum (Eden Prairie, Minnesota). The MCS and PCS scores are normalized so that each has a mean of 50 and a standard deviation of 10 in the reference population. We applied the standard correction to compare across the different versions.22 For MEPS, we used the imputed scores that are freely available in the MEPS dataset; these were calculated by applying a proprietary algorithm of what was then QualityMetric, Inc.
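The normalization to a mean of 50 and a standard deviation of 10 can be illustrated with a minimal sketch. It shows only the final rescaling step, assuming a raw component score and the reference-population mean and standard deviation are already in hand; the function name and arguments are hypothetical, and this is not the published or proprietary scoring algorithm.

```python
def norm_based_score(raw, ref_mean, ref_sd):
    """Rescale a raw score so the reference population has mean 50 and SD 10.

    raw, ref_mean, and ref_sd are hypothetical inputs for illustration;
    the actual SF-36/SF-12 algorithms also weight and aggregate the
    individual scales before this step.
    """
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    z = (raw - ref_mean) / ref_sd  # standardize against the reference population
    return 50.0 + 10.0 * z         # shift and scale to mean 50, SD 10
```

Under this convention, a respondent who scores 1 reference standard deviation below the reference mean receives a score of 40.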
Both datasets included age in years, sex, and race. Race categories were collapsed into white, black, other, and unspecified to be consistent across the datasets. We calculated a Charlson Comorbidity Index (CCI) score constructed for administrative data,25 which uses International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes. For the clinical sample, we included ICD-9-CM codes entered in the problem list or medical history of the EHR before, or within 1 week of, the cardiology clinic visit. ICD-9-CM codes are included in MEPS if the individual had a condition linked to a 2011 event (eg, physician visit or taking medication) or disability day, or if the individual was, at that time, experiencing a condition as part of the MEPS condition enumeration assessment.
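As a rough illustration of how a comorbidity index can be computed from ICD-9-CM codes, the sketch below counts each comorbidity category at most once per patient and sums the category weights. The 3 categories, their code prefixes, and their weights are an assumed, deliberately incomplete subset chosen for illustration; they should be checked against the published index cited above, and all names are hypothetical.

```python
# Assumed, INCOMPLETE subset of categories: (ICD-9-CM code prefixes, weight).
# A real implementation needs the full published mapping.
ILLUSTRATIVE_CATEGORIES = {
    "myocardial_infarction": (("410", "412"), 1),
    "congestive_heart_failure": (("428",), 1),
    "cerebrovascular_disease": (
        ("430", "431", "432", "433", "434", "435", "436", "437", "438"), 1),
}


def comorbidity_score(icd9_codes, categories=ILLUSTRATIVE_CATEGORIES):
    """Sum the weights of the categories matched by a patient's codes."""
    # Normalize codes such as "428.0" to "4280" before prefix matching.
    cleaned = [code.strip().replace(".", "") for code in icd9_codes]
    score = 0
    for prefixes, weight in categories.values():
        # A category counts once, however many of its codes are present.
        if any(code.startswith(prefixes) for code in cleaned):
            score += weight
    return score
```

For example, a patient with codes 428.0, 410.9, 410.1, and 401.9 would score 2 under this subset, because the 2 myocardial infarction codes count once and 401.9 matches no listed category.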
We used Welch’s 2-tailed t test, which does not assume equal variances, to compare mean SF-6D, MCS, and PCS scores between samples within age strata (18-44, 45-54, 55-64, 65-74, ≥75 years). MEPS results were weighted to be nationally representative in these comparisons. To keep the statistical testing conservative, we used the unweighted number of MEPS respondents as the sample size.
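One way to read this comparison is sketched below: a survey-weighted mean and variance for the national stratum, paired with the unweighted respondent count when forming the Welch statistic. The weighted-variance formula (dividing by the sum of weights) and the function names are simplifying assumptions for illustration, not the SAS procedure used in the study.

```python
import math


def weighted_mean_var(values, weights):
    """Weighted mean and variance (dividing by the sum of weights)."""
    total_w = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total_w
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total_w
    return mean, var


def welch_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    n_a and n_b are respondent counts; for the survey sample, pass the
    unweighted count, as described in the text.
    """
    se2_a = var_a / n_a  # squared standard error of each mean
    se2_b = var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

A 2-tailed P value would then come from the t distribution with `df` degrees of freedom, for instance `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.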
As a sensitivity analysis to address the different demographic compositions of the clinical and MEPS samples, we combined our clinical data with the MEPS cardiology subsample into a single dataset. Within each age group, we used ordinary least squares regression to regress the outcome of interest (SF-6D, MCS, or PCS score) on sex, number of comorbidities, and an indicator for clinical versus national sample. These results were unweighted.
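The stratified regression can be sketched as follows. This is a minimal, dependency-free illustration that solves the normal equations directly; the record keys (`age_group`, `female`, `n_comorbid`, `local`, `sf6d`) are hypothetical names, and the study itself used SAS rather than this code.

```python
def ols(rows, outcomes):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, outcomes))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_by_age_group(records, outcome="sf6d"):
    """Fit outcome ~ female + n_comorbid + local within each age group."""
    fits = {}
    for group in sorted({r["age_group"] for r in records}):
        subset = [r for r in records if r["age_group"] == group]
        rows = [[r["female"], r["n_comorbid"], r["local"]] for r in subset]
        fits[group] = ols(rows, [r[outcome] for r in subset])
    return fits
```

In each fit, the last coefficient is the adjusted difference between the clinical and national samples, which is the quantity of interest in the sensitivity analysis.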
Statistical analyses were performed using SAS version 9.3 (SAS Institute, Cary, North Carolina). We considered P <.05 to be statistically significant. The clinically important difference, defined as “the smallest difference in score in the domain of interest which patients perceive as beneficial,”26 for the SF-6D is 0.04,27 and 5 for the MCS and PCS scores.
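The 2 criteria above can be applied together as a simple decision rule, sketched here. The thresholds come from the text; treating a difference exactly equal to the threshold as clinically important is an assumption, and the names are hypothetical.

```python
ALPHA = 0.05  # significance level stated in the text

# Clinically important differences stated in the text
CLINICALLY_IMPORTANT_DIFFERENCE = {"SF-6D": 0.04, "MCS": 5.0, "PCS": 5.0}


def classify_difference(measure, difference, p_value):
    """Return (statistically_significant, clinically_important)."""
    significant = p_value < ALPHA
    # Assumption: a difference equal to the threshold counts as important.
    important = abs(difference) >= CLINICALLY_IMPORTANT_DIFFERENCE[measure]
    return significant, important
```

A difference can satisfy one criterion without the other, so the 2 flags are reported separately rather than combined.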
The University of Pittsburgh Institutional Review Board approved this study (#PRO 13060301).
RESULTS
Response Rates
Within the 2011 MEPS sample, the response rate of individuals invited to complete the self-administered questionnaire was 93%. There were 21,959 respondents with SF-6D scores in All MEPS and 414 respondents with SF-6D scores in Cardiology MEPS. During the study period, 1945 patients were eligible to complete the PRIcis intake at their first visit. Of these, 1514 (78%) participated, and 1434 of the participants (95%) completed the SF-6D. The response rate was consistent across age and sex groups, except for those 75 years or older, for whom the total response rate was 65%.
Demographics from the samples are presented in Table 1. The demographics of the Cardiology Local and the All MEPS samples are quite different; the Cardiology Local sample has a substantially lower proportion of females (42% vs 52%), a lower proportion of racial minorities (8% vs 18%), and a substantially higher mean CCI score (1.74 vs 0.62). The Cardiology MEPS subsample (43% female, 13% minorities; mean CCI score of 1.57) is a closer fit to the Cardiology Local sample on these demographic parameters than the All MEPS sample. However, the distribution of specific comorbidities still differs between the Cardiology Local and Cardiology MEPS samples, with the Cardiology Local sample having more than twice the reported rate of congestive heart failure and cerebrovascular disease and half the rate of acute myocardial infarction.
Primary Outcome: SF-6D
When the Cardiology Local sample is compared with the All MEPS sample, there are statistically significant differences between mean SF-6D scores in the 65-to-74-years and 75-years-or-older age groups; however, these do not reach the clinically important difference27 of 0.04 (Table 2). In contrast, when the Cardiology Local sample is compared with the Cardiology MEPS sample, there are statistically significant differences between mean SF-6D scores in 4 of the 5 age groups (45-54, 55-64, 65-74, ≥75). In 3 of these age groups (45-54, 55-64, and 65-74), the Cardiology Local sample has a clinically important improvement over Cardiology MEPS.
In the regression-based sensitivity analyses combining Cardiology Local and Cardiology MEPS data, SF-6D scores were statistically significantly higher, by a clinically important margin, in the Cardiology Local sample for the same 4 age groups (Table 3). For example, in those aged 55 to 64, Cardiology Local SF-6D scores were 0.059 higher than the Cardiology MEPS scores after adjusting for sex and number of comorbid conditions. As expected, as the number of comorbid conditions increased, the SF-6D scores decreased. SF-6D scores were also lower for females than males.
Secondary Outcomes: MCS and PCS
The mean MCS scores in the Cardiology Local sample are higher than those in the All MEPS sample in 4 age groups (18-44, 55-64, 65-74, ≥75); these differences are statistically significant but not clinically important (Table 4). There are no statistically significant differences in the MCS values of the Cardiology Local sample compared with the Cardiology MEPS sample. In the sensitivity analyses combining Cardiology Local and Cardiology MEPS data, MCS scores are statistically different in 3 age strata (55-64, 65-74, ≥75), with the Cardiology Local sample reaching clinically important differences (of at least 5) in 2 of the age groups (55-64 and 65-74) (Table 5).
For mean PCS scores stratified by age, the Cardiology Local sample is statistically different from the All MEPS sample in 2 age groups (18-44, 45-54), and its scores are lower than those of All MEPS by a clinically important margin in these groups (Table 4). There are 3 statistically significant differences when comparing the Cardiology Local sample with the Cardiology MEPS sample (age groups 45-54, 65-74, ≥75), and the difference reaches clinical importance (an improvement) in the 45-to-54 age group. In the sensitivity analyses combining Cardiology Local and Cardiology MEPS data, PCS scores are statistically different in 4 age strata (45-54, 55-64, 65-74, ≥75), with the Cardiology Local sample showing clinically important improvements in these 4 age groups (Table 5).
DISCUSSION
This report illustrates the use of MEPS to generate nationally representative HRQoL benchmark scores for patients receiving care from a cardiologist. We used these benchmark scores to add an important dimension of understanding to the routine HRQoL data collected in cardiology clinics of a large healthcare system. Compared with the national benchmarks, the local cardiology clinics had both statistically significant and clinically important improvements in health preference scores for those aged 45 to 74. Our system did not have any scores that were statistically or clinically lower than the national benchmarks.
To better understand why overall health preference scores for local cardiology clinics were different from the national benchmarks, we also calculated MCS and PCS scores from the PROs. These analyses showed fewer differences than the overall health preference score. Compared with the national benchmarks, the improvements in the local cardiology clinics were generally larger for PCS than MCS scores. These findings suggest that there is greater room for improvement in mental health outcomes than physical health outcomes; focusing quality improvement on mental health outcomes within our cardiology clinics may have a great impact on patient HRQoL. Findings such as these illustrate the power of using generic health measurement, even in subspecialty care.
This study also shows the limitations of comparing normative values in clinic populations with those of the general population. Several generic HRQoL measures have been included in nationally representative datasets, such as MEPS, the Joint Canada/United States Survey of Health, the National Health Measurement Study, and the US Valuation of the EuroQol 5-Dimension (EQ-5D) Health States Survey28; however, these datasets consist primarily of healthy respondents. When the Cardiology Local sample was compared with the All MEPS sample in this study, there were no clinically important differences in health preference scores or MCS scores, and the Cardiology Local sample had PCS scores that were worse by a clinically important margin in 2 of the age groups (18-44 and 45-54). Compared with the individuals in MEPS who reported seeing a cardiologist, however, the Cardiology Local sample scored the same as or better than the Cardiology MEPS benchmarks.
MEPS is a rich dataset with information about many medical expenditures, including subspecialty visits, and it also includes generic HRQoL measures, such as the EQ-5D from 2000 to 2003, the SF-12v1 from 2000 to 2002, and the SF-12v2 from 2003 to present. The analysis shown in this report could be used to create similar benchmarks for other clinic or patient populations for which the MEPS sampling would reflect the local sample of interest. For example, it would be possible to select respondents in MEPS who have been to an endocrinology clinic or who report expenditures for thyroid disorders. Selecting rare conditions would reduce the sample size, although analysts could consider combining multiple years of MEPS data for more respondents. MEPS is a representative sample of the US civilian noninstitutionalized population, explicitly excluding institutionalized individuals in prisons or nursing homes and US citizens in the military. We expected these exclusions to bias our results in the direction opposite to the one we found, as the clinics in our healthcare system serve individuals in nursing homes.
Comparing local cardiology clinic samples to MEPS is limited by differing methods of eliciting the comorbid conditions appearing in each dataset. Comorbid conditions in the local dataset were included in the problem list or medical history in the EHR; conditions are physician-coded and do not have to be active at the time. Comorbid conditions in MEPS are reported by the respondent for the Household Component field and then recorded by the interviewer as verbatim text. This text was then coded by professional coders to fully specified ICD-9-CM codes, including medical condition and V codes. Although codes are verified, and error-rate percentages for any coder are low, the Agency for Healthcare Research and Quality recommends against assuming that household respondents have high accuracy in reporting condition data.29 More work is necessary to determine the comparability of these methods to elicit comorbid conditions.
Usual techniques for case-mix adjustment include age, sex, and comorbidity information.30 Our primary analyses compared age strata between the Cardiology Local sample and the Cardiology MEPS subsample because the number of comorbid conditions and the proportion of females were similar in each data source. Given the small strata sample sizes, we did not further stratify by sex. Given the small number of nonwhite respondents, we were unable to look for differences by race, although there are well-known differences in HRQoL reports by race in the general population.31
The stratified analysis allowed us to incorporate MEPS weights so the estimates are representative of the US noninstitutionalized civilian population. For a sensitivity analysis, we combined the Cardiology Local and Cardiology MEPS samples and adjusted for sex and number of comorbid conditions within the age strata. These analyses gave similar results to the stratified analyses but did not allow the use of MEPS weights. Techniques should be developed to include case adjustment and weighting when comparing clinic samples with national normative values.
Our work demonstrates that collection and interpretation of PROs in a clinical setting are feasible. Furthermore, incorporating PROs provides information that complements traditional clinical measures by emphasizing patient-centered care. We illustrate a technique for creating benchmarks that allow interpretation of PROs in a clinic setting. Our findings suggest the clinical practices in our sample are achieving or exceeding benchmarks created from nationally representative data. They also provide clues about areas of improvement that could be high yield within these practices. If used more broadly, PROs could add an important dimension to our understanding of high-value care.