Background: Generic health status measures are commonly used in the evaluation of rheumatoid arthritis (RA) patients. The reliability, validity, and sensitivity of these instruments for assessing quality of life (QOL) in RA, and how they correlate with other clinical measurements, have long been questioned.
Objective: Analyze the performance of a commonly used generic health status measure, the Medical Outcomes Study 36-Item Short Form (SF-36), against the Outcome Measures in Rheumatology (OMERACT) criteria.
Methods: Data were analyzed from 7 double-blind, randomized controlled trials that examined the effectiveness of 1 or more interventions in RA. The primary outcome measures evaluated were the Mental and Physical Component Scores of the SF-36. Comparators were 1 or more of the following: the Health Assessment Questionnaire scores, tender joint count (TJC), the Disease Activity Score, and the American College of Rheumatology Responder Index (ACR20, ACR50, ACR70). The ability to detect a treatment effect in the study outcomes was evaluated using 3 measures: treatment difference, standardized response mean, and relative efficiency in relation to the TJC.
Results: As a generic QOL measure, the SF-36 is better suited to capture the holistic health of the patient, as reflected in the World Health Organization definition of health as being not only the avoidance of disease but the physical, emotional, and social well-being of the patient. Furthermore, use of the SF-36 permits comparisons of physical and mental aspects of QOL in the RA patient population, as well as comparisons of QOL parameters between patients with RA, other patient groups, and the general population.
Conclusion: The SF-36 deserves serious consideration for inclusion in the core set of outcomes in RA trials.
(Am J Manag Care. 2007;13:S224-S236)
Patient quality of life (QOL) outcome-based studies are designed to evaluate whether patient health has improved as measured by physical, mental, and social instruments.1 In industrialized countries, only one third of the burden of disease is due to mortality; the other two thirds are due to physical, mental, and social disability.2 Although the inverse is true in low- and middle-income countries, a third of the burden of disease there is still due to its impact on well-being. Thus, appropriate outcome measures are required for medical interventions that are designed to improve well-being in addition to, or instead of, extending the patient’s life. This is certainly true for rheumatoid arthritis (RA) trials.
Of the many measurement tools available to clinical researchers, those that measure patient well-being are perhaps the most important for evaluating patient-perceived outcomes. Because well-being is a complex concept or attribute, its definition has been the subject of great debate.3 It is variably interpreted as health-related QOL (HRQOL) or function. Fitzpatrick et al have distinguished “global definitions” from “component definitions” for patient-based outcome measures.4 Global definitions define well-being in general terms such as global judgments of health or satisfaction with life, whereas component definitions break the concept into specific parts or dimensions. They have proposed the following classification of components: physical function, symptoms, psychologic well-being, social well-being, cognitive functioning, role activities, personal constructs (ie, life satisfaction, spirituality, etc), and satisfaction with care.
Measuring well-being poses challenges that do not arise with more objective clinical outcome measures. One is the distinction between measuring differences between individuals at a single point in time and measuring changes within individuals or groups over time. A discriminative instrument asks who, at this point, has better QOL and whose QOL is worse, whereas an evaluative instrument asks who has improved more, who has improved less, and who has deteriorated. The latter is most frequently used in clinical trials. HRQOL scales can also be classified as disease-specific or generic, and this distinction is central to this article. Disease-specific scales are designed for a particular condition, so a scale for arthritis asks different questions than a scale for heart failure. They have typically been designed to capture the aspects of a disease most likely to improve with therapy and thus to maximize the scale’s responsiveness to change in patients receiving a particular therapy. All pivotal clinical trials of RA therapies now include an instrument to assess patient-reported outcomes, usually the Health Assessment Questionnaire (HAQ) or the Modified HAQ (MHAQ). Generic scales are designed to be applicable across many conditions and focus on overall QOL (ie, overall physical, social, and emotional health). Because they are not tailored to a specific disease, generic scales are much less likely to be responsive to change in intervention trials. However, a number of RA trials have included a generic scale, the Medical Outcomes Study 36-Item Short Form (SF-36).
Measures Used in the Assessment of Quality of Life in Rheumatoid Arthritis Trials
Table 1 shows the most commonly used HRQOL instruments in RA trials and their psychometric properties.5
The HAQ and the MHAQ are the most widely used.6 The HAQ Disability Index (HAQ-DI) is an ordinal scale of 20 items on daily functioning during the past week, covering 8 component areas: dressing and grooming, arising, eating, walking, hygiene, reach, grip, and outdoor activities. The scale may be self-administered or administered in a personal or telephone interview, and it can be completed in 5 minutes. Each response is scored on a 4-point scale of ability: without any difficulty, with some difficulty, with much difficulty, and unable to do.
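How the 20 item responses are combined into a single HAQ-DI score is not described in this article. The sketch below assumes the conventional HAQ-DI scoring rule published by the instrument’s developers: each item is coded 0 (without any difficulty) to 3 (unable to do), each category takes its highest item score, a category score of 0 or 1 is raised to 2 when aids, devices, or help from another person are used, and the index is the mean of the category scores, computed only when at least 6 of the 8 categories are answered. The function name `haq_di` and the category keys are hypothetical.

```python
from typing import Collection, Mapping, Optional, Sequence

# Assumed conventional HAQ-DI item coding:
# 0 = without any difficulty, 1 = with some difficulty,
# 2 = with much difficulty, 3 = unable to do.
# The 8 categories and their item counts (20 items in total).
ITEMS_PER_CATEGORY = {
    "dressing": 2, "arising": 2, "eating": 3, "walking": 2,
    "hygiene": 3, "reach": 2, "grip": 3, "activities": 3,
}


def haq_di(
    responses: Mapping[str, Sequence[Optional[int]]],
    aided: Collection[str] = (),
) -> float:
    """Hypothetical helper: HAQ Disability Index (0 to 3) from item codes.

    `responses` maps a category name to its item codes (None = unanswered);
    `aided` lists categories where aids, devices, or help were used.
    """
    category_scores = []
    for category, n_items in ITEMS_PER_CATEGORY.items():
        items = list(responses.get(category, ()))
        if len(items) > n_items:
            raise ValueError(f"{category}: expected at most {n_items} items")
        answered = [code for code in items if code is not None]
        if not answered:
            continue  # unanswered category: left out of the mean
        if any(code not in (0, 1, 2, 3) for code in answered):
            raise ValueError(f"{category}: item codes must be 0-3")
        score = max(answered)  # category score = worst (highest) item
        if category in aided:
            score = max(score, 2)  # aids/devices/help raise 0 or 1 to 2
        category_scores.append(score)
    if len(category_scores) < 6:
        raise ValueError("HAQ-DI needs at least 6 of 8 categories answered")
    return sum(category_scores) / len(category_scores)


# Illustrative responses for one hypothetical patient.
patient = {
    "dressing": [1, 0], "arising": [2, 1], "eating": [0, 0, 1],
    "walking": [1, 1], "hygiene": [0, 0, 0], "reach": [3, 2],
    "grip": [1, 0, 0], "activities": [2, 2, 1],
}
print(haq_di(patient))                     # 1.375
print(haq_di(patient, aided={"hygiene"}))  # 1.625
```

Taking the worst item per category, rather than averaging items, is what makes the HAQ-DI sensitive to a single severe limitation within an area of function.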
There are a number of generic scales;7 the SF-36 and its derivatives dominate the field in most clinical areas, including RA trials. The SF-36 was designed as a generic indicator of health status for use in population surveys and evaluative studies of health policy, and has only more recently been used to complement disease-specific measures in clinical trials.8 Its 36 questions measure the following 8 dimensions: physical functioning, role limitations due to physical problems, bodily pain, social functioning, general mental health, role limitations due to emotional problems, vitality, and general health perceptions. The standard instrument uses a 4-week recall period, but a 1-week recall period is often used. It may be self-administered or used in personal or telephone interviews and takes 5 to 10 minutes to complete. Two summary scores are calculated: the physical component summary (PCS) and the mental component summary (MCS). The focus of this article is to review the performance of the SF-36 against the Outcome Measures in Rheumatology (OMERACT) Filter of Validity, Feasibility, and Responsiveness to Change.
What Is OMERACT?
OMERACT is an international, informally organized network initiated in 1992 that aims to improve outcome measurement in rheumatology. Consensus conferences, which are chaired by an executive committee, are held every 2 years and rotate around the globe. Data-driven recommendations are prepared and updated by expert working groups at these consensus conferences. Recommendations include core sets of measures for most of the major rheumatologic conditions.9
What Does OMERACT Do?
When Is a Measure “Applicable”?
A measure is considered “applicable” when it passes the OMERACT filter in its intended setting. The OMERACT filter can be summarized in 3 words: truth, discrimination, and feasibility.12 Each word represents a question to be asked of the measure in each of its intended settings.
Truth. Is the measure truthful? Does it measure what it intends to measure? Is the result unbiased and relevant? This criterion captures the issues of face, content, construct, and criterion validity.
Discrimination. Does the measure discriminate between 2 situations that are of interest? The situations can be states at one time (for classification or prognosis) or states at different times (to measure change). This criterion captures the issues of reliability and sensitivity to change.
Feasibility. Can the measure be applied easily, given constraints of time, money, and interpretability? This criterion addresses the pragmatic reality of the use of the measure, one that may be decisive in determining a measure’s success.
The SF-36 has already been shown to meet the first and third criteria above. It has been extensively validated. Face validity is greatest for the component scores, which are much easier to interpret and use for clinical decision making than the 8 subscales. Content, construct, and criterion validity have been demonstrated, and the instrument has been shown to meet the feasibility criteria in many trials across many conditions.7 The outstanding challenge that this article addresses is to demonstrate its discrimination in RA by showing its responsiveness to change in placebo-controlled trials.
SF-36 Comparison With Other Scales
Inclusion Criteria. To be included in this analysis, studies had to be double-blind, randomized controlled trials that examined the effectiveness of 1 or more interventions in patients with RA. Studies also had to report the SF-36 component scores, the HAQ, and the tender joint count (TJC) as outcomes.
Exclusion Criteria. Studies were excluded when a control group was absent. When several studies reported data on the same, or a highly related, sample (Table 2),13-39 only the study with the most complete data on the SF-36, HAQ, and TJC was included. Trials were also excluded when the published report did not contain adequate data and the data could not be obtained from the original authors.
Identification of Relevant Articles. A total of 7 studies were identified through 2 searches and consultation with an expert in the field. The first search, on PubMed, retrieved 93 studies using the keywords rheumatoid arthritis and SF-36 from 1998 to the present. A second search with a wider catchment area was run in the MEDLINE, Cochrane CENTRAL, and EMBASE databases using the Ovid platform. This search combined the National Library of Medicine’s Medical Subject Headings (MeSH) for RA with text-word searching to identify other variations, and was then limited through a filter to identify randomized controlled trials.
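For readers who want to rerun the first search, it can be expressed as a request to NCBI’s E-utilities `esearch` endpoint. The sketch below only builds the request URL and sends nothing over the network. The exact query string, the choice of publication date (`datetype=pdat`) as the date filter, and the `retmax` value are assumptions, since the article reports only the keywords and the 1998 start date; a rerun today would not necessarily return the 93 records reported. The helper name `pubmed_search_url` is hypothetical.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (this sketch builds the URL only).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(start_year: int = 1998, end: Optional[date] = None,
                      retmax: int = 200) -> str:
    """Hypothetical helper: esearch URL for RA + SF-36 records by publication date."""
    end = end or date.today()  # "to the present" unless an end date is given
    params = {
        "db": "pubmed",
        "term": '"rheumatoid arthritis" AND "SF-36"',  # assumed query wording
        "datetype": "pdat",          # filter on publication date
        "mindate": str(start_year),  # E-utilities needs mindate and maxdate together
        "maxdate": end.strftime("%Y/%m/%d"),
        "retmax": retmax,            # maximum number of PMIDs returned
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(pubmed_search_url())
```

Passing the URL to `urllib.request.urlopen` would run the search; the JSON response reports the number of hits in its `esearchresult` object. The MeSH-based Ovid search, with its randomized-controlled-trial filter, has no equivalent here.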
Outcome Measures. The primary outcome measures evaluated by this review were the MCS and PCS of the SF-36. The scores can range from 0 to 100, with higher scores indicating better QOL. Comparators were 1 or more of the following: the HAQ scores, TJC, the Disease Activity Score, and the American College of Rheumatology (ACR) Responder Index (ACR20, ACR50, and ACR70).
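Statistical Methods. The ability to detect a treatment effect in the study outcomes was evaluated using 3 measures: the treatment difference, ie, the difference between the mean change in the treatment group and the mean change in the placebo group; the standardized response mean (SRM), ie, the ratio of the treatment difference to the pooled standard deviation (SD) of the change in scores; and the relative efficiency (RE) in relation to the TJC, ie, the square of the ratio of the t statistics, which corresponds to the square of the ratio of the SRM for the outcome to the SRM for the TJC. An RE >1 implies that the outcome is more efficient than the TJC in detecting a treatment effect. In symbols:
$$\text{Treatment difference} = m_a - m_c, \qquad s_p = \sqrt{\frac{(n_a - 1)\,s_a^2 + (n_c - 1)\,s_c^2}{n_a + n_c - 2}}, \qquad \mathrm{SRM} = \frac{m_a - m_c}{s_p}$$
$$t = \frac{m_a - m_c}{s_p\sqrt{1/n_a + 1/n_c}}, \qquad \mathrm{RE} = \left(\frac{t_{\text{outcome}}}{t_{\text{TJC}}}\right)^2 = \left(\frac{\mathrm{SRM}_{\text{outcome}}}{\mathrm{SRM}_{\text{TJC}}}\right)^2$$
where $m_a$ and $s_a$ are the mean and SD, respectively, of the change in scores from baseline in the active-treatment group, $n_a$ is the number of patients in this group, $s_p$ is the pooled SD, and $m_c$, $s_c$, and $n_c$ are the corresponding values for the control group. The last equality holds because the outcome and the TJC are measured in the same patients, so the factor $\sqrt{1/n_a + 1/n_c}$ cancels.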
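The 3 measures can be computed directly from the per-arm summary statistics of the change from baseline. The sketch below assumes that the pooled SD is the usual degrees-of-freedom-weighted pooled SD and that the outcome and the TJC come from the same patients, so the arm sizes match. The `ArmChange` container and the function names are hypothetical, and the numbers in the usage lines are illustrative, not data from the 7 trials.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmChange:
    """Hypothetical container: change-from-baseline summary for one trial arm."""
    mean: float  # m: mean change from baseline
    sd: float    # s: SD of the change from baseline
    n: int       # number of patients in the arm


def pooled_sd(a: ArmChange, c: ArmChange) -> float:
    # Pooled SD of the change scores, each arm's variance weighted by its degrees of freedom.
    return math.sqrt(((a.n - 1) * a.sd ** 2 + (c.n - 1) * c.sd ** 2) / (a.n + c.n - 2))


def treatment_difference(a: ArmChange, c: ArmChange) -> float:
    # Mean change in the active arm minus mean change in the control arm.
    return a.mean - c.mean


def srm(a: ArmChange, c: ArmChange) -> float:
    # SRM as defined in the text: treatment difference / pooled SD.
    return treatment_difference(a, c) / pooled_sd(a, c)


def t_statistic(a: ArmChange, c: ArmChange) -> float:
    # Two-sample t statistic using the pooled SD.
    return treatment_difference(a, c) / (pooled_sd(a, c) * math.sqrt(1 / a.n + 1 / c.n))


def relative_efficiency(out_a: ArmChange, out_c: ArmChange,
                        tjc_a: ArmChange, tjc_c: ArmChange) -> float:
    # RE > 1: the outcome detects the treatment effect more efficiently than the TJC.
    return (t_statistic(out_a, out_c) / t_statistic(tjc_a, tjc_c)) ** 2


# Illustrative numbers only (not data from the 7 trials).
pcs_active, pcs_control = ArmChange(6.0, 8.0, 100), ArmChange(2.0, 8.0, 100)
tjc_active, tjc_control = ArmChange(-9.0, 10.0, 100), ArmChange(-5.0, 10.0, 100)
print(srm(pcs_active, pcs_control))  # 0.5
print(round(relative_efficiency(pcs_active, pcs_control, tjc_active, tjc_control), 4))  # 1.5625
```

Because RE squares the ratio, it ignores the direction of each scale: a fall in the TJC and a rise in the PCS both count as improvement, and an SRM of 0.5 against a TJC SRM of -0.4 gives RE = 1.5625.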
Thirty-five of the studies identified by the 2 searches met the inclusion criteria after screening (Table 3).37-46 Of these 35 studies, 5 remained after the exclusion criteria were applied. These results were supplemented by an expert in the field who has compiled a bibliography of all SF-36 studies, totaling more than 10 000 published studies to date; 2 additional studies were identified, for a total of 7.
Discussion
This article has summarized the evidence for the responsiveness to change of the SF-36 component scores across 7 placebo-controlled trials as assessed by SRM and effect size, 2 key measures for meeting the OMERACT criteria for responsiveness to change. As expected, the PCS showed more responsiveness to change with nonsteroidal anti-inflammatory drugs and biologics than did the MCS, given that mobility and pain are more heavily weighted in the PCS. The PCS performed similarly to the HAQ, so sample sizes adequate for the HAQ can be expected to give sufficient power for the PCS. However, much larger sample sizes would be needed to achieve adequate statistical power if the MCS is an important outcome.
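The sample-size reasoning above can be made concrete with the usual normal-approximation formula for comparing 2 equal arms, in which the number of patients needed per arm is proportional to 1/SRM². The sketch below assumes a two-sided test at α = 0.05 with 80% power and treats the SRM defined in Statistical Methods as the standardized effect size. The helper name `n_per_arm` is hypothetical, and the SRM values in the usage lines are hypothetical, not the SRMs observed for the PCS, MCS, or HAQ in the 7 trials.

```python
import math
from statistics import NormalDist


def n_per_arm(srm: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: approximate patients per arm, two-sided, 2 equal arms.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / SRM**2,
    treating the SRM (treatment difference / pooled SD) as the effect size.
    """
    if srm == 0:
        raise ValueError("SRM must be nonzero")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / srm ** 2)


# Hypothetical SRMs: a scale with half the SRM needs about 4 times the patients.
print(n_per_arm(0.50))  # 63
print(n_per_arm(0.25))  # 252
```

Because the required sample size scales with 1/SRM², the ratio of sample sizes for 2 outcomes in the same trial is approximately their RE. An outcome whose SRM is close to that of the HAQ can therefore share its sample size, while a much less responsive outcome cannot.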
Others have assessed the responsiveness to change of the SF-36 components, but only in 1 or 2 trials, and, with the exception of Wells et al,39 they have used other indices. In 2 trials of misoprostol and diclofenac sodium versus placebo, Kosinski et al41 found that changes in the SF-36 and HAQ scores were more strongly related to changes in the patient and physician global assessments and patient pain assessment than to changes in the joint swelling and tenderness counts.
Eichler et al,13 in 2 placebo-controlled clinical trials that compared etoricoxib, naproxen, and placebo in 1684 patients, found that although correlations with the joint scores were low, the association with clinical efficacy end points was nearly identical for the HAQ overall score and the SF-36 PCS.
Tuttleman et al14 evaluated the SF-36 as a generic functional health status measure in 207 patients in the Minocycline in Rheumatoid Arthritis trial. The SF-36 had high internal consistency and reliability and high discriminant and convergent validity. Moderate correlations were observed for comparable items on the SF-36 and MHAQ regarding dressing, walking, and bending. Correlations of the joint tenderness score with items on the MHAQ and SF-36 scales were higher than those of the joint swelling score. Physician and patient global assessments were most highly correlated with the SF-36 bodily pain item. Based on these data, the authors concluded that the SF-36 is a valid instrument for patients with RA and that it correlates with the MHAQ and with the physician and patient global assessments.
These studies show that the SF-36 deserves serious consideration for inclusion in the core set of outcomes recommended for future trials. This will also expand the database on its performance. As a generic QOL measure, the SF-36 is better suited to capture the holistic health of the patient as reflected in the World Health Organization definition of health as being not only the avoidance of disease but the physical, emotional, and social well-being of the patient. Furthermore, use of the SF-36 permits comparisons of physical and mental aspects of QOL in the RA patient population, as well as comparisons of QOL parameters between patients with RA, other patient groups, and the general population. This contribution is unique and of added value when issues of QOL are important.
3. Spitzer WO. State of science 1986: quality of life and functional status as target variables for research. J Chronic Dis. 1987;40:465-471.
5. Russak S, Croft J Jr, Furst D, et al. The use of rheumatoid arthritis health-related quality of life patient questionnaires in clinical practice: lessons learned. Arthritis Rheum. 2003;49:574-584.
7. McDowell I. Measuring Health: A Guide to Rating Scales and Questionnaires. 3rd ed. New York, NY: Oxford University Press; 2006.
9. What is OMERACT? www.omeract.org. Accessed October 31, 2007.
11. Brooks P, Boers M, Simon L, Strand V, Tugwell P. Outcome measures in rheumatoid arthritis-the OMERACT process. Expert Rev Clin Immunol. 2007;3:271-275.
13. Eichler HG, Mavros P, Geling O, Hunsche E, Kong S. Association between health-related quality of life and clinical efficacy endpoints in rheumatoid arthritis patients after four weeks treatment with anti-inflammatory agents. Int J Clin Pharmacol Ther. 2005;43:209-216.
15. Bilberg A, Ahlmén M, Mannerkorpi K. Moderately intensive exercise in a temperate pool for patients with rheumatoid arthritis: a randomized controlled study. Rheumatology (Oxford). 2005;44:502-508.
17. Genovese MC, Becker JC, Schiff M, et al. Abatacept for rheumatoid arthritis refractory to tumor necrosis factor alpha inhibition. N Engl J Med. 2005;353:1114-1123.
19. Helliwell PS, O’Hara M, Holdsworth J, Hesselden A, King T, Evans P. A 12-month randomized controlled trial of patient education on radiographic changes and quality of life in early rheumatoid arthritis. Rheumatology (Oxford). 1999;38:303-308.
21. Kaplan RM, Groessl EJ, Sengupta N, Sieber WJ, Ganiats TG. Comparison of measured utility scores and imputed scores from the SF-36 in patients with rheumatoid arthritis. Med Care. 2005;43:79-87.
23. Kosinski M, Kuyawski S, Martin R. Health-related quality of life in early rheumatoid arthritis: impact of disease and treatment response. Am J Manag Care. 2002;8:231-240.
25. Mathias SD, Colwell HH, Miller DP, Moreland LW, Buatti M, Wanke L. Health-related quality of life and functional status of patients with rheumatoid arthritis randomly assigned to receive etanercept or placebo. Clin Ther. 2000;22:128-139.
27. Scott DL, Garrood T. Quality of life measures: use and abuse. Baillieres Best Pract Res Clin Rheumatol. 2000;14:663-687.
29. Strand V. Longer term benefits of treating rheumatoid arthritis: assessment of radiographic damage and physical function in clinical trials. Clin Exp Rheumatol. 2004;22(5 suppl 35):S57-S64.
31. Strand V, Scott DL, Emery P, et al. Physical function and health related quality of life: analysis of 2-year data from randomized, controlled studies of leflunomide, sulfasalazine, or methotrexate in patients with active rheumatoid arthritis. J Rheumatol. 2005;32:590-601.
33. Torrance GW, Tugwell P, Amorosi S, Chartash E, Sengupta N. Improvement in health utility among patients with rheumatoid arthritis treated with adalimumab (a human anti-TNF monoclonal antibody) plus methotrexate. Rheumatology (Oxford). 2004;43:
34. Tugwell P, Wells G, Strand V, et al. Clinical improvement as reflected in measures of function and health-related quality of life following treatment with leflunomide compared with methotrexate in patients with rheumatoid arthritis: sensitivity and relative efficiency to detect a treatment effect in a twelve-month, placebo controlled trial. Leflunomide Rheumatoid Arthritis Investigators Group. Arthritis Rheum. 2000;43:506-514.
36. Wolfe F, Michaud K. Towards an epidemiology of rheumatoid arthritis outcome with respect to treatment: randomized controlled trials overestimate treatment response and effectiveness. Rheumatology (Oxford). 2005;44(suppl 4):iv18-iv22.
38. Westhovens R, Cole JC, Li T, et al. Improved health-related quality of life for rheumatoid arthritis patients treated with abatacept who have inadequate response to anti-TNF therapy in a double-blind, placebo-controlled, multicentre randomized clinical trial. Rheumatology (Oxford). 2006;45:1238-1246.
40. Cohen SB, Emery P, Greenwald MW, et al. Rituximab for rheumatoid arthritis refractory to anti-tumor necrosis factor therapy: results of a multicenter, randomized, double-blind, placebo-controlled, phase III trial evaluating primary efficacy and safety at twenty-four weeks. Arthritis Rheum. 2006;54:2793-2806.
42. Kremer JM, Genovese MC, Cannon GW, et al. Concomitant leflunomide therapy in patients with active rheumatoid arthritis despite stable doses of methotrexate. A randomized, double-blind, placebo-controlled trial. Ann Intern Med. 2002;137:726-733.
44. Maini RN, Breedveld FC, Kalden JR, et al. Sustained improvement over two years in physical function, structural damage, and signs and symptoms among patients with rheumatoid arthritis treated with infliximab and methotrexate. Arthritis Rheum.
45. Zhao SZ, Fiechtner JI, Tindall EA, et al. Evaluation of health-related quality of life of rheumatoid arthritis patients treated with celecoxib. Arthritis Care Res. 2000;13:112-121.