The American Journal of Managed Care February 2015
A Systematic Review of Measurement Properties of Instruments Assessing Presenteeism
Maria B. Ospina, PhD; Liz Dennett, MLIS; Arianna Waye, PhD; Philip Jacobs, DPhil; and Angus H. Thompson, PhD

A systematic review of presenteeism instruments found that most have been validated to some extent, but evidence for criterion validity is virtually absent.
Presenteeism (decreased productivity while at work) is reported to be a major occupational problem in many countries. Challenges exist for identifying the optimal approach to measure presenteeism. Evidence of the relative value of presenteeism instruments to support their use in primary studies is needed.

To assess and compare the measurement properties (ie, validity, reliability, responsiveness) and the quality of the evidence of presenteeism instruments.

Study Design
Systematic review.

Comprehensive searches of electronic databases were conducted up to October 2012. Twenty-three presenteeism instruments were examined. Methodological quality was appraised with the COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) checklist. A best-evidence synthesis approach was used in the analysis.

Searches identified 1767 citations, and 289 full-text articles were reviewed for eligibility. Of these, 40 studies assessing the measurement properties of presenteeism instruments were identified. The 3 presenteeism instruments with the strongest level of evidence on more than 1 measurement property were the Stanford Presenteeism Scale, 6-item version (content validity, internal consistency, construct validity, convergent validity, and responsiveness); the Endicott Work Productivity Scale (internal consistency, convergent validity, and responsiveness); and the Health and Work Questionnaire (HWQ; internal consistency and structural validity). Only the HWQ was assessed for criterion validity, with unknown quality of the evidence.

Most presenteeism instruments have been examined for some form of validity; evidence for criterion validity is virtually absent. The selection of instruments for use in primary studies depends on weak forms of validity. Further research should focus on the goal of a comprehensive evaluation of the psychometric properties of existing tests of presenteeism, with emphasis on criterion validity.

Am J Manag Care. 2015;21(2):e171-e185
This article reviewed studies assessing the measurement properties of presenteeism instruments.
  • We identified 40 studies assessing the measurement properties of 21 presenteeism instruments.
  • Most presenteeism instruments have been assessed for some form of validity but evidence for criterion validity is virtually absent.
  • Within these limitations, the 3 presenteeism instruments with the strongest level of evidence were the Stanford Presenteeism Scale, 6-item version (SPS-6); the Endicott Work Productivity Scale (EWPS); and the Health and Work Questionnaire (HWQ).
  • The selection of a presenteeism instrument for research and management purposes currently depends on weak forms of validity.
Presenteeism has been broadly defined as “decreased productivity and below-normal work quality” when physically present at work.1 Presenteeism can be studied in relation to many factors, including health. Terms such as “impaired presenteeism,” “sickness presenteeism,” or “working through illness”2 describe a phenomenon involving less than full productivity because of illness or other health conditions among individuals who opt to come to work, even when they arguably could stay at home.3

Presenteeism is a major occupational health problem in many countries, with serious consequences for both organizations and employees. Increasing evidence shows that presenteeism represents a “silent” but significant source of productivity losses that can cost organizations much more than absenteeism does.3 Presenteeism can lead to an increase in occupational accidents, deterioration of product quality, and adverse effects on healthy employees.3 The impact on the individual is no less: employees who turn up for work when ill have a diminished quality of life. They often experience feelings of burnout due to inadequate recovery and become trapped in a vicious circle: job demands accumulate, they have less energy to cope with these demands, more presenteeism results, and so on. Similarly, repeatedly postponing sickness leave that could effectively resolve minor illnesses may allow more serious illnesses to develop.

A number of systematic reviews have summarized the measurement properties of instruments that assess productivity loss at the workplace,4-7 work productivity combining presenteeism and absenteeism measures,8 or work-related outcome measures in specific clinical groups.9,10 The majority of these reviews have not incorporated a systematic analysis of the methods by which these instruments have been developed,4,5,7 or have employed nonvalidated approaches to appraise both the quality of studies and the measurement properties of presenteeism instruments themselves.6 Assessing the quality of studies that evaluate the measurement properties of presenteeism instruments is an essential step to inform the selection of presenteeism instruments for research and practice. If the quality of a study is appropriate, the results are valid and the measurement instrument can be a useful tool. Conversely, if the study quality is inadequate, the results cannot be trusted and the quality of the measurement instrument under scrutiny remains unclear.

Similarly, evidence on the relative value of presenteeism instruments is needed. Concurrent comparisons of the measurement properties and quality of presenteeism instruments may help to reveal the relative strengths and weaknesses of the measures and provide evidence-based guidance for the selection of instruments for research and management. To fill these knowledge gaps, we conducted a systematic review to assess and compare the measurement properties (ie, validity, reliability, responsiveness) and the quality of the evidence of presenteeism instruments.


Data Sources

We conducted comprehensive electronic searches of MEDLINE, Embase, Cochrane Central Register of Controlled Trials, PsycINFO, Web of Science, Cumulative Index to Nursing and Allied Health Literature, Business Source Complete, and ABI/INFORM from database inception to October 2012 for studies reporting psychometric properties of instruments assessing presenteeism. Two reviewers (AHT, AW) compiled a list of presenteeism instruments based on preliminary searches of the literature, contact with experts in the field, and examination of individual items. An information specialist (LD) designed the search strategy using the names of instruments as keywords. In addition, we examined the references of identified articles for additional studies. Searches were limited to citations in the English language.

Article Selection

Presenteeism instruments were defined in this systematic review as questionnaires measuring at least 1 domain of productivity loss or reduced productivity/performance while at work. Items assessing presenteeism were required to focus on at least 1 of the following characteristics: a) perceived productivity loss/reduced performance, b) comparative productivity loss/reduced performance (compared with that of others and with one’s own pattern), and/or c) estimation of unproductive time while at work. Based on this definition, 21 presenteeism instruments were considered for the review. Studies included in the review were full-text, peer-reviewed primary studies that evaluated the measurement properties (ie, validity, reliability, responsiveness) of the English version of any of the presenteeism instruments listed in Table 1. Instruments are identified herein by their acronyms, which form a potentially confusing array. To minimize confusion, the full name of each instrument, with the acronym in parentheses, is presented on first appearance, and Table 1 can serve as a glossary. That said, the instruments under study serve as members of a class of tests and, like subjects in most empirical research, need not be identified individually for most of the issues under discussion.

No restrictions on study design were applied during article selection; however, editorials, book chapters, review articles, conference abstracts, unpublished studies, case studies with fewer than 30 cases, and studies enrolling only pediatric populations (subjects 18 years and younger) were excluded. Studies published in non-English languages, or studies published in English that reported the cross-cultural adaptation of instruments or the measurement properties of non-English versions of presenteeism instruments, were not considered for inclusion in the review.

Two reviewers (AW, MO) independently screened the titles and abstracts generated from the search strategy. The full text of articles deemed relevant, and of those whose titles and abstracts provided insufficient information, was retrieved for closer inspection. Two independent reviewers (either AW and LD, or AW and MO) determined study eligibility for the review, with disagreements about inclusion and exclusion of studies resolved through consensus among reviewers. Reasons for exclusion were documented and a flow chart of study selection was prepared according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.

Methodological Quality Assessment

Two reviewers (either AW and LD, or AW and MO) independently applied the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) checklist11 to assess the methodological quality of each measurement property reported in the included studies, with disagreements among reviewers resolved by consensus. The COSMIN checklist was developed through an international Delphi study12 with the specific goal of facilitating the methodological assessment of outcome measures to enable the selection of the best instrument for a specific purpose.11 The checklist includes the following properties: reliability, internal consistency, content validity, construct validity, criterion validity, and responsiveness (see definitions in Table 2). Briefly, the quality of each measurement property reported in a study is assessed by a series of items covering design requirements and preferred statistical methods, each rated on a 4-point scale (poor, fair, good, excellent) depending on the information reported by the authors. The overall score for each measurement property is the lowest rating given to any of its items.11 The COSMIN checklist is increasingly used in systematic reviews of measurement properties, and to date, it is the only quality assessment tool of this kind that has been validated and standardized.11,13
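The lowest-rating rule described above can be illustrated with a minimal Python sketch. The function name and the example item ratings are hypothetical and are not taken from the COSMIN checklist or from the included studies; only the 4-point scale and the rule of taking the lowest item rating come from the text.

```python
# The 4-point scale described in the text, ordered from worst to best.
RATINGS = ["poor", "fair", "good", "excellent"]


def property_score(item_ratings):
    """Return the lowest rating among the items for one measurement property.

    `item_ratings` is a list of strings drawn from RATINGS. An unrecognized
    rating raises ValueError (from list.index).
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # Rank each rating by its position on the scale and keep the lowest.
    return min(item_ratings, key=RATINGS.index)


# Hypothetical item ratings for one property in one study.
example_items = ["excellent", "good", "fair", "good"]
print(property_score(example_items))  # fair
```

Under this rule a single weak item determines the score for the whole property, so a study is rated only as highly as its weakest design or statistical element.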

Data Extraction and Synthesis

One reviewer (MO) extracted the following information from the studies: health condition and sample size in which the instrument was tested; validity (ie, content, construct, criterion, convergent); reliability (internal consistency, test-retest, inter-rater); and responsiveness data. A second reviewer (AW) independently verified the accuracy and completeness of data extraction, with discrepancies between the data extractor and the data verifier being resolved by consensus.

We used a best-evidence synthesis approach to summarize the evidence on the measurement properties of presenteeism instruments, taking into account the number of studies, their quality ratings, and the consistency of their results. For each instrument, we combined the methodological quality ratings of individual studies (poor, fair, good, or excellent) into a composite rating of the level of evidence for each measurement property. The resulting level of evidence was classified for each property according to the following criteria: 1) strong (ie, consistent positive findings from multiple studies of good methodological quality or in 1 study of excellent methodological quality); 2) moderate (ie, consistent positive findings from multiple studies of fair methodological quality or in 1 study of good methodological quality); 3) limited (ie, positive findings from 1 study of fair methodological quality); and 4) conflicting (ie, conflicting findings in individual studies).14 When there were only studies of poor methodological quality, an unknown level of evidence was noted.
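The classification criteria above can be sketched as a small decision function. This is an illustrative reading of the stated criteria, not the authors' procedure. Several points are assumptions: the function name, the representation of each study as a (quality, is_positive) pair, the exclusion of poor-quality studies before the other rules are applied, the treatment of a mix of fair and good studies as moderate, and the "unspecified" label for consistently negative findings, which the stated criteria do not cover.

```python
def level_of_evidence(studies):
    """Classify the evidence for one measurement property of one instrument.

    `studies` is a list of (quality, is_positive) tuples, where quality is
    "poor", "fair", "good", or "excellent" and is_positive is a bool.
    """
    # Assumption: poor-quality studies do not contribute to the rating.
    usable = [(q, pos) for q, pos in studies if q != "poor"]
    if not usable:
        return "unknown"  # only poor-quality studies (or none at all)

    outcomes = {pos for _, pos in usable}
    if len(outcomes) > 1:
        return "conflicting"
    if outcomes == {False}:
        # Assumption: the criteria in the text describe positive findings only.
        return "unspecified"

    qualities = [q for q, _ in usable]
    n_excellent = qualities.count("excellent")
    n_good_or_better = n_excellent + qualities.count("good")

    if n_excellent >= 1 or n_good_or_better >= 2:
        return "strong"
    if n_good_or_better == 1 or len(qualities) >= 2:
        return "moderate"
    return "limited"  # a single study of fair quality


# Hypothetical inputs, not data from the review.
print(level_of_evidence([("good", True), ("good", True)]))  # strong
print(level_of_evidence([("fair", True)]))                  # limited
print(level_of_evidence([("poor", True)]))                  # unknown
```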


Our searches identified 1767 citations, with 971 duplicates being removed. Titles and abstracts of the remaining 796 references were screened for relevance, yielding 289 articles judged as potentially relevant for the review. After applying the eligibility criteria to the full-text versions of these studies, we identified 40 studies that evaluated the measurement properties of 21 presenteeism instruments (Figure). The complete list of excluded studies and reasons for exclusion is available upon request.
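As a consistency check on the selection counts reported above, the arithmetic can be written out as follows. The variable names are illustrative, and the two exclusion totals are derived here by subtraction rather than stated in the text; the Figure may break exclusions down differently.

```python
# Counts reported in the text.
identified = 1767         # citations retrieved by the searches
duplicates = 971          # duplicates removed
full_text_reviewed = 289  # articles judged potentially relevant
included = 40             # studies meeting the eligibility criteria

# Derived counts (simple subtraction, not reported in the text).
screened = identified - duplicates
excluded_at_screening = screened - full_text_reviewed
excluded_at_full_text = full_text_reviewed - included

print(screened)               # 796
print(excluded_at_screening)  # 507
print(excluded_at_full_text)  # 249
```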

General Characteristics of the Studies

The 40 studies15-54 examined the measurement properties of 18 of the 21 presenteeism instruments included in the review. We did not identify studies that evaluated the measurement properties of the Osterhaus technique, the Work Role Functioning Questionnaire, or the Stanford/American Health Association Presenteeism Scale, 32-item version. Although it may seem counterintuitive to retain instruments with no representation in the peer-reviewed literature on psychometric properties, these 3 met the criteria for inclusion in the review (they were deemed to measure some aspect of presenteeism). The inability to find studies of their measurement quality is therefore a separate matter and an important finding in itself. In contrast, measurement data were identified for 6 parallel forms of the Work Productivity and Activity Impairment (WPAI) scale. These were treated as separate entities, bringing the total number of presenteeism instruments ultimately examined to 23.

