When analyzing the effect of workplace wellness and related employee health services, studies invariably attribute savings among participants to the program rather than to the more likely cause: participants’ greater motivation.
The vast majority of “studies” in the field of workplace wellness and related employee health services compare participants to nonparticipants, and show substantial savings in the former vs the latter. They invariably attribute the savings among the participants to the program (a “program effect”) rather than to the likely much higher level of motivation among participants to succeed in any program (the “participation effect”).
For example, wellness promoters claim that if you divide a company into employees who want to lose weight and employees who don’t, the difference in weight loss between the 2 groups is due to the program, not to the difference in motivation to lose weight. (And, further, that people who start in the motivated category but drop out shouldn’t count at all.)
If indeed it is the case that the “participation effect” is present and substantial, savings from all studies with that design are overstated. And yet despite its ubiquity in wellness study designs, no one has ever postulated the existence of a participation effect using data. This is particularly perplexing because, intuitively, it makes no sense that, in an undertaking such as personal health improvement in which motivation is key, separating a population into study and control groups on the basis of motivation would constitute a valid study design.
Unsurprisingly, it is easily provable that the intuitive result is indeed the correct result.
How do we know this? There have been 3 studies in which the participation effect can be isolated from the program effect. Further, each study featured the opposite of “investigator bias,” in that the authors were attempting to prove (and, indeed, thought they had proven) the opposite.
Each study demonstrated the substantial impact of the participation effect from different angles:
Case Study 1
The slide below—the key outcomes display of the Eastman Chemical/Health Fitness Corporation’s Koop Award-winning program application—clearly shows increasing savings over the period of the 2 “baseline years” (2004 and 2005 on the slide below) before the “treatment years” (2006 to 2008) got underway. Phantom savings reached almost $400/year/employee by 24 months (2006), even though the program was not available to would-be participants until the 24th month.
While this study would seem to constitute an excellent demonstration of the “participation effect,” there is one limitation: 4 years after the study was published, reviewed, and blessed by the Koop Award Committee, the authors and evaluators removed the X-axis labels altogether and presented an alternative explanation: that the program had been in effect during the entire period. The revised slide is identical except that the X-axis no longer has labels.
That revision raises another question, though: by the end of 2008, the “savings” for Eastman participants exceeded $900/year, or 24%, but average participant risk declined only 0.17 on a scale of 5, or roughly 3%. And since wellness-sensitive medical admissions account for roughly 7% of all admissions, that 3% would apply to only 7% of all admissions (3% × 7% ≈ 0.2%), thus accounting for only about 0.2 percentage points of the 24% in savings.
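The arithmetic in the paragraph above can be checked with a short sketch. The four input figures come from the text; the assumption that savings scale linearly with risk reduction and arise only in wellness-sensitive admissions is a simplification made here for illustration, and the variable names are hypothetical.

```python
# Figures quoted in the text above.
risk_decline = 0.17              # average decline in participant risk score
risk_scale_max = 5               # risk is scored on a scale of 5
wellness_sensitive_share = 0.07  # share of admissions that are wellness-sensitive
claimed_savings = 0.24           # claimed savings for participants

# Assumption (not from the source): savings scale linearly with risk
# reduction and arise only in wellness-sensitive admissions.
relative_risk_reduction = risk_decline / risk_scale_max
explainable_savings = relative_risk_reduction * wellness_sensitive_share
share_of_claim = explainable_savings / claimed_savings

print(f"Relative risk reduction: {relative_risk_reduction:.1%}")
print(f"Savings explainable by risk reduction: {explainable_savings:.2%}")
print(f"Share of the claimed savings explained: {share_of_claim:.1%}")
```

Under these assumptions, the risk reduction explains roughly 0.2 percentage points of savings, on the order of one-hundredth of the 24% claimed.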
Even if one accepts that—despite its review by the entire Koop Award Committee and its subsequent wide dissemination—the key display was wrong the entire time and no one noticed until it was exposed in a highly visible Health Affairs blog, the 24% separation of the 2 lines is overwhelmingly the result of the participation effect. It cannot possibly be attributed to the 3% reduction in risk factors among motivated participants.
Case Study 2
By encouraging participating diabetics and people at risk for diabetes to (among other things) eat less fat and, hence, likely more carbohydrates, Stanford University researchers showed a relative reduction in costs of $397 in the first 6 months alone vs nonparticipants. (Because this study was conducted in the 1990s, it modeled only $1000/day as the hospital per diem. The same claimed reduction in hospital utilization multiplied by a more current per diem would generate a far greater dollar savings.)
Today, recommending that people at high risk for diabetes eat less fat would be considered controversial, and no one would attribute substantial near-term savings to that recommendation. In general, it is also highly unlikely that substantial savings could be achieved within the first 6 months with any intervention reliant on long-term education and behavior change.
Further, and unsurprisingly, behaviors hardly changed anyway. Risk scores improved by only 2% vs control. Even that 2% might be an overstatement: the control group itself was composed of people with much lower risk scores than the study group and it was assumed that there was no regression to the mean in the high-risk group. However, subsequent research has demonstrated a “natural flow of risk” that would predict a greater reduction in risk for high-risk groups than low-risk groups. This makes intuitive sense because lower-risk people have less opportunity to reduce risk. For example, smokers can quit smoking to reduce their risk score and obese people can lose weight, while nonsmokers can’t quit smoking and thin people shouldn’t lose weight.
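Regression to the mean can be illustrated with a small simulation. This is a minimal sketch with invented parameters, not the Stanford data: each simulated person has a stable underlying risk, measured twice with independent noise, and nobody receives any intervention. The function name and all numeric values are hypothetical.

```python
import random
import statistics

def regression_to_mean(n=50000, seed=1):
    """Return the mean year-over-year risk change for people classified as
    high-risk and low-risk on a noisy first measurement (no intervention)."""
    rng = random.Random(seed)
    # Hypothetical stable underlying risk on a 0-5 style scale.
    true_risk = [rng.gauss(2.5, 0.5) for _ in range(n)]
    # Two noisy measurements of the same unchanged underlying risk.
    year1 = [t + rng.gauss(0, 0.5) for t in true_risk]
    year2 = [t + rng.gauss(0, 0.5) for t in true_risk]

    cutoff = statistics.median(year1)
    pairs = list(zip(year1, year2))
    high = [b - a for a, b in pairs if a > cutoff]
    low = [b - a for a, b in pairs if a <= cutoff]
    return statistics.mean(high), statistics.mean(low)

high_change, low_change = regression_to_mean()
print(f"High-risk group mean change: {high_change:+.2f}")
print(f"Low-risk group mean change:  {low_change:+.2f}")
```

In this sketch the high-risk group's measured risk falls and the low-risk group's rises even though no one's underlying risk changed, which is why comparing a high-risk study group with a low-risk control group can overstate a program's effect.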
Since participant risk factors declined only 2% (vs the control), likely due largely to regression to the mean, and since much of the advice was arguably wrong, one should conclude that the massive savings in the first 6 months—probably exceeding $1000 in today’s dollars—can be attributed to the participation effect rather than the dietary and other changes by 2% of the study group.
Case Study 3
This was a controlled experiment in which 2 groups of fairly low-risk people were “invited,” to be compared as a whole with a control group, a classic and face-valid randomized controlled trial (RCT). For simplicity, these 2 groups will be designated “Control” and “Invited.” Invited was offered the program, and 14% signed up as participants, while 86% of Invited did not participate. At the end of the year, there was only a trivial difference in health status measures between Control and Invited. In the latter 3 of the 6 measures, Control outperformed Invited. Because this was an RCT, it can be concluded that there was no difference, meaning the intervention had no impact.
Nor should there have been any difference. Subjects were Invited specifically because they were not chronically ill. Hence there was no opportunity to improve health status in a clinically meaningful way, especially in only 12 months. Further, the intervention was based on the highly questionable proposition that telling people they had a gene for obesity would motivate them to lose weight.
Let’s focus now just on the 2 Invited subsets—the participants and the nonparticipants.
Despite the overall result showing no separation between Control and Invited, the participants within Invited incurred $1464 less cost than the nonparticipants in Invited during the first year alone. This industry-record alleged savings was achieved by a DNA-based intervention whose efficacy is questioned by many researchers anyway. Just to be clear: once both subgroups of Invited were totaled and compared to Control, any health improvement—and hence any savings attributable to health improvement—in Invited went away.
Hence, the $1464 in savings is purely the participation effect at work, a completely compelling “natural experiment.”
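The logic of this natural experiment can be sketched as a simulation. The code below is an illustration under stated assumptions, not a reconstruction of the study: the cost model, the motivation variable, and every numeric parameter except the 14% participation rate are invented. The program is given zero effect by construction, and the most motivated 14% of Invited are assumed to be the ones who sign up.

```python
import random
import statistics

def simulate(n=20000, participation_rate=0.14, seed=0):
    """Compare Invited vs Control and participants vs nonparticipants
    when the program itself has NO effect on cost."""
    rng = random.Random(seed)

    def person():
        motivation = rng.gauss(0, 1)
        # Hypothetical cost model: motivated people cost less regardless
        # of any program. All dollar figures here are made up.
        cost = 5000 - 500 * motivation + rng.gauss(0, 1500)
        return motivation, cost

    control = [person() for _ in range(n)]
    invited = [person() for _ in range(n)]

    # Assumption: the most motivated 14% of Invited choose to participate.
    ranked = sorted(m for m, _ in invited)
    cutoff = ranked[int(n * (1 - participation_rate))]
    participants = [c for m, c in invited if m >= cutoff]
    nonparticipants = [c for m, c in invited if m < cutoff]

    return {
        "invited_minus_control": statistics.mean(c for _, c in invited)
        - statistics.mean(c for _, c in control),
        "participants_minus_nonparticipants": statistics.mean(participants)
        - statistics.mean(nonparticipants),
    }

result = simulate()
for name, value in result.items():
    print(f"{name}: {value:+.0f}")
```

In this sketch the Invited and Control groups cost about the same, as a randomized comparison should show when the program does nothing, while participants appear to cost far less than nonparticipants purely because of who chose to sign up.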
Following publication in the Journal of Occupational and Environmental Medicine, editorial advisor Nortin Hadler apologized for allowing the authors to show $1464 in savings when there was no population-wide health improvement impact whatsoever to attribute the savings to. (See “Comments.”)
Cases Indicate a Pattern in Wellness
These are just the 3 most self-evident cases. But there is a pattern in wellness, especially among award-winning programs: risk factors among participants barely budge, but huge amounts of savings are alleged compared with nonparticipants.
The implication of these cases and this pattern? Listing “unobservable differences” between participants and nonparticipants as a limitation is no longer a sufficient disclaimer when reporting or publishing favorable results of wellness studies in which the former uses the latter as a control. A better description would be “invalidator.” At a minimum, investigators should point readers to this posting and suggest that they draw their own conclusions about validity.