A new study represents an existential threat to conventional wellness programs. Here's a behind-the-scenes look at the study's findings.
Some tourist attractions feature an “A” tour for newbies and then a “behind-the-scenes” tour for those of us who truly need lives. For instance, I confess to having taken Disney’s Magic Kingdom underground tour, exploring, among other things, the tunnels through which employees travel so as not to be seen out of costume in the wrong “Land.”
Likewise, there have been many reviews of the recent wellness study conducted by the National Bureau of Economic Research (NBER), the first-ever randomized controlled study of a wellness program. This, however, is the first review to go beyond the “A” tour of the headlines.
By way of background, the headline is that the mainstream wellness program the investigators examined at the University of Illinois did not noticeably move the needle on employee health. They didn’t address return on investment (ROI), because there obviously was none. Achieving a positive ROI would require moving the health risk needle—not just by a little, but by enough to significantly improve the health of many employees. Then, since wellness-related events, such as heart attacks, would not have befallen these employees right away in any case, this improvement would have to be sustained over several years before there was a statistical chance of some events being avoided.
Finally, the magnitude of this improvement would have to be great enough to violate the rules of arithmetic, because it is not mathematically possible to avoid enough medical events to break even on wellness. For instance, it actually costs about $1 million to avoid a heart attack through a screening program.
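To make that arithmetic concrete, here is a back-of-envelope sketch. The only figure taken from the text above is the roughly $1 million cost per avoided heart attack; the headcount, program fee, and per-event claims cost are hypothetical round numbers chosen purely for illustration.

```python
# Back-of-envelope break-even arithmetic for a screening program.
# Only the ~$1 million cost per avoided heart attack comes from the article;
# every other number below is a hypothetical round figure for illustration.

employees = 10_000
cost_per_employee_per_year = 150  # hypothetical program fee
annual_program_cost = employees * cost_per_employee_per_year

cost_to_avoid_one_heart_attack = 1_000_000  # figure cited in the article
cost_of_one_heart_attack = 40_000           # hypothetical payer cost per event

# Events the program would need to avoid each year just to break even:
events_needed = annual_program_cost / cost_of_one_heart_attack
print(f"Annual program cost: ${annual_program_cost:,}")
print(f"Avoided heart attacks per year needed to break even: {events_needed:.0f}")

# At ~$1M spent per avoided event, each avoided heart attack returns only
# ~$40,000 in avoided claims -- roughly 4 cents on the dollar.
return_per_dollar = cost_of_one_heart_attack / cost_to_avoid_one_heart_attack
print(f"Return per dollar spent on avoidance: ${return_per_dollar:.2f}")
```

Even under generous assumptions, the avoided claims are a small fraction of what it costs to avoid them, which is why screening-heavy programs cannot break even.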
This finding, therefore, represents an existential threat to conventional wellness programs.
No surprise, then, that while virtually all the reviews of this study have been positive, wellness apologists have tried to discredit it in 2 ways. First, they would say: “Yes, but it’s only 1 year.” That is true, but there is no reason to expect a “hockey stick” improvement in Year 2 when, in Year 1, employees didn’t even increase their visits to the gym, which would have been the easiest behavior to improve. As the lead researcher said: “We don’t see anything trending towards savings.”
Next, detractors would, ironically, say: “Everyone knows savings don’t start in the first year.” And yet the Koop Awards—given to the best-in-country wellness program by a group of wellness promoters that licensed the name of former Surgeon General C. Everett Koop, MD (a name that effectively carried a “For Sale” sign later in his life)—almost invariably go to programs that show massive first-year savings by comparing active, motivated participants to nonparticipants. In one award-winning case, those massive savings started 2 yearsi before the program was implemented.
Detractors would also say: “This study was misdesigned because they lumped nonparticipants and participants together. Obviously, nonparticipants would have no interest in this program.” That, of course, is a feature of this study design, not a bug: it controls for motivation, isolating the program effect from the self-selection bias evident in that very same award-winning study.
All of this pushback took place in LinkedIn groups that are generally closed to the public, and also to wellness critics. Despite multiple requests for comment, the Health Enhancement Research Organization (HERO, the wellness industry trade association) has offered no rebuttal, response, or even acknowledgement of the study’s existence. Perhaps it fears that its comments would draw more attention to the original, not just from trade publications but also from right-wing or left-wing media outlets, neither of which has been supportive of workplace wellness.
The study could be criticized on 1 key point, a flaw that no detractor spotted: the 2 groups were compared on the basis of change in total cost, rather than (or in addition to) change in the total number of wellness-sensitive medical events (WSMEs). The “gold standard” methodology of tallying WSMEs is described in my book, Why Nobody Believes the Numbers (chapter 2), in a seminal Health Affairs research study, and in HERO’s own outcomes measurement guidebook (chapter 2). It is required by the Validation Institute for outcomes validation for employee- and member-facing organizations.
The reason to focus on WSMEs is that being screened for heart disease and diabetes, providing information on one’s diet and exercise, and having access to a gym could not be expected to reduce total costs, but rather just the cost of avoided WSMEs. Wellness programs aren’t going to get “medical costs to decrease significantly for neoplasms and digestive systems [sic], and for blood and blood-forming organs,” though a widely cited study made that exact claim.
i The separation of participants (green line) and non-participants (orange line) took place in 2004. The program started in 2006. By the time the program started, participants had already “saved” almost $400, without even having a program to participate in.
The behind-the-headlines tour of the NBER study yields the most important conclusion
The focus on the headlines has obscured the most important conclusion from this study, which is that the entire participant vs nonparticipant methodology is invalid. Until now, that methodology had been 1 of the 2 methodologies the industry uses to reliably “show savings” in virtually any circumstance.ii
Of the 2, the participant vs nonparticipant methodology is the only one that arguably required scientific invalidation on top of mathematical invalidation. The mathematical invalidation has already been addressed on The American Journal of Managed Care blog, in a meta-review of 3 participant vs nonparticipant studies that each had a known benchmark—including the 1 highlighted above, where the “benchmark” is that a wellness program can’t show savings before it exists. In that case, the known benchmark is 0% savings for participants, invalidating the participant vs nonparticipant result showing almost 20% savings.
The best confirmation of any initial conclusion is an analysis reaching a similar or identical conclusion—but conducted using a completely different approach, by completely different researchers, with no relationship to the original researchers, and no confirmation or investigation bias.
This study satisfied those criteria. It was a prospective randomized controlled study aiming for a scientific outcome, as opposed to the previous analysis, a retrospective meta-review focused on arithmetic. Along with invalidating the methodology generally, the researchers specifically invalidated 78% of the studies making up the “Harvard study” meta-analysis, whose very widely publicized 3.27-to-1 ROI turbocharged growth of this field.iii
Invalidating this particular meta-analysis—cited 775 times in the academic literature alone—undermines the entire industry. I myself have invalidated it a different way, noting that the studies underlying it (the majority of which used data from the 1990s and were authored by executives with strong ties to the wellness industry) were themselves riddled with obvious fallacies, including the aforementioned claim that costs declined for diseases of blood and blood-forming organs thanks to an intervention that had nothing to do with blood-forming organs. Yet, despite the obviousness of the fallacies, I could be accused of “investigator bias,” due to my history of exposing ethical and economic lapses in this field.iv
This NBER case was just the opposite. The principal investigator, Damon Jones, PhD, is an associate professor at the University of Chicago Harris School of Public Policy, where the author of the Harvard study, Katherine Baicker, PhD, is now a dean. This juxtaposition confers a rarity of rarities—reverse “investigator bias.” A researcher must be quite confident of his findings in order to publicly invalidate his supervisor’s most often-cited paper.
Where does this leave the wellness industry?
It is safe to say that any debate on wellness savings is over: conventional wellness loses money in the commercially insured working-age population (and, according to a February Medicare announcement, in the senior population as well).
And just as the unique reporting relationship between the 2 opposing principal investigators lends great credibility to the most recent finding, another unique aspect of this ongoing debate lends great credibility to the conclusion of this article: I am offering a 7-figure reward to anyone who can show wellness doesn’t lose money. Despite loosening the rules in 2017 so that the burden of proof shifts to me, and agreeing that I can appoint only 1 of the 5 judges, no one has attempted to claim it.
ii The other is measuring the decline in risk of high-risk members while overlooking any increase in risk of low-risk members. This is pure regression to the mean. Wellsteps CEO Steve Aldana was compelled to admit as much when he was caught doing it. As a hypothetical illustration of this fallacy, suppose the only risk factor is smoking, and that in any given year half the employees smoke. Each smoker quits the following year, each nonsmoker relapses, and the cycle repeats. Using the Wellsteps methodology of measuring only high-risk employees would show a 100% decline in smoking every year, while the actual smoking rate in the company remains unchanged at 50%.
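The footnote’s hypothetical can be sketched in a few lines of Python; the headcount and the alternating quit/relapse pattern are invented for illustration, exactly as in the hypothetical above.

```python
# Sketch of the footnote's hypothetical (all numbers invented for illustration):
# half the workforce smokes in any given year; each smoker quits the next year
# while each nonsmoker relapses, so the company-wide rate never moves.

N = 100  # hypothetical headcount

def smokers_in(year):
    """Return the set of employees smoking in a given year."""
    first_half = set(range(N // 2))
    return first_half if year % 2 == 0 else set(range(N // 2, N))

for year in range(3):
    cohort = smokers_in(year)                      # the "high-risk" group
    still_smoking = cohort & smokers_in(year + 1)  # cohort members smoking next year
    cohort_decline = 1 - len(still_smoking) / len(cohort)
    company_rate = len(smokers_in(year + 1)) / N
    print(f"Year {year}->{year + 1}: cohort decline {cohort_decline:.0%}, "
          f"company smoking rate {company_rate:.0%}")
```

Measured only on each year’s smokers, the “decline” is 100% every year, while the company-wide smoking rate never moves from 50%.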
iii See Figure 8, p. 46, with p. 77 in the online-only version making specific reference to the Baicker meta-analysis http://www.nber.org/workplacewellness/s/IL_Wellness_Study_1.pdf.
iv As an example of such a lapse, the study also failed to disclose that 1 of the investigators had strong ties to the Obama administration, which, at the time the study was published, was attempting to build bipartisan support for the pending Affordable Care Act legislation. Allowing a 30% withhold for wellness was key to procuring the support of the Business Roundtable. (This linkage did not become widely known until 2013.)