Simple Errors in Interpretation and Publication Can Be Costly

Author(s): Dwight Barry, PhD, Lindsey R. Haas, MPH, Paul Y. Takahashi, MD

Key Takeaways

Abstracts may overstate results, leading to misinterpretations with significant real-world impacts on healthcare decision-making.
The study by Haas et al. faced criticism for overstating results and discrepancies between the abstract and findings.
Authors acknowledged a typographical error but defended their conclusions, emphasizing the ACG model's performance.
Comprehensive evaluation beyond abstracts is crucial for informed decision-making in healthcare management.

A recent AJMC study contained overstatements and small but importantly placed errors that have the potential to cause unwarranted on-the-ground cost problems.

Published literature should be evaluated more critically when the potential safety, outcome, or cost implications are high.

Abstracts can use overstated results or terms to drive readership and citation rates.

Since the abstract may be the only part of a paper that administrators read, errors in abstracts can have significant real-world impacts on decision making.

Errors in interpretation of results can also have significant real-world impacts on decision making that can ripple across the entire spectrum of healthcare management.

TO THE EDITORS:

The study by Haas et al¹ comparing different risk stratification methods is enlightening. However, the authors have overstated their results, and there is a discrepancy between the abstract and the results; both contribute to the possibility that the study could be misinterpreted in costly ways.

It is good practice to include CIs on all point estimates, and CIs are at least available for the main results presented in Table 2 (on following page). But the authors fail to interpret those values, and as a result conclude, incorrectly, that the Adjusted Clinical Group (ACG) model is the best. Given the information available in the paper, it is correct to say that the results are consistent with there being no difference between ACG and Minnesota Tiering (MN). In fact, the paper’s figures (which lack CIs, unfortunately) and C statistic values suggest that any differences in outcomes for all models might be statistically distinct at times but are practically trivial. It is also questionable whether the C statistic is even appropriate for comparing models with different drivers, and whether it accurately portrays the receiver operating characteristic curve upon which it rests (it can’t).²,³
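For readers less familiar with the C statistic, it can be read as the probability that a randomly chosen patient who had the outcome receives a higher risk score than a randomly chosen patient who did not, with ties counted as half. The sketch below illustrates that definition only; the scores and outcomes are invented for illustration and are not taken from the study, and the function name `c_statistic` is hypothetical.

```python
def c_statistic(scores, outcomes):
    """Concordance (C statistic) for a binary outcome.

    scores   -- predicted risk scores, one per patient
    outcomes -- 1 if the patient had the outcome, 0 otherwise
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # pair ranked correctly
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Invented example: five patients, two of whom had the outcome.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_outcomes = [1, 0, 1, 0, 0]
print(round(c_statistic(example_scores, example_outcomes), 3))
```

Because the statistic depends only on how pairs are ranked, it summarizes the whole curve in one number and says nothing about where along the risk distribution a model does well or poorly.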

The abstract states that the C statistic for the ACG model for predicting the highest 10% of cost users is 0.81, but Table 2 shows that the correct value is 0.76 (95% CI, 0.75-0.76). Since this interval overlaps with that of the MN model (0.74; 95% CI, 0.74-0.75), the ACG model is not clearly “superior to the others.” If the authors meant to refer to readmissions, the overlap in CIs between ACG and MN again shows that they are essentially equivalent models for this outcome as well. The term “superior” is a value-laden word that can be easily overinterpreted without a close read of the results; its use here presents both clerical and conceptual errors.
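The overlap described above can be checked mechanically. The following is a minimal sketch using only the two sets of values quoted in this letter from Table 2; the `Estimate` type and the `intervals_overlap` helper are hypothetical names introduced for illustration. An overlap check of this kind is a rough screen rather than a formal test: it indicates that the results are consistent with no difference, not that the models are proven identical.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    label: str
    point: float   # C statistic
    lower: float   # lower 95% confidence limit
    upper: float   # upper 95% confidence limit


def intervals_overlap(a: Estimate, b: Estimate) -> bool:
    """Return True when two closed intervals share at least one value."""
    return a.lower <= b.upper and b.lower <= a.upper


# Values as quoted in the letter (top 10% of cost users).
acg = Estimate("ACG", 0.76, 0.75, 0.76)
mn = Estimate("MN Tiering", 0.74, 0.74, 0.75)

if intervals_overlap(acg, mn):
    print(f"{acg.label} and {mn.label}: CIs overlap")
else:
    print(f"{acg.label} and {mn.label}: CIs do not overlap")
```

With the rounded limits reported here, the two intervals meet at 0.75, so the check treats them as overlapping.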

It may well be that the ACG model is superior to these other models, but this paper did not demonstrate that; in fact, the study’s results support the idea that all of these models are practically equivalent. Academics might call that conclusion a “negative result” and decide not to publish because of the added difficulty of getting it through review. But as an industry professional, I would consider “negative results” like these, when properly interpreted, to be “fiscally important results.”⁴

AJMC is read by thousands of on-the-ground healthcare industry workers, and time-constrained administrators often take abstracts at face value, having no time to evaluate the results themselves. In fact, this paper made its rounds among our administrators when it was published, and may (or may not) have contributed to our subsequent decision to purchase one of these models. Purchasing risk models is expensive, and internal switching, implementation, and training costs multiply that well beyond the cost of the model. I would ask that editors and reviewers make absolutely sure that authors’ conclusions are warranted by their results before publication—improper interpretation can be as damaging and/or costly as clerical errors when subjected to the decision-making hurricane that is today’s healthcare industry.

Sincerely,

Dwight Barry, PhD

RESPONSE:

The letter “Simple Errors in Interpretation and Publication Can Be Costly,” a response to the article “Risk-Stratification Methods for Identifying Patients for Care Coordination,” is primarily based on 1 line in the abstract. The author of the letter did identify a typographical error in the abstract, which included the C statistic for readmissions instead of that for high-cost users (we have submitted an erratum to correct the error). Although we agree that the word “superior” could have been softened, the ACG did perform better than other models in predicting healthcare utilization. For example, as can be seen in Table 2 (below) of the original paper, for predicting the top 10% high-cost users, the ACG had non-overlapping CIs with all models except for MN Tiering, which is based on the ACG.

We believe the abstract was clear. Unfortunately, as abstracts are limited with regard to space, we would hope the reader would draw their impressions from the entire paper. The conclusions of the paper stated that although the ACG was generally better at predicting utilization, all models had good concordance, suggesting that choosing any model would be more beneficial than none. All models, excluding ACG and MN Tiering, are free and publicly available.

We agree with the correspondent that receiver operating characteristic curves should not be the only basis of comparison between models. In fact, the analysis did not rely strictly on C statistics, but also focused on calibration, particularly regarding the accuracy of identifying patients at the high end of each of the outcome distributions as displayed in Table 3 and Figure 2 (on following page).

Sincerely,

Lindsey R. Haas, MPH

Paul Y. Takahashi, MD

Nilay D. Shah, PhD

Robert J. Stroebel, MD

Matthew E. Bernard, MD

Dawn M. Finnie, MPA

James M. Naessens, ScD

Address correspondence to: Dwight Barry, PhD, Group Health Cooperative, 320 Westlake Ave N, Ste 100, Seattle, WA 98109. E-mail: barry.d@ghc.org.

----

Source of Funding: None.

Author Disclosures: The authors (LRS, PYT, NDS, RJS, MEB, DMF, JMN) report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of the article.

Authorship Information: Concept and design (LRS, PYT, NDS, RJS, MEB, DMF, JMN); acquisition of data (LRS, JMN); analysis and interpretation of data (LRS, NDS, RJS, MEB, DMF, JMN); drafting of the manuscript (LRS, PYT, MEB, JMN); critical revision of the manuscript for important intellectual content (LRS, PYT, NDS, RJS, MEB, JMN); statistical analysis (LRS, JMN); obtaining funding (JMN); and administrative, technical, or logistic support (DMF).

Address correspondence to: Lindsey R. Haas, MPH, Health Care Policy and Research, Mayo Clinic, 200 First St Southwest, Rochester, MN 55905. E-mail: haas.lindsey@mayo.edu.

1. Haas LR, Takahashi PY, Shah ND, et al. Risk-stratification methods for identifying patients for care coordination. Am J Manag Care. 2013;19(9):725-732.

2. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115(7):928-935.

3. Hand DJ. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn. 2009;77:103-123.

4. Trouble at the lab. The Economist. 2013;409(8858):26-30.