Simple Errors in Interpretation and Publication Can Be Costly
Dwight Barry, PhD; Lindsey R. Haas, MPH; Paul Y. Takahashi, MD; Nilay D. Shah, PhD; Robert J. Stroebel, MD; Matthew E. Bernard, MD; Dawn M. Finnie, MPA; and James M. Naessens, ScD
TO THE EDITORS:
The study by Haas et al1 that compares different risk stratification methods is enlightening, but the authors have overstated their results, and a discrepancy between the abstract and the results adds to the possibility that the study could be misinterpreted in costly ways.
It is good practice to include CIs on all point estimates, and CIs are at least available for the main results presented in Table 2 (on following page). But the authors fail to interpret those values, and as a result conclude, incorrectly, that the Adjusted Clinical Group (ACG) model is the best. Given the information available in the paper, it is correct to say that the results are consistent with there being no difference between ACG and Minnesota Tiering (MN). In fact, the paper’s figures (which lack CIs, unfortunately) and C statistic values suggest that any differences in outcomes among the models might be statistically distinct at times but are practically trivial. It is also questionable whether the C statistic is even appropriate for comparing models with different drivers, and whether it accurately portrays the receiver operating characteristic curve upon which it rests (it can’t).2,3
Specifically, the abstract states that the C statistic for the ACG model for predicting the highest 10% of cost users is 0.81, but Table 2 shows that the correct value is 0.76 (95% CI, 0.75-0.76). Since this interval overlaps with that of the MN model (0.74; 95% CI, 0.74-0.75), the ACG model is not clearly “superior to the others.” If the authors meant to refer to readmissions, the overlap in CIs between ACG and MN again shows that they are essentially equivalent models for this outcome as well. “Superior” is a value-laden word that can easily be overinterpreted without a close read of the results; its use here reflects both a clerical and a conceptual error.
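The overlap check described above can be sketched in a few lines of Python. This is only an illustration using the two intervals quoted from Table 2; the function name `intervals_overlap` is hypothetical, and comparing CIs for overlap is a rough heuristic, not a formal test of the difference between two C statistics.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% CIs for the top-10% high-cost outcome, as quoted from Table 2.
acg_ci = (0.75, 0.76)  # ACG model, C statistic 0.76
mn_ci = (0.74, 0.75)   # MN Tiering model, C statistic 0.74

# The intervals share the endpoint 0.75, so they count as overlapping.
print(intervals_overlap(acg_ci, mn_ci))
```

A formal comparison would require the underlying predictions, for example a paired test of the two correlated C statistics, which the published table alone does not allow.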
It may well be that the ACG model is superior to these other models, but this paper did not demonstrate that. In fact, the study’s results support the idea that all of these models are practically equivalent. Academics might call that conclusion a “negative result” and decide not to publish because of the added difficulty of getting it through review. But as an industry professional, I would consider “negative results” like these, when properly interpreted, “fiscally important results.”4
AJMC is read by thousands of on-the-ground healthcare industry workers, and time-constrained administrators often take abstracts at face value, having no time to evaluate the results themselves. In fact, this paper made its rounds among our administrators when it was published, and may (or may not) have contributed to our subsequent decision to purchase one of these models. Purchasing risk models is expensive, and internal switching, implementation, and training costs multiply that well beyond the cost of the model. I would ask that editors and reviewers make absolutely sure that authors’ conclusions are warranted by their results before publication—improper interpretation can be as damaging and/or costly as clerical errors when subjected to the decision-making hurricane that is today’s healthcare industry.
Dwight Barry, PhD
The letter “Simple Errors in Interpretation and Publication Can Be Costly,” a response to the article “Risk-Stratification Methods for Identifying Patients for Care Coordination,” is primarily based on 1 line in the abstract. The author of the letter did identify a typographical error in the abstract, which reported the C statistic for readmissions instead of that for high-cost users (we have submitted an erratum to correct the error). Although we agree that the word “superior” could have been softened, the ACG did perform better than other models in predicting healthcare utilization. For example, as can be seen in Table 2 (below) of the original paper, for predicting the top 10% high-cost users, the ACG had non-overlapping CIs with all models except for MN Tiering, which is based on the ACG.
We believe the abstract was clear. Unfortunately, as abstracts are limited with regard to space, we would hope the reader would draw their impressions from the entire paper. The conclusions of the paper stated that although the ACG was generally better at predicting utilization, all models had good concordance, suggesting that choosing any model would be more beneficial than none. All models, excluding ACG and MN Tiering, are free and publicly available.
We agree with the correspondent that receiver operating characteristic curves should not be the only basis of comparison between models. In fact, the analysis did not rely strictly on C statistics, but also focused on calibration, particularly regarding the accuracy of identifying patients at the high end of each of the outcome distributions as displayed in Table 3 and Figure 2 (on following page).
Lindsey R. Haas, MPH
Paul Y. Takahashi, MD
Nilay D. Shah, PhD
Robert J. Stroebel, MD
Matthew E. Bernard, MD
Dawn M. Finnie, MPA
James M. Naessens, ScD