Publication|Articles|June 3, 2025

June 2025
Volume 31
Issue 6
Pages: 288-294

Factors That Increase Utilization Management Risk: A Proof of Concept

Author(s): Jason Shafrin, PhD, Jacob Fajnor, BA, Shurui Zhang, MS

Key Takeaways

A score was developed to measure patient risk from payer utilization management policies and its relationship to real-world US commercial payer utilization management policies.

ABSTRACT

Objectives: To (1) develop a metric that quantitatively measures the risk that utilization management (UM) policies pose to patients and (2) measure the relationship between this metric and payers’ real-world UM use.

Study Design: We conducted a targeted literature review and an expert elicitation exercise to create the Data-Based Utilization Management Risk Designation (BURDEN) score. Real-world data analysis measured the relationship between BURDEN and actual payer policies.

Methods: The BURDEN score was based on 9 UM factors impacting patient outcomes. Factors were weighted based on an expert elicitation procedure with 6 stakeholders. UM policy restrictions were drawn from Tufts Medical Center’s Specialty Drug Evidence and Coverage database, and net price data came from SSR Health. Ordinary least squares regressions were performed to examine the relationship between the BURDEN score and coverage policies.

Results: Among 98 treatments identified across 30 unique diseases, UM policies on treatments for myasthenia gravis, multiple myeloma, and lupus nephritis posed the highest risk to patients, according to the BURDEN score. When treatments had a high BURDEN score, the share of plans imposing any UM restriction was 22.0 percentage points lower (P = .041) and the share imposing step edits was 36.2 percentage points lower (P = .039) than for treatments with a low BURDEN score.

Conclusions: This study developed a quantitative measure (BURDEN) to estimate the relative risk to patients of payer UM policies. Payers appeared modestly sensitive to treatments that posed a high risk to patients should UM be implemented. However, coverage decisions did not appear to be fully patient-centered, as some higher BURDEN products experienced increased UM usage.

Am J Manag Care. 2025;31(6):288-294. https://doi.org/10.37765/ajmc.2025.89748

_____

Takeaway Points

Cost pressures surrounding new pharmaceuticals and therapeutics have led payers to impose a variety of utilization management (UM) restrictions, but there is a gap in the ability to quantitatively measure UM’s impact on patients.

The Data-Based Utilization Management Risk Designation (BURDEN) score was developed to quantify the potential risk of negative outcomes to patients from payer UM policies (eg, step edits).
The BURDEN score had statistically significant negative relationships with the percentage of commercial insurance plans imposing any UM restrictions or step therapy protocols.
However, in certain instances, UM was implemented in therapeutic areas despite posing a high risk of negative outcomes to patients.

_____

In recent years, payers have increased use of utilization management (UM) strategies for specialty pharmaceuticals and therapeutics and expanded UM to coverage decisions for rare and severe diseases such as cancer and autoimmune disorders.¹ For example, the percentage of Medicare Part D specialty branded oncology treatments with UM restrictions increased from 73% in 2010 to 95% in 2020.² Specifically, payers have increased their use of prior authorization and step therapy (ST) protocols. For example, the use of prior authorization and ST for 4 treatments, including antidepressants and autoimmune disease immunotherapies, increased from 35% in 2011 to 67% in 2016.³ A 2021 review of coverage policies across 17 large commercial payers found that ST protocols were applied to 38.9% of drug coverage policies, 55.6% of which were more stringent than clinical guidelines.⁴

With the rapid growth of UM policies, employers in the US have faced the challenge of understanding how UM impacts both plan costs and employees’ health. Standards for UM policies are established at the individual drug level to ensure patients receive appropriate medical treatments while effectively managing costs.⁵ However, it is not feasible for employers to review individual UM policies for each drug when deciding which health plans to offer their employees. Although there are publicly available plan rating systems, such as Medicare Part D Star Ratings and the Healthcare Effectiveness Data and Information Set,⁶ none of these measures considers UM policies directly. On the one hand, in a survey of large US employers by CancerCare, more than 95% agreed that direct costs of plans and employee health are influential factors in their decision-making.⁷,⁸ On the other hand, employers view quality health coverage as a key tool for recruitment and retention and have a vested interest in ensuring employees get the coverage they need.⁹ This makes balancing cost savings and plan access a challenge, as 93% of surveyed physicians indicated that UM policies led to delays in treatment and increased severe adverse event rates.¹⁰

Because employers have few tools to evaluate the impact of UM policies on member health and costs when making decisions, this study aimed to make 2 contributions to the literature. First, we developed a novel, quantitative index for measuring the potential patient health risk due to UM and how this risk varies across diseases and treatments. The proposed index is a proof of concept and serves as only a high-level tool to help employers compare relative UM stringency across drug coverage options; it does not aim to replace managed care pharmacist decision-making processes for UM policies for specific pharmaceuticals. Second, we examined the correlations between disease- and drug-level factors, which may have a negative impact on patient outcomes, and real-world payer use of UM policies.

METHODS

BURDEN Score Construction

To quantify how UM implementation may impact member coverage decisions, we developed the Data-Based Utilization Management Risk Designation (BURDEN) score using a 4-step process. First, a targeted literature review was conducted to identify the factors that make UM strategies problematic if they delay or deny needed treatments. Among 9 factors identified, 3 factors were disease-specific: severity, reversibility of progression without intervention, and the number of FDA-approved alternative treatments. The remaining 6 factors were treatment-specific attributes: the potential for addiction or abuse, short-term symptom relief, the rate of serious (grade 3 or higher) adverse events, route of administration complexity, dosing schedule, and the level of clinical evidence for treatment efficacy.

Second, parameters for each factor were primarily collected from 30 Institute for Clinical and Economic Review (ICER) assessments published between 2019 and 2021.¹¹ ICER assessments were selected as the primary data source because they follow a standardized format, ensuring the consistency of parameter values across diseases and treatments. Additional details on how factors were collected from ICER assessments are available in the eAppendix (available at ajmc.com).

Third, because certain factors may impact member coverage decisions differently, factor weights were elicited from 6 targeted interviews with key opinion leaders who had diverse professional backgrounds, in accordance with ISPOR best practices for multicriteria decision analyses.¹² The weights were constructed according to the Simple Multiattribute Rating Technique (SMART), which offers transparency in weight construction.¹³,¹⁴ Additional details on how experts ranked factors and how weights were generated with SMART are available in the eAppendix.
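As a rough illustration of the weighting step, the sketch below assumes a common SMART variant in which each expert assigns importance points to every factor, the points are normalized within each expert, and the normalized weights are then averaged across experts. The factor names, point values, and the function `smart_weights` are hypothetical; the elicitation procedure actually used is described in the eAppendix and may differ.

```python
def smart_weights(ratings_by_expert):
    """Average per-expert normalized importance points into one weight per factor.

    ratings_by_expert: list of {factor_name: points} dicts, one per expert.
    All experts are assumed to rate the same set of factors.
    """
    factors = list(ratings_by_expert[0])
    normalized = []
    for ratings in ratings_by_expert:
        total = sum(ratings.values())
        # Within one expert, weights are points divided by that expert's total.
        normalized.append({f: ratings[f] / total for f in factors})
    n_experts = len(normalized)
    # The final weight is the simple mean of the experts' normalized weights.
    return {f: sum(w[f] for w in normalized) / n_experts for f in factors}


# Hypothetical ratings from 2 experts over 3 illustrative factors.
example_ratings = [
    {"severity": 100, "alternatives": 50, "dosing": 50},
    {"severity": 60, "alternatives": 30, "dosing": 10},
]
example_weights = smart_weights(example_ratings)
```

Because each expert's weights sum to 1 before averaging, the averaged weights also sum to 1, and no single expert dominates simply by using a larger point scale.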

Fourth, the BURDEN score was defined as the sum of the normalized and deviation-adjusted factors multiplied by the factor weights developed during the targeted interviews (see eAppendix for more details on BURDEN score derivation). The minimum possible BURDEN score is 0.00, which measures 0 deviations from the least problematic factor values. Conversely, a larger BURDEN score implies that UM implementation may have a greater impact on member coverage decisions. The BURDEN score was constructed under the assumption that providers have access to more complete information at the patient level than payers when selecting treatments.
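The scoring rule described above can be sketched in a few lines. This is only one plausible reading: it assumes that "deviation-adjusted" means each factor is expressed as its distance from the least problematic value, measured in sample standard deviations, before weighting. The factor names, values, weights, and the function `burden_scores` are hypothetical, and the derivation in the eAppendix should be treated as authoritative.

```python
from statistics import pstdev


def burden_scores(factors, weights, least_problematic):
    """Weighted sum of each factor's distance from its least problematic value.

    factors: {treatment: {factor_name: numeric value}}
    weights: {factor_name: weight}
    least_problematic: {factor_name: value at which UM is least problematic}
    Distances are scaled by the factor's standard deviation across treatments
    (an assumption made for this sketch).
    """
    names = list(weights)
    sd = {f: pstdev([row[f] for row in factors.values()]) for f in names}
    scores = {}
    for treatment, row in factors.items():
        total = 0.0
        for f in names:
            if sd[f] == 0:
                continue  # a constant factor cannot separate treatments
            deviations = abs(row[f] - least_problematic[f]) / sd[f]
            total += weights[f] * deviations
        scores[treatment] = total
    return scores


# Hypothetical inputs: higher values mean UM is more problematic.
example_factors = {
    "A": {"severity": 0.0, "few_alternatives": 0.0},
    "B": {"severity": 1.0, "few_alternatives": 0.0},
    "C": {"severity": 2.0, "few_alternatives": 1.0},
}
example_scores = burden_scores(
    example_factors,
    weights={"severity": 0.6, "few_alternatives": 0.4},
    least_problematic={"severity": 0.0, "few_alternatives": 0.0},
)
```

Under these assumptions a treatment that sits at the least problematic value on every factor scores exactly 0, which matches the stated minimum, and the score grows as factors move away from those values.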

Data Sources

To measure the real-world use of UM in commercial coverage decisions, this study relied on the Tufts Medical Center’s Specialty Drug Evidence and Coverage (SPEC) database. SPEC contains data from 17 of the 20 largest US commercial payers on plan coverage for specialty drugs and products, representing approximately 150 million lives.¹⁵ SPEC specifies whether plans’ pharmaceutical coverage policies contain UM restrictions. In particular, this study considered whether a payer’s coverage decision for pharmaceuticals included any UM restriction or ST protocols. The data in our analysis were captured in October 2023 and included 37 treatment-indication pairs and 17 coverage policies.

Because a payer’s acquisition cost may influence a pharmaceutical’s placement in a formulary and subsequent use of UM policies, this study included estimated net price data from the 2019-2022 SSR Health US brand pricing data set by indication. SSR Health estimates the treatment-level net prices of approximately 1000 pharmaceutical products using list pricing and unit volume.¹⁶ Net price was measured as the percentage difference between the listed gross price and average net price across payers.
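The net price measure is a simple percentage gap, sketched here with hypothetical prices. The function name `net_price_discount` and the unweighted average across payers are assumptions; the source does not say whether SSR Health estimates are volume-weighted.

```python
def net_price_discount(gross_price, net_prices_by_payer):
    """Percentage difference between list (gross) price and mean net price."""
    mean_net = sum(net_prices_by_payer) / len(net_prices_by_payer)
    return 100 * (gross_price - mean_net) / gross_price


# Hypothetical product: list price of 100 and net prices for 3 payers.
example_discount = net_price_discount(100, [70, 80, 90])
```

A larger value means a deeper discount off list price, so the variable rises as the payer's acquisition cost falls relative to the listed price.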

Statistical Analysis

First, a correlation matrix of identified factors, net price, the BURDEN score, and UM implementation was constructed. Payer use of UM was represented by (1) the percentage of commercial plans implementing any form of UM and (2) the percentage of commercial plans implementing an ST protocol. The correlation matrix allows one to examine the relationship across the specific factors that make up the BURDEN score, as well as how each factor relates to real-world UM implementation across diseases and therapeutics.
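A minimal sketch of the correlation step follows. It assumes Pearson correlation, although the text does not state which coefficient was used; the column names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical treatment-level data: one entry per treatment in each column.
example_columns = {
    "burden": [1.4, 1.9, 2.3, 2.9],
    "um_share": [0.95, 0.85, 0.80, 0.60],  # share of plans with any UM
    "st_share": [0.70, 0.60, 0.50, 0.20],  # share of plans with step therapy
}
example_matrix = correlation_matrix(example_columns)
```

Reading across the `burden` row of such a matrix gives the index-level correlations with each UM measure, while the factor rows show which individual inputs move with UM use.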

Second, this study implemented ordinary least squares regression models to capture the extent to which UM implementation in coverage decisions was sensitive to the BURDEN score in 2022. To allow for a nonlinear relationship between the BURDEN score and each measure of UM, the BURDEN score was modeled as a categorical variable that divided data into 3 groups, indicating the tercile range of scores.
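The tercile specification can be sketched as follows, using a small hand-rolled solver so the example has no dependencies. The data, the rank-based tercile assignment, and the function names are all hypothetical; the sketch omits standard errors and the net price control, which could be added as an extra column in each design row.

```python
def terciles(scores):
    """Label each score 0 (low), 1 (middle), or 2 (high) by rank-based thirds."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 3 // len(scores)
    return labels


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: BURDEN score and share of plans imposing any UM.
example_burden = [1.4, 1.5, 2.0, 2.1, 2.8, 2.9]
example_um_share = [0.95, 0.90, 0.80, 0.85, 0.70, 0.72]

labels = terciles(example_burden)
# Columns: intercept, middle-tercile dummy, high-tercile dummy (low is the reference).
design = [[1.0, float(t == 1), float(t == 2)] for t in labels]
intercept, coef_middle, coef_high = ols(design, example_um_share)
```

With only tercile dummies in the model, the intercept is the mean UM share in the lowest tercile and each coefficient is the difference in mean share relative to that group, which is why the results are naturally read in percentage points.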

Sensitivity Analyses

This study contained 2 sets of sensitivity analyses. First, a series of sensitivity analyses was conducted to ensure that the BURDEN score provided a robust and unbiased measurement of how problematic UM implementation would be for insurance plan members. Additional details on each of these analyses are available in the eAppendix. Second, to determine whether any factors used in the construction of the BURDEN score were highly influential to payer use of UM, we fit ordinary least squares regressions between measures of UM implementation and the BURDEN score factors.

RESULTS

Factor, Disease, and Treatment Selection

Among the 2019-2021 ICER reviews considered, 98 treatments were identified among 30 unique diseases covering 13 therapeutic classes (eAppendix Figure 1, eAppendix Table 1, and eAppendix Table 2). More than half (52.0%) of the treatments were indicated for neurology, rare disease, or gastrointestinal therapeutic areas (eAppendix Table 2). Among these 98 treatments, 89 (90.8%) were pharmaceuticals, and 9 (9.2%) were nonpharmacologic interventions. Three of the nonpharmacologic interventions were digital health technologies for combating opioid addiction, and 6 were individual supervised injection facilities in major metropolitan areas in the US (eAppendix Figure 1 and eAppendix Table 1). The mean (SD) patient quality of life (QOL) across diseases was 0.626 (0.24); 74.5% (n = 73) of treatments were indicated for diseases that are never or rarely reversible without surgical or pharmacologic intervention; and patients could expect 23.05 life-years from diagnosis, although this result is skewed by certain pediatric diseases. Among identified treatments, the mean serious adverse event rate was 11.7%, and 3.1% (n = 3) of treatments had no FDA-approved therapeutic alternatives (eAppendix Table 3).

The BURDEN Score

The disease-level BURDEN scores ranged from 1.36 (acute migraines) to 2.87 (myasthenia gravis), and the mean (SD) BURDEN score was 2.09 (0.48). Myasthenia gravis (2.87), multiple myeloma (2.85), and lupus nephritis (2.72) were the 3 diseases included in this study for which UM would be the most problematic for plan members. Acute migraines (1.36), depression (1.44), and high cholesterol (1.46) were the conditions included in this study for which UM would be the least problematic for plan members (Figure 1). A review of treatment-level scores is available in the eAppendix.

Relationships Among UM Factors, BURDEN Score, and Payer Use of UM

Among the 98 treatments identified in ICER assessments, 37 were covered by commercial plans represented in SPEC data, and of these 37, 29 were also included in SSR Health net price data. The 29 pharmaceuticals included for analysis had a mean (SD) BURDEN score of 2.19 (0.39), with scores ranging from 1.44 to 2.93. Each treatment had at least 1 commercial plan that imposed some type of UM restriction in its coverage decision. On average, 77% of commercial plans imposed some form of UM for a given treatment observed in the data; 54% of commercial plans applied ST protocols to manage utilization (Table 1).

Among continuous factors, payer use of any form of UM was most highly correlated with a product’s rate of severe adverse events, the number of alternative treatments available, and expected survival under the standard of care. Treatments with higher rates of severe adverse events were less likely to experience any UM (ρ_UM = –0.371), suggesting that UM implementation may not, at least primarily, be driven by patient safety concerns. Disease areas with more treatment alternatives were correlated with more UM use (ρ_UM = 0.257). Although this finding may appear counterintuitive, payers may rely on price negotiation (ie, higher rebates) rather than UM when there are multiple treatment alternatives. UM implementation was positively correlated with expected survival (ρ_UM = 0.244), suggesting that payers were less likely to restrict treatment access when survival outcomes were poor. Similarly, payer use of ST was most correlated with the number of alternative treatments (ρ_ST = 0.468) and rate of severe adverse events (ρ_ST = –0.304). Expected annual number of doses was also relatively highly correlated with ST use (ρ_ST = –0.255), but this result may be driven by increased use of ST for single-administration cell and gene therapies (eg, chimeric antigen receptor T-cell therapy) (Figure 2).

Among categorical factors, payer use of any form of UM was most correlated with a treatment’s ability to alleviate symptoms within 1 month of initiation and a treatment’s potential for addiction or abuse. Payers were less likely to restrict treatment access when patients experienced symptom relief and subsequent QOL improvements (ρ_UM = –0.361; ρ_ST = –0.294). When treatments had the potential for addiction or abuse (encoded as 0), payers were more likely to use UM to direct patients toward treatments that posed a lower risk of dependency after the course of treatment was completed (ρ_UM = –0.261; ρ_ST = –0.136). Also perhaps counterintuitively, payers were less likely to implement ST when treatments had complex routes of administration (ρ_ST = –0.239) (Figure 3).

Moving from the individual metrics to the overall index, increases in the BURDEN score were correlated with decreases in UM implementation (ρ_UM = –0.358) and decreases in the use of ST protocols (ρ_ST = –0.372), which implied that, on average, payer coverage decisions were sensitive to plan member outcomes when UM may prove problematic, although imperfectly. Furthermore, higher net price discounts were correlated with greater use of both UM and ST protocols (ρ_UM = 0.141; ρ_ST = 0.233), which is consistent with payers using multiple strategies to reduce total plan expenditures, including requiring patients to use less expensive alternative therapies (Figure 2).

Using the BURDEN Score to Measure Changes in UM Implementation

As the BURDEN score increased, commercial payers reduced their use of UM policies relative to treatments in the lowest tercile. Treatments for which UM was most problematic for plan members (defined as the third tercile of BURDEN scores vs the first tercile) were correlated with a 22.7–percentage point reduction in plans imposing any UM restriction (BURDEN coefficient: –0.227; P = .027). When controlling for net price, treatments with the highest BURDEN score had a 22.0–percentage point reduction in UM restrictions relative to those with low BURDEN score (P = .041). Payer use of ST was more sensitive to changes in the BURDEN score. Compared with treatments with a low BURDEN score, those with a high BURDEN score had a 39.6–percentage point reduction in the use of ST (P = .019); treatments with a moderate BURDEN score (second tercile) had a 28.2–percentage point reduction in the use of ST vs those with the lowest BURDEN score (P = .086). When controlling for net price in the regression, treatments with a high BURDEN score had a 36.2–percentage point reduction in the rate of ST protocols vs treatments with a low BURDEN score (P = .039) (Table 2).
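To make the units concrete, the sketch below applies the unadjusted any-UM coefficient (–0.227) to a hypothetical first-tercile share of 90% of plans. That baseline is an assumption chosen for illustration and is not a value reported here; the function names are also hypothetical.

```python
def predicted_share(baseline_share, coefficient):
    """Predicted share of plans for a tercile, given the reference-tercile share."""
    return baseline_share + coefficient


def relative_change(baseline_share, coefficient):
    """The same shift expressed as a fraction of the reference-tercile share."""
    return coefficient / baseline_share


hypothetical_baseline = 0.90  # assumed share of plans imposing UM in the lowest tercile
um_coefficient = -0.227       # third vs first tercile, any UM restriction

high_tercile_share = predicted_share(hypothetical_baseline, um_coefficient)
relative_shift = relative_change(hypothetical_baseline, um_coefficient)
```

Under this assumed baseline, a 22.7 percentage point reduction corresponds to a predicted share of about 67.3% of plans, which is a relative decline of roughly 25%. The two framings differ, so the coefficients are best reported as percentage point changes.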

For certain treatments, however, payer coverage decisions were less sensitive to the BURDEN score. In coverage decisions for myasthenia gravis, which had the highest BURDEN score, 100% of plans imposed some form of UM for efgartigimod and eculizumab. Similarly, in coverage of lupus nephritis, which had the third-highest BURDEN score, 91.7% and 82.4% of plans imposed UM for voclosporin and belimumab, respectively (eAppendix Figure 2). When considering the percentage of plans imposing ST, efgartigimod, eculizumab, and belimumab were subject to ST protocols at a higher rate than predicted by the BURDEN score (eAppendix Figure 3). The sensitivity analysis results can be found in the eAppendix.

DISCUSSION

This study makes 2 contributions to the benefit management literature. First, we examined the relationship between 9 factors that make UM problematic for plan members and real-world use of UM strategies. Second, we created a new, quantitative index that allows employers to better understand the potential impact of UM on benefit design selection. Across treatments and diseases, UM implementation was more problematic for plan members when QOL was poor, expected life-years were low, and few alternative treatments were available. For instance, myasthenia gravis, multiple myeloma, lupus nephritis, and Duchenne muscular dystrophy were the diseases for which treatment delays and denials due to UM may be most impactful to plan members. In this analysis of 29 products that were reviewed by ICER and included in both the SPEC and SSR Health data, increased rates of UM implementation were correlated with lower QOL under the standard of care, lower rates of severe adverse events, lower potential for addiction or abuse, and higher net prices. Similarly, ST protocols were more likely to be applied when patient QOL was low, treatments had low rates of severe adverse events, or many alternative products were available. Although higher BURDEN scores were correlated with reductions in UM, some treatments still saw frequent UM implementation despite the potential consequences for plan members.

For employers, UM pharmaceutical cost savings are easy to calculate, but quantifying the impact UM has on health plan members is far more challenging. Ideally, employers could inform benefit design decisions based on both plan costs and a separate quantitative index, such as Medicare Star Ratings, to balance higher costs and potentially problematic UM usage. The results of this study, including factor correlations with UM implementation, factor weights, or the BURDEN score itself, can be used by employers to make more informed choices in their benefit design. This index would not replace managed care pharmacists’ important role in making UM decisions for individual treatments, but rather it would help employers pick plans that balance UM impacts and overall costs. A table of key parameters for applying the BURDEN score algorithm to diseases and treatments outside the current data set is contained in eAppendix Table 4.

Limitations

This study has several limitations. First, this study relied on 30 diseases and 98 treatments reviewed by ICER between 2019 and 2021 within the context of the US health care system, which may bias treatment selection toward experimental therapies for severe diseases with high acquisition costs. Second, although the SMART methodology is a robust way to measure the importance of individual criteria, the factor weights relied on by this study could be measured more comprehensively through methods such as a Delphi panel with a larger number of respondents.¹⁷ Third, the factors deemed to make UM riskier for patients were derived from the literature, and a separate approach, such as a Delphi panel, may be a useful tool to augment these results. Fourth, the definitions used for data extraction possessed certain limitations. Patient life expectancy did not control for the mean age at diagnosis, and as a result, some severe pediatric conditions presented relatively long life-years expected after treatment initiation, whereas less severe conditions with later onset, such as type 2 diabetes, exhibited lower survival after treatment initiation. Fifth, there are limited data linking individual patient outcomes, within indication, to UM policies. Finally, SPEC data rely on publicly available coverage information, and thus the data do not reflect any customizations to coverage policies made by group sponsors.

CONCLUSIONS

Although payers appeared modestly sensitive to applying UM when it posed a high impact on patient outcomes, there remained certain instances in which ST was implemented at a high rate for severe diseases with few treatment alternatives. Future research is needed to further understand the rationale or clinical justification for ST, along with the real-world outcomes for patients when UM is applied in high-risk therapeutic areas.

Author Affiliations: Center for Healthcare Economics and Policy, FTI Consulting, Los Angeles, CA (JS, JF), and Washington, DC (SZ); Genentech, Inc (DEN), South San Francisco, CA.

Source of Funding: This study was funded by Genentech.

Author Disclosures: Dr Shafrin is employed by FTI Consulting, which is a consulting firm to health care, life science, government, and nongovernmental entities, and he is a board member of the Schizophrenia & Psychosis Action Alliance. Mr Fajnor and Ms Zhang are employed by FTI Consulting. At the time this research was conducted, Dr Nichols was employed by Genentech, a pharmaceutical company whose products are subject to utilization management. FTI Consulting received payment for involvement in the preparation of this manuscript from Genentech.

Authorship Information: Concept and design (JS, JF, SZ, DEN); acquisition of data (JS, JF, SZ); analysis and interpretation of data (JS, JF, SZ, DEN); drafting of the manuscript (JS, JF, SZ, DEN); critical revision of the manuscript for important intellectual content (JS, JF, SZ); statistical analysis (JS, JF, SZ); administrative, technical, or logistic support (JF, SZ); and supervision (JS, DEN).

Address Correspondence to: Jason Shafrin, PhD, Center for Healthcare Economics and Policy, FTI Consulting, 350 S Grand Ave, Ste 3000, Los Angeles, CA 90071. Email: jason.shafrin@fticonsulting.com.

REFERENCES

1. Meyer T, Yip R, Santiesteban D. Utilization management trends in the commercial market, 2014–2020. Avalere. November 24, 2021. Accessed January 26, 2023. https://avalere.com/insights/utilization-management-trends-in-the-commercial-market-2014-2020

2. Kyle MA, Dusetzina SB, Keating NL. Utilization management trends in Medicare Part D oncology drugs, 2010-2020. JAMA. 2023;330(3):278-280. doi:10.1001/jama.2023.10753

3. Resneck JS Jr. Refocusing medication prior authorization on its intended purpose. JAMA. 2020;323(8):703-704. doi:10.1001/jama.2019.21428

4. Lenahan KL, Nichols DE, Gertler RM, Chambers JD. Variation in use and content of prescription drug step therapy protocols, within and across health plans. Health Aff (Millwood). 2021;40(11):1749-1757. doi:10.1377/hlthaff.2021.00822

5. Giardino AP, Wadhwa R. Utilization management. In: StatPearls. StatPearls Publishing; 2023. Accessed October 8, 2024. https://www.ncbi.nlm.nih.gov/books/NBK560806/

6. 2024 Medicare Advantage and Part D star ratings. CMS. October 13, 2023. Accessed October 8, 2024. https://www.cms.gov/newsroom/fact-sheets/2024-medicare-advantage-and-part-d-star-ratings

7. Gavidia M. Employer utilization management prioritizes health benefit cost over patient care, survey finds. AJMC. April 13, 2022. Accessed October 8, 2024. https://www.ajmc.com/view/employer-utilization-management-prioritizes-health-benefit-cost-over-patient-care-survey-finds

8. Employer utilization management tactics place healthcare benefits cost before patient care, survey finds. News release. CancerCare. April 13, 2022. Accessed October 8, 2024. https://www.prnewswire.com/news-releases/employer-utilization-management-tactics-place-healthcare-benefits-cost-before-patient-care-survey-finds-301523811.html

9. Spiegel J, Fronstin P. What employers say about the future of employer-sponsored health insurance. The Commonwealth Fund. January 26, 2023. Accessed October 8, 2024. https://www.commonwealthfund.org/publications/issue-briefs/2023/jan/what-employers-say-future-employer-health-insurance

10. Robeznieks A. Why prior authorization is bad for patients and bad for business. American Medical Association. February 18, 2022. Accessed October 8, 2024. https://www.ama-assn.org/practice-management/prior-authorization/why-prior-authorization-bad-patients-and-bad-business

11. Who we are. Institute for Clinical and Economic Review. Accessed June 21, 2023. https://icer.org/who-we-are/

12. Marsh K, Ijzerman M, Thokala P, et al; ISPOR Task Force. Multiple criteria decision analysis for health care decision making—emerging good practices: report 2 of the ISPOR MCDA Emerging Good Practices Task Force. Value Health. 2016;19(2):125-137. doi:10.1016/j.jval.2015.12.016

13. Valiris G, Chytas P, Glykas M. Making decisions using the balanced scorecard and the simple multi-attribute rating technique. Perform Meas Metr. 2005;6(3):159-171. doi:10.1108/14678040510636720

14. Siregar D, Arisandi D, Usman A, Irwan D, Rahim R. Research of simple multi-attribute rating technique for decision support. J Phys Conf Ser. 2017;930:012015. doi:10.1088/1742-6596/930/1/012015

15. SPEC database. Tufts Medical Center. Accessed October 8, 2024. https://cevr.tuftsmedicalcenter.org/databases/spec-database

16. SSR Health. Accessed October 8, 2024. https://www.ssrhealth.com/

17. Avella JR. Delphi panels: research design, procedures, advantages, and challenges. Int J Dr Stud. 2016;11:305-321. doi:10.28945/3561