The American Journal of Managed Care June 2008
The Value of Ambulatory Care Measures: A Review of Clinical and Financial Impact from an Employer/Payer Perspective
To understand the value for payers and purchasers of primary care quality measures in an insured population, we conducted a 2-part analysis. In the first part, we reviewed the economic and clinical literature supporting 62 quality metrics spanning primary care that had been proposed for use in a physician recertification program and in a pay-for-performance program. We then ranked these metrics by both economic and clinical evidence of effectiveness. For many of the metrics, there was little clinical or economic support for inclusion in a pay-for-performance program. For the 20 with both clinical and economic evidence of effectiveness, we constructed actuarial models to understand the potential financial effect that attainment of these metrics would have in an insured population, from the perspective of a payer. Of those, 16 were found to be cost-saving in the short term with respect to direct medical costs incurred by payers. This analysis suggests that many recommended primary care quality measures may have little clinical evidence of effectiveness beyond expert opinion, and may provide scant clinical or economic benefit to payers if achieved. A minority, however, may deliver substantial savings in the short term. Given the current emphasis on pay-for-performance and pay-for-reporting programs, and recent studies showing a lack of relationship between measures and clinical/economic value, this analysis informs payers, purchasers, providers, and policymakers about the importance of choosing the right metrics and the methods for collecting them.
(Am J Manag Care. 2008;14(6):360-368)
Our research analyzed the clinical and financial value of 62 commonly used and generally approved physician quality measures from a payer-purchaser perspective.
Only a handful of those measures had a significant clinical and financial impact.
However, those measures are not routinely found in claims data, calling into question the amount of resources that should be devoted to large claims data aggregation efforts as opposed to other data collection efforts.
In 2007, for the first time in its history, the Medicare program tied a portion of a scheduled increase in physician fees to performance on a standard set of ambulatory care measures. This change in reimbursement strategy was prompted by (1) a recognition that measuring the value of Medicare physician spending has been, and continues to be, elusive; (2) a strong private sector movement to tie a portion of physician payment to demonstrated performance in delivering quality care; and (3) an acknowledgment that consumers deserve transparent information on the competence of physicians to meet certain quality thresholds.
As the Centers for Medicare & Medicaid Services (CMS) collects and disseminates these performance data, and as more than 100 similar efforts germinate in the private sector,1 there is a paucity of robust studies on the relationship between the achievement of ambulatory care measures and healthcare cost and quality. Prior research has shown a link between performance measures and costs and quality of care.2-4 In related studies, physicians who received recognition from the National Committee for Quality Assurance (NCQA) for demonstrating good outcomes in the management of patients with diabetes were shown to have lower costs.5-8 These findings are consistent with those of other studies.9-11 The common denominator of this work is the observation that a true measure of output is needed to compare the value created (or not created) by the care delivery process.
Output measures are best defined as those that most closely relate to the outcome of a patient’s care, or that have the highest correlation with that outcome. For example, an important outcome for a patient with diabetes is to avoid complications such as amputation, myocardial infarction, and renal failure. The measures that are most closely related to the avoidance of these events are the proper management of the patient’s glycosylated hemoglobin (A1C), low-density lipoprotein cholesterol (LDL-C), and blood pressure. Similarly, recent studies on the management of patients with cardiac disease demonstrate the importance of monitoring and measuring blood pressure.12
In a 2-part study, we reviewed 62 ambulatory care measures proposed for a specialty organization's recertification program and for a pay-for-performance initiative. These measures were selected by an expert panel, and 50 of them were endorsed by the National Quality Forum (NQF), the Ambulatory Care Quality Alliance (AQA), and/or the NCQA. The measures span primary care, including coronary artery disease (CAD), heart failure (HF), diabetes mellitus, osteoarthritis, asthma, major depression, hypertension, and acute-care conditions. eAppendix Table A lists the metrics and their endorsement status (available at www.ajmc.com). The first part of the study consisted of ranking each measure according to an index that combined clinical and economic value, and the second part consisted of conducting detailed actuarial analyses of the subset of measures that had the highest index score.
Our findings imply that many payers, including CMS, should carefully consider which measures to focus on.
To understand the benefit of each measure, we conducted a clinical and economic literature review, emphasizing meta-analyses demonstrating support for the measures. Given the preponderance of meta-analyses in our review, we captured a very large number of peer-reviewed articles. eAppendix Table B presents a review of the articles (available at www.ajmc.com). After we assembled the evidence for the measures, we created a point-based ranking system for both the clinical and the economic value of each measure. In basing our ranking systems on well-known methods published in the literature, our intent was to use an approach for capturing clinical and economic value that had been independently validated and was completely transparent. However, it is possible that our clinical and economic ranking systems, although comprehensive, did not capture all the elements of clinical and economic value that might be contained in a quality measure.
For the clinical evidence rankings, we used a methodology adapted from that of the GRADE Working Group.13 The GRADE Working Group is an international collaboration that has critiqued the assortment of evaluation tools used to rate clinical guidelines and has generated a standardized evaluation process.14 Quality of evidence was scored on a 5-point scale based on the study design for the supporting evidence:
• Meta-analysis in support—5 points.
• Multiple randomized controlled trials in support—4 points.
• Single randomized controlled trial in support—3 points.
• Observational studies only in support—2 points.
• Expert opinion in support—1 point.
Scores were reduced if there were questions of study quality, consistency, bias, directness, or imprecise/sparse data as follows:
• Serious limitations in study quality (-1).
• Important inconsistencies (-1).
• High probability of reporting bias (-1).
• Major concern about directness (ie, how does the outcome studied in the evidence align with the measure’s outcome?) (-1).
• Imprecise or sparse data (-1).
Conversely, scores were increased if there was evidence of strong association or dose response according to the following schema:
• Significant evidence of a strong association between measure and outcome (relative risk or odds ratio of >2 for morbidity or mortality outcome) (+1).
• Very significant evidence of a strong association between measure and outcome (relative risk or odds ratio of >5 for morbidity or mortality outcome) (+2).
• Evidence of a dose response gradient (+1).
As a result of this scoring, the maximum number of points awarded to any measure for clinical effectiveness in our analysis was 6.
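The clinical scoring rules above can be expressed as a small function. This is a minimal sketch, not the authors' code: the names (`clinical_score`, `BASE_POINTS`, the downgrade flag labels) are hypothetical, and it assumes that adjustments are simply added to the base score, that the +1 and +2 strong-association bonuses are alternatives rather than cumulative, and that the final score is not clamped. The text does not state any of these choices explicitly.

```python
# Hypothetical sketch of the GRADE-adapted clinical evidence score.
# Assumptions (not stated in the article): adjustments are additive, the
# +1 (RR/OR > 2) and +2 (RR/OR > 5) bonuses are alternatives, not cumulative,
# and the final score is not clamped to any range.

# Base points by the strongest study design supporting the measure.
BASE_POINTS = {
    "meta_analysis": 5,
    "multiple_rcts": 4,
    "single_rct": 3,
    "observational": 2,
    "expert_opinion": 1,
}

# Each flag that applies subtracts 1 point.
DOWNGRADES = frozenset({
    "study_quality",   # serious limitations in study quality
    "inconsistency",   # important inconsistencies
    "reporting_bias",  # high probability of reporting bias
    "directness",      # major concern about directness
    "imprecision",     # imprecise or sparse data
})


def clinical_score(design, downgrades=(), effect_size=None, dose_response=False):
    """Return the clinical evidence score for one measure.

    effect_size is a relative risk or odds ratio for a morbidity or mortality
    outcome; a protective ratio below 1 is assumed to be passed as 1/ratio.
    """
    score = BASE_POINTS[design]
    flags = set(downgrades)
    unknown = flags - DOWNGRADES
    if unknown:
        raise ValueError(f"unknown downgrade flags: {sorted(unknown)}")
    score -= len(flags)  # -1 per applicable limitation
    if effect_size is not None:
        if effect_size > 5:
            score += 2   # very strong association
        elif effect_size > 2:
            score += 1   # strong association
    if dose_response:
        score += 1       # dose-response gradient
    return score


if __name__ == "__main__":
    # Meta-analysis with an odds ratio of 2.5: 5 + 1 = 6 points.
    print(clinical_score("meta_analysis", effect_size=2.5))
    # Single RCT with likely reporting bias: 3 - 1 = 2 points.
    print(clinical_score("single_rct", downgrades=["reporting_bias"]))
```

Under these assumptions the scheme's theoretical ceiling is 5 + 2 + 1 = 8, so the observed maximum of 6 reflects the evidence found for these measures rather than a cap built into the scoring rules.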
For the economic ranking system, we adapted the method of Chiou et al.15 Points were first allocated on the basis of strength of evidence with:
• More than 1 study showing evidence of cost savings—4 points.
• More than 1 study showing evidence of cost-effectiveness or cost utility at <$50,000 per life-year saved; or 1 study showing cost savings in some scenarios—3 points.
• One study showing evidence of cost-effectiveness or cost utility at <$50,000 per life-year saved—2 points.
• No published cost studies—1 point.
Scores were increased or decreased based on the following questions applied to the highest-scoring individual evidence:
• Was uncertainty handled by (1) statistical analysis to address random events and (2) sensitivity analysis to cover a range of assumptions? Yes +.5. No -.5.
• Were the perspective of the analysis (eg, societal, third-party payer) and reasons for its selection evident? Yes +.5. No -.5.
• Was the measurement of costs appropriate and the methodology for the estimation of quantities and unit costs clearly described? Yes +.5. No -.5.
As a result of this scoring scheme, the maximum number of points allocated for financial effectiveness was 5.5; therefore, the maximum number of points for the total combined score was 33, which represents the product of the clinical and economic scores (Table 1). The primary reason for using a product-based combined score was to numerically highlight the metrics that have been the subject of rigorous studies of both clinical and economic effectiveness. Moreover, a combined ranking based on the product of the separate clinical and economic scores provides a more balanced index, and avoids assigning undue weight, for example, to measures with strong clinical effectiveness scores but weak economic value, or vice versa. Figure 1 illustrates the Pareto-like distribution of metrics by total combined points, where 19 metrics received 20 or more points and the remainder of the metrics received an average of 3 points or less.
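The economic score and the product-based combined index can be sketched in the same way. The category labels and function names below are hypothetical. The sketch also assumes that when there are no published cost studies, the three methodological questions are simply not applied (passed as `None`), since there is no individual study to assess; the text does not say how that case was handled.

```python
# Hypothetical sketch of the Chiou-adapted economic score and the combined
# (product) index. Category labels and function names are illustrative only.

ECON_BASE_POINTS = {
    "multiple_cost_saving": 4,     # >1 study showing cost savings
    "multiple_cost_effective": 3,  # >1 study at <$50,000/life-year saved,
                                   # or 1 study showing savings in some scenarios
    "single_cost_effective": 2,    # 1 study at <$50,000/life-year saved
    "no_cost_studies": 1,          # no published cost studies
}


def economic_score(category, uncertainty=None, perspective=None, costing=None):
    """Return the economic score for one measure.

    The three answers refer to the highest-scoring individual study:
    True adds 0.5, False subtracts 0.5, and None (assumed here to mean
    "no study to assess") leaves the score unchanged.
    """
    score = ECON_BASE_POINTS[category]
    for answer in (uncertainty, perspective, costing):
        if answer is True:
            score += 0.5
        elif answer is False:
            score -= 0.5
    return score


def combined_score(clinical, economic):
    """Combined index: the product of the clinical and economic scores."""
    return clinical * economic


def rank_measures(scores):
    """Sort (name, clinical, economic) tuples by combined score, highest first."""
    return sorted(
        ((name, combined_score(c, e)) for name, c, e in scores),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    best = economic_score("multiple_cost_saving", True, True, True)  # 4 + 1.5
    print(best, combined_score(6, best))  # 5.5 33.0
```

The product illustrates the balancing argument in the text: a measure scoring 6 clinically but 1 economically and a measure scoring 4 and 3 both sum to 7, but their products are 6 and 12, so the more balanced measure ranks higher.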
We then performed a cost–benefit calculation using the measures with the highest combined rankings, because the metrics with low scores had little or no evidence of economic and clinical effectiveness. The actuarial models assessed the value of reductions in adverse outcomes when high-scoring metrics were achieved. To generate each model, we calculated the per capita benefits of treatment by determining the number, type, and average cost of morbidity events prevented by attainment of each metric, as determined from the literature and validated through the Thomson Medstat MarketScan database (Thomson Medstat Inc, Ann Arbor, Michigan), a large integrated claims database of commercially insured employees, mainly of large corporations. See Figure 2 and Figure 3 for specific examples. When cost figures were outdated, we inflated them to 2006 levels using the medical Consumer Price Index. We assumed study populations were 50% male and 50% female, and where ethnicity was relevant (for the cholesterol and hypertension models), we assumed the population was 90% white and 10% black.
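The per capita benefit described above amounts to summing, over each type of morbidity event, the number of events prevented times the average cost per event, with older cost figures restated in 2006 dollars by the ratio of medical Consumer Price Index values. The sketch below is illustrative only: the `MorbidityEvent` type, the event rates, the costs, and the CPI index values are hypothetical placeholders, not figures from the article, the cited literature, or MarketScan.

```python
# Hypothetical sketch of the per capita benefit calculation. All event rates,
# costs, and CPI index values below are illustrative placeholders, not figures
# from the article, the literature it cites, or MarketScan.
from dataclasses import dataclass


@dataclass
class MorbidityEvent:
    name: str
    prevented_per_capita: float  # events avoided per treated member in 1 year
    average_cost: float          # average cost per event, in cost_year dollars
    cost_year: int


def inflate(cost, from_year, to_year, medical_cpi):
    """Restate a cost in to_year dollars using the ratio of medical CPI values."""
    return cost * medical_cpi[to_year] / medical_cpi[from_year]


def per_capita_benefit(events, medical_cpi, target_year=2006):
    """Sum of (events prevented x inflated average cost) over all event types."""
    return sum(
        e.prevented_per_capita
        * inflate(e.average_cost, e.cost_year, target_year, medical_cpi)
        for e in events
    )


if __name__ == "__main__":
    cpi = {2004: 300.0, 2006: 336.0}  # placeholder index values
    events = [
        MorbidityEvent("stroke", 0.01, 20000.0, 2004),
        MorbidityEvent("CAD event", 0.005, 10000.0, 2006),
    ]
    print(round(per_capita_benefit(events, cpi), 2))  # 274.0
```

Here the 2004 stroke cost of $20,000 is restated as $22,400 in 2006 dollars before being weighted by the prevention rate; a cost already in 2006 dollars passes through unchanged.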
We next calculated per capita costs, using average costs for generic versions of pharmacotherapy treatments (where available) obtained from an online pharmacy and including other related medical costs from an amalgam of likely therapies. The cost of medication side effects in these particular applications was generally not considered, with 2 exceptions: aspirin use and switching to angiotensin receptor blockers (ARBs) because of intolerance to angiotensin-converting enzyme (ACE) inhibitors. Although the incremental cost of medication used for treatment was included in the model, the incremental cost of physician time to prescribe these treatments was not considered. None of the interventions listed here would lead to codable procedures, although it is conceivable that they could increase the acuity of individual visits. (For example, a level 2 or 3 visit might be justifiably “upcoded” to a level 3 or 4.)
After we derived the per capita benefits and costs of treatment, we summed them to yield the net financial effect—savings or cost—of the specific quality measure. The actuarial models were conservative, considering only the direct medical cost of morbidity to employers/payers for patients less than 65 years of age, in a 1-year time frame. Based on the literature, we varied the specific morbidity effects for each measure. In the case of hypertension, for example, the literature documents reductions in end-stage renal disease (ESRD), CAD, and stroke. In the case of ACE inhibitor/ARB treatment for left ventricular systolic dysfunction (LVSD), there was a reduction in hospitalizations for congestive HF. There were additional morbidity effects that could be expected with each metric, such as the decrease in retinopathy with blood pressure reduction.24
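Once benefits and costs are both expressed per capita over the same 1-year horizon, the net financial effect is a subtraction: summing them with costs entering as negative amounts is the same as benefit minus cost. The sketch below assumes that reading; the function names and the example figures are hypothetical, and the side-effect and physician-time treatment simply mirrors the exclusions described in the preceding paragraphs.

```python
# Hypothetical sketch of the net 1-year financial effect for one metric, per
# treated member younger than 65. Positive values are savings to the payer;
# negative values are net costs. Parameter names are illustrative.


def per_capita_cost(drug_cost, other_medical_cost=0.0, side_effect_cost=0.0):
    """Incremental treatment cost per member.

    side_effect_cost defaults to 0 because side-effect costs were generally
    excluded (aspirin use and switching from an ACE inhibitor to an ARB are
    the stated exceptions). Physician time is not included.
    """
    return drug_cost + other_medical_cost + side_effect_cost


def net_financial_effect(benefit, cost):
    """Net effect = benefit of prevented morbidity minus treatment cost."""
    return benefit - cost


def is_cost_saving(benefit, cost):
    """True if the metric saves more in direct medical costs than it adds."""
    return net_financial_effect(benefit, cost) > 0


if __name__ == "__main__":
    cost = per_capita_cost(drug_cost=120.0, other_medical_cost=30.0)
    print(net_financial_effect(274.0, cost))  # 124.0
    print(is_cost_saving(274.0, cost))        # True
```

A metric would count as cost-saving in the short term, in the sense used in this analysis, only when this 1-year net figure is positive.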