New Thinking on Clinical Utility: Hard Lessons for Molecular Diagnostics
John W. Peabody, MD, PhD, DTM&H, FACP; Riti Shimkhada, PhD; Kuo B. Tong, MS; and Matthew B. Zubiller, MBA
The once-lucrative diagnostic testing market is at a crossroads, facing greater pressure to show value in an atmosphere of evolving regulatory priorities. Cost and patient value are at the forefront of every payer’s mind as we move deeper into the era of healthcare reform and cost consciousness. Converging efforts—such as the recent McKesson Corporation and American Medical Association (AMA) partnership to create a registry of molecular diagnostic tests, and the new gap-filling procedures used by CMS to set reimbursement rates—have increased the demand for both granularity and clarity in diagnostic coding, underscoring the growing need for precision in quantifying test value and justifying price. These pressures yield a clear aspiration for companies and payers alike: better, less risky, more principled strategies for generating evidence of impact and determining clinical utility.
Clinical utility—defined as the usefulness of a test for clinical practice (distinct from clinical validity, which is how well the test can determine the presence, absence, or risk of a specific disease)—is arguably the most significant hurdle facing new technologies and their investors.1 Palmetto GBA, the CMS carrier for California, Nevada, Hawaii, and the Pacific Islands (the region referred to as Jurisdiction E, previously called J1), was the first to require that companies seeking CMS coverage and reimbursement complete a technology assessment summarizing all evidence of clinical validity and clinical utility. Its MolDx program, created in 2011, not only brought the importance of clinical utility evidence to the forefront, but also recognized the need to uniquely identify these tests with a nomenclature known as Z-Codes. Issued by McKesson (the contracted technology provider for MolDx), Z-Codes offer a transparent way to identify and track unique diagnostic tests. The relevant information about these tests is captured and shareable within McKesson’s Diagnostics Exchange (DEX), an online test registry and workflow solution through which test manufacturers submit information and evidence about their tests. Each test has an electronic dossier (ie, information set) that is curated and controlled by the submitting lab and can be shared at that lab’s discretion. Payers, such as Palmetto GBA, are poised to use the DEX to obtain all the information they need to make coverage and reimbursement decisions.
In 2014, McKesson entered into a licensing relationship with the AMA to group and index McKesson Z-Codes with corresponding molecular pathology codes in the AMA’s Current Procedural Terminology (CPT) code set to make it possible to identify which test was performed and to help simplify the reimbursement process. The Z-Codes, in conjunction with their specified CPT codes, now provide greater transparency, enabling payers to track outcomes on specific tests and to eventually analyze the clinical utility of these tests. This information can be centralized and accessed in the DEX.
In January 2013, CMS also implemented a gap-filling methodology to determine the clinical lab fee schedule for molecular tests on Medicare claims.2 In the gap-filling process, laboratories submit cost information, such as the costs of resources required to perform a given test. CMS contractors, such as Palmetto GBA, examine and compare costs from different manufacturers and determine a new pricing structure. The new clarity around utilization and costs enhances assessment of clinical utility, allowing better definition and quantification of test utility and value.
The inability to demonstrate clinical utility is now the most cited reason for diagnostic tests failing to obtain coverage. We examined Palmetto GBA’s MolDx decisions (publicly available) for the period between January 2013 and July 2013 and found that 12 of 34 applications for CMS coverage (about 35%) were denied by MolDx due to lack of clinical utility data.
Clearly, failure to demonstrate clinical utility is one of the greatest business risks that promising diagnostic technology companies face. A negative coverage decision can force a company to shut down: Predictive Biosciences Inc, for instance, closed after its CertNDx bladder cancer diagnostics were denied coverage for lack of clinical utility data. Despite these dire consequences, diagnostic test companies still pour their valuable initial resources into validation studies of clinical and analytical validity in order to gain FDA approval for their test. FDA approval, however, does not include a clinical utility review, which is a prerequisite for securing Medicare coverage.
This growing awareness of clinical utility has brought to the forefront the need to understand successful coverage and reimbursement (C&R) strategies. Drawing on the recent reimbursement decisions (January 2013-July 2013) from the MolDx program, and on our experience working with payers and test manufacturers, we examine how companies have managed (and failed) to secure C&R in a timely, cost-conscious manner, and we summarize 5 basic requirements—which we call lessons—for planning, implementing, and proving clinical utility to secure C&R.
Lesson 1. Understand that outcomes are hard to capture, but that clinical behavioral change is almost always proximate to outcomes change. Establishing clinical benefit draws immediate (and important) attention to improving patient outcomes. It is often impractical, however, to follow patients over an adequate period of time to see real changes in health status outcomes—time that typically overwhelms most investment strategies and the capital of small companies. Outcomes are even more of a challenge because so many distinct events must happen between pulling a validated test off the shelf and arriving at better patient outcomes. (There are some exceptions where test results might drive patient behavior change directly, but this is uncommon.)
Two major obstacles may prevent a valid test from demonstrating its clinical utility. One occurs at the patient level: some patients do not get better even when clinical care is done correctly. The other is at the provider level: providers practice differently, and occasionally poorly. Both confound attempts to understand whether a diagnostic test was useful.3 Because both patients and providers exhibit variation that can impact outcomes, potentially derailing an otherwise clinically valid test, the best defense is to power a study with a large number of patients and follow them over a long period of time.
An even greater risk, however, is a study that is improperly designed and uses the test in a way that may not be useful for a general population (but perhaps would be helpful in a subset of patients). Avoiding this pitfall involves identifying the precise population for which the diagnostic test is useful, which, though crucial, may unfortunately be difficult in the early stages of development. One example of a design fiasco is failing to distinguish between a screening test to identify those at risk, which is not covered by CMS, and a diagnostic test for at-risk patients or a diagnostic test for disease activity.
Once a population is identified, clinical utility studies need to be designed in such a way that information gained from the test leads to a different treatment, to fewer (other) tests, or even a decision not to treat (eg, distinguishing benign from malignant disease). A well-designed clinical utility study, therefore, is crafted to demonstrate that the test adds information that changes the clinical treatment course and ultimately the outcomes (diagrammed in the Figure).
In this era of ever-increasing healthcare costs, even though value and cost efficiency are of primary concern, it is worthwhile to point out that cost-effectiveness is not an explicit requirement for FDA approval, which is often obtained prior to applying for Medicare coverage and reimbursement (Code of Federal Regulations, Title 21, parts 314 [pharmaceuticals] and 860 [medical devices]). Nonetheless, commercial (private) payers can be particularly compelled to provide coverage if a test can be shown to reduce costs along with improving clinical outcomes,4 and, accordingly, we anticipate cost-effectiveness data will become an increasingly important element in payer decisions. Thus, diagnostic test entrepreneurs and investors alike may well consider cost-effectiveness to be part of a comprehensive evaluation of utility.
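Since the article anticipates cost-effectiveness data becoming a routine element of payer decisions, the core calculation behind such evidence, the incremental cost-effectiveness ratio (ICER), can be sketched as follows. This is a minimal illustration; the function name and the cost/QALY figures are hypothetical, not drawn from any test discussed here.

```python
def icer(cost_new, cost_standard, qaly_new, qaly_standard):
    """Incremental cost-effectiveness ratio: extra dollars spent per
    additional quality-adjusted life-year (QALY) gained by the new strategy."""
    delta_cost = cost_new - cost_standard
    delta_effect = qaly_new - qaly_standard
    if delta_effect <= 0:
        # A strategy that adds cost without adding QALYs is dominated;
        # the ratio is not meaningful in that case.
        raise ValueError("new strategy must add QALYs for a meaningful ICER")
    return delta_cost / delta_effect

# Hypothetical example: a test-guided strategy costs $12,000 vs $8,000 for
# usual care, and yields 1.8 vs 1.5 QALYs per patient.
print(round(icer(12_000, 8_000, 1.8, 1.5), 2))  # 13333.33 ($/QALY)
```

A payer would compare this ratio against a willingness-to-pay threshold (commonly cited figures range from $50,000 to $150,000 per QALY) when weighing coverage.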
Lesson 2. Start early. While utility requires an explicit look at clinical value, many companies clearly wait too long to begin determining clinical utility and real-world effectiveness. Test developers, who are experts in molecular and cellular technology rather than medical economics, often do not know how to demonstrate utility and do not recognize that without utility there is no payment, thwarting the ultimate goal of bringing a significant new technology into the marketplace. Examples abound of unfavorable coverage decisions that resulted from companies not determining clinical utility in a timely manner; examples from Palmetto GBA’s MolDx are summarized in the Table and illustrate this point.
A striking finding is that some companies fail to determine utility and effectiveness by falling into a sequencing paradox: to wit, how can we know if our test is effective if we haven’t first proved it is valid? Yes, validation studies are clearly crucial, reporting on the strength of association between the diagnostic test and a specific disease state—but the belief that it is necessary to determine a test’s clinical validity (efficacy) before determining its clinical utility is not correct. Instead, clinical utility data should and can be gathered early to carefully determine the clinical parameters around which clinical utility will be established.
Clinical simulation, already used widely in clinical performance measurement,5 offers a simple and cost-efficient way to do an early clinical utility study, even before validity studies are complete; in fact, it can be used to determine the parameters of validity studies. In our own experience using Clinical Performance and Values (CPV) vignettes, we have captured clinical behavior change, and once such changes are established in silico, it is possible and plausible to assert that when the test is launched and used broadly, there will be commensurate improvements in outcomes.6 A randomized controlled study of a recently approved multi-biomarker diagnostic assay, for example, used simulations successfully. The company demonstrated that when its test indicated a change in disease activity (ie, the test was validated), rheumatologists who used this test made the correct assessments and treatment decisions for simulated rheumatoid arthritis cases. Importantly, the measured change was not an outcome or a patient health measure, but the clinical decision to treat (or not treat). Clear advantages of CPV simulations are that they have been validated against actual practice,7-9 that they are straightforward (randomly assigned, impartial physicians are asked to make hypothetical clinical decisions based upon having or not having the test results), and that they remove patient-level variation. Another advantage is that they allow a company to assert that the test has utility before the validation studies are completed or even begun, thus accelerating the C&R process.
Starting early with simulations means starting data generation for utility at a much lower cost than would be the case with a full clinical equipoise study. In the event the test does not change clinical practice (ie, a negative utility study), the company has the opportunity to revisit its technology and seek another aspect of clinical practice the test might be able to change beneficially. Correspondingly, if the test does change clinical practice, the company is appropriately encouraged to use the experimental sample frame and examine the impact on the patients of providers who use the test, as there is an expected cascading impact on actual patient outcomes stemming from clinical practice change. This links the early adopters of diagnostic tests to hard patient outcomes.
Lesson 3. Learn from successes (and failures). While the parameters of exactly what defines clinical utility are broad and may appear amorphous, MolDx makes clear that, at a minimum, clinical utility comprises good science, patient impact, and practice change. MolDx guidelines suggest 2 well-designed controlled experiments published in peer-reviewed journals; a number of subjects large enough to establish clinical significance (including patients from the Medicare population in the study group); and demonstrated changes in physician treatment behavior based on the assay results and/or improved patient outcomes. Companies too often appear simply to check appropriate boxes on a form, believing their work is done when they can present 2 studies, some patient results, and their earnest assurance that practitioners value the test. What they must realize, however, is that they need to engage in serial evaluation of clinical utility (see Table).
Lesson 4. Determine clinical utility with rigorous science. For too long, companies have relied on retrospective studies, anecdotes, testimonials, and non-randomized studies to try to demonstrate clinical utility. Clinical utility, like clinical validity, can and must be determined experimentally. The Center for Medical Technology Policy, which develops and publishes methodological standards, has established guidelines on the design of prospective studies on clinical utility.10 They argue, as we do, that clinical utility must be examined in a scientifically rigorous manner and must be considered early on, with an analysis plan in place. What is scientifically “rigorous enough” is a common question, especially since randomized controlled trials may mean involving multiple sites and outcomes.
For many companies, developing experimental studies on utility means, to start, (re)thinking how a business creates an ongoing clinical utility research plan. Such a plan focuses on how a test will change care, its most effective and selective clinical uses, and its potential economic benefits (or costs), and it ultimately demonstrates—through a series of studies—how clinical utility can be generalized to different populations. A company’s leadership choices—perhaps made previously with expertise in validity in mind—may have to be tweaked when expertise in utility, and openness to new ways of thinking, is desired as well.
With ongoing reimbursement changes expected, stricter and more narrowly specified demands for experimental evidence on clinical utility are likely. Non-randomized trials, for example, fail to meet the new evidence standards. The better strategy is to conduct smaller, well-designed randomized controlled trials to identify exactly the clinical outcomes that will build the evidence chain for clinical utility. In contrast, a large clinical equipoise study early in test development is fraught with too much sampling error and too many uncertainties related to power estimation and variation in practices and patients to be an efficient use of resources. The prudent initial smaller-study approach, however, requires close attention to sample size, variance, and effect size calculations.
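The closing caveat about sample size, variance, and effect size can be made concrete with the standard normal-approximation formula for comparing two proportions, such as the share of physicians making the guideline-concordant treatment decision with versus without the test result. The function name and example rates below are illustrative assumptions, not figures from any study discussed here.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size to detect a difference between two proportions,
    using the standard normal-approximation for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2                          # pooled proportion
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Detecting an improvement from 50% to 65% correct treatment decisions, at
# alpha = 0.05 and 80% power, requires roughly 170 subjects per arm:
print(sample_size_two_proportions(0.50, 0.65))  # 170
```

The calculation illustrates the trade-off in the text: smaller expected effect sizes drive per-arm sample sizes, and thus study costs, up sharply, which is why a tightly targeted smaller trial can be more efficient than a broad equipoise study.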
Another important avenue to consider is Coverage with Evidence Development (CED),11 under which CMS coverage is given to promising but as-yet-unproven diagnostic tests, contingent on providing evidence to support clinical utility and demonstrating that the principal purpose of the study is to test whether a particular intervention improves health outcomes. The experience of Iverson Genetics is a good example of how CED has been and can be used. Iverson’s panel test for genetic variants in the CYP2C9 and VKORC1 genes is used to determine the best dose of warfarin, an anticoagulant frequently used in the prevention of thrombosis and thromboembolism. Warfarin has a narrow therapeutic window, which means physicians often have to adjust its dose to avoid serious adverse events, such as excessive bleeding and blood clots. However, despite strong evidence of an association between genetic variants and stable warfarin dose, Iverson’s initial clinical studies did not prove to CMS’s satisfaction that testing for the CYP2C9 and VKORC1 variants actually improved health outcomes.12
Subsequently, Iverson and others have conducted additional randomized controlled clinical utility studies under a CED arrangement with CMS in an effort to demonstrate clinical utility. Results from these randomized controlled trials have been mixed,13,14 leading to further reassessment of the clinical utility of warfarin pharmacogenetics.15 The most recent draft policy from Palmetto GBA makes this uncertainty explicit for warfarin dosing: CYP2C9 genotype testing remains under consideration for CED, while VKORC1 genotype testing has been deemed to have insufficient clinical utility data and thus will not be covered.16
Lesson 5. Understand that clinical utility studies may need to involve payers and providers from the start. Payers, of course, ultimately decide on coverage and reimbursement. National spending for molecular diagnostics and genetic testing totaled about $5 billion in 2010, about 8% of national spending on clinical laboratory services.17 Much of the reimbursement for diagnostic services goes to nonelderly females, because of the wide variety of tests available for breast and ovarian cancers. Clearly, although the field of molecular diagnostics has grown rapidly, its use in clinical practice remains limited.18,19 Whether the slow rates of adoption beyond breast and ovarian cancer testing come from lack of awareness, lack of infrastructure (eg, some tumor markers require specific technologies), lack of knowledge on the part of physicians, limited availability, or limited effectiveness/utility in real-world settings is difficult to determine.18 Payers, however, are sensitive to uptake and utilization; thus, they want to ensure that the diagnostic tests they do cover are indeed critical to clinical decision making. With the adoption of Z-Codes, for example, payers can be involved earlier in the process and perhaps enhance the collection of data by sharing the claims information relevant to the test and patient outcome.
Companies can consider engaging commercial payers by involving them in the design plan(s) for a clinical utility study. When a company tells a payer about studies that will assess the utility of a new diagnostic test, decision makers have the opportunity to comment on and guide clinical utility validation. We at QURE Healthcare often do this by convening a panel of commercial payers and providers as we move from study protocol to ethics review. Panels contextualize the anticipated findings and offer opportunities to hear from other stakeholders. Most importantly, by starting early; by designing an experimental study with explicit, identified outcomes; and by committing ex ante to findings that may or may not be favorable, a company can expect results that are both anticipated and credible. By inviting an impartial outside party to moderate the panel, a company also enhances the likelihood of payer adoption. MolDx, on the other hand, does not favor early engagement; its preference is to engage companies only after a full application/dossier has been submitted for review. Companies often err by submitting an incomplete dossier to MolDx, which leads to delays and frustration on both sides.
Coverage and reimbursement for diagnostic tests is shifting from relatively low entry barriers to much higher, evidence-based barriers that will require test developers to generate evidence of net clinical benefits before widespread clinical use. As clinicians increasingly rely on these tests, patients and the test manufacturers are increasingly concerned about payment coverage. This has created a market that beguiles and worries payers and test makers alike.
Arguably one of the biggest challenges is identifying and adequately capturing clinical benefits. Outcomes that by definition may manifest many years later, such as overall survival, can be completely impractical to capture. An alternative and more feasible clinical utility study measures clinical practice changes. Such studies might assess disease progression (Crescendo Biosciences); establish the need, or lack of need, for invasive testing (CardioDx); demonstrate the avoidance of a complication (Iverson Genetics); or narrow the treatment population (Genomic Health). (See the Table for more detail on these scenarios, and those mentioned below.) Thus, the key to finding the best outcome for clinical utility is to build a causal framework that links the test to clinical action and then to patient outcome.
The most common pitfalls are using the wrong clinical proxies (as in the example of Tethys) and failing to clearly define the target population (Berkeley HeartLab). Another is failing to identify the clinical action or behavioral change that is clearly linked to health status (Agendia). Diagnostic test C&R failures can be further prevented with better expertise and greater investment in clinical utility design. If a company believes in a test’s clinical validity, then the urgency, today, should be to acquire capital and invest early in clinical utility to secure coverage and reimbursement. Data remain the ongoing scientific responsibility of a company developing a new test, and a steadfast commitment to examining clinical utility for the most generalizable audience is the key to clinical and financial success over the long term.