Using Medicare Data for Comparative Effectiveness Research: Opportunities and Challenges

This review article explores the Medicare data available for researchers and approaches that could be used to enhance the data’s value for comparative effectiveness research.
Published Online: July 14, 2011
Vicki Fung, PhD; Richard J. Brand, PhD; Joseph P. Newhouse, PhD; and John Hsu, MD, MBA, MSCE

Background: With the introduction of Part D drug benefits, Medicare began to collect information on diagnoses, treatments, and clinical events for millions of beneficiaries. These data are a promising resource for comparative effectiveness research (CER) on treatments, benefit designs, and delivery systems.


Objective: To explore the data available for researchers and approaches that could be used to enhance the value of Medicare data for CER.


Challenges and Opportunities: Using currently available Medicare data for CER is challenging; as with all administrative data, it is not possible to capture every factor that contributes to prescribing decisions, and patients are not randomly assigned to treatments. In addition, Part D plan selection and switching may influence treatment decisions and contribute to selection bias. Exploiting certain program aspects could address these limitations. For example, ongoing changes in Medicare or plan policies and the random assignment of beneficiaries with Part D low-income subsidies into plans with different formularies could yield natural experiments.

Policy Implications: Refining policies for time to data release, provision of additional data elements, and linkage with more beneficiary-level information would improve the value and usability of these data. Improving the transparency and reproducibility of findings and allowing potential open access for qualified stakeholders are also important policy considerations. Data needs must be reconciled with current policies and goals.

Conclusions: Medicare data provide a rich resource for CER. Leveraging existing program elements, combined with some administrative changes in data availability, could create large data sets for evaluating treatment patterns, spending, and coverage decisions.


(Am J Manag Care. 2011;17(7):489-496)

Medicare now collects information on diagnoses, treatments, prescription drug use, and clinical events for millions of beneficiaries. These data are a promising resource for comparative effectiveness research (CER) on treatments, benefit designs, and delivery systems; however, there are a number of challenges to using these data for CER.


  •  We explore the data available for researchers and approaches that could be used to enhance the value of Medicare data for CER.


  •  Leveraging existing program elements, combined with some administrative changes in data availability, could create large data sets for evaluating treatment patterns, spending, and coverage decisions.
Information on the safety, effectiveness, and value of medical care requires detailed clinical data from large numbers of patients receiving care in real-world settings. With the introduction of Medicare Part D prescription drug benefits in 2006, Medicare began to collect information on the use of prescription drugs for more than 27 million beneficiaries.1 Previously, this information was not widely available for Medicare beneficiaries, although drug use data were available for dual-eligible Medicare beneficiaries via Medicaid claims. The more comprehensive collection of drug use data allows for linkage with previously available information from Parts A and B including inpatient and outpatient diagnoses, and major clinical events for millions of beneficiaries, including many persons over the age of 65 years. These data provide a promising resource for assessing the comparative effectiveness of many types of care across a range of settings and geographic areas in the United States.

Because Medicare collects data for payment and administrative purposes and not for research, there are several limitations to using these observational data. The care beneficiaries receive will vary depending on where they live, the types of Medicare plans they choose, and their physicians and hospitals. In other words, there may be factors associated with both the care beneficiaries receive and the outcomes of care; these factors confound assessments of care effectiveness and limit the validity of simple comparisons. However, exploiting certain program aspects and ongoing natural experiments within the Medicare program can mitigate some biases associated with purely observational data.

In this review article, we discuss these strengths and limitations of using Medicare data for comparative effectiveness research (CER) and propose policy recommendations for improving the usefulness of these data for patients, providers, and policy makers.


There is a profound need for more evidence to guide clinical and policy decisions on drug treatments, devices, interventions, care delivery, payment models, and delivery systems. For example, while there is arguably substantial trial evidence supporting the use of many prescription drugs, this evidence often provides limited guidance for actual clinical decisions.

There are 2 main approaches for developing comparative effectiveness evidence: (1) clinical trials, including randomized clinical trials (RCTs) and pragmatic trials, and (2) studies using observational data from actual practice. While double-blinded RCTs represent the gold standard for generating clinical evidence, they have a number of practical limitations. Specifically, trials have historically compared single drugs with placebo rather than with existing alternative drugs, or examined them in combination with commonly used drug regimens. Trials are done under rigorous experimental conditions (efficacy) rather than real-world situations (effectiveness), and are not designed to evaluate costs or rare adverse events. Randomized clinical trials also tend to be expensive and examine relatively short-term effects. Moreover, trials may have limited generalizability to specific subgroups of patients (eg, the elderly, racial/ethnic minorities, those with severe diseases) because they tend to target relatively homogeneous patient groups rather than the broader mix seen in actual practice.2-4 Pragmatic trials attempt to overcome some of these limitations by focusing on more heterogeneous groups of patients and by evaluating effectiveness under routine care; however, existing evidence from pragmatic trials is in short supply and funding for these types of studies is limited.5,6

Studies using observational, longitudinal data sets could provide complementary information that addresses many of the limitations associated with RCTs. For example, Medicare collects information on millions of beneficiaries and allows for linkages across a range of claims data, including inpatient, outpatient, and prescription drug data. Having a large sample of individuals is critical for ensuring adequate statistical power for studying rare conditions or specific patient subgroups. Use of observational data (including Medicare data), however, requires consideration of numerous factors such as the types of Medicare plans for inpatient, outpatient, and drug services; coverage/cost sharing for treatments; availability of physicians and hospitals; and clustering of patients by physician. In addition, there can be variations in practice patterns across geographic areas. Examinations of drug use within Part D should consider these various levels of analysis and account for factors that could affect drug use, adherence, and ultimately outcomes, including a range of patient, provider, and plan-level characteristics. In short, the strengths and limitations of both clinical trials and observational data analyses should be considered carefully when evaluating the value of these approaches for addressing specific questions.

The following sections describe relevant structural aspects of the Medicare Part D program, the Medicare data available for researchers, and potential approaches that could be used to create quasi-experiments and enhance the value of historical Medicare data for CER.


Medicare currently collects diagnostic and treatment information through 4 programs: Part A (inpatient), Part B (outpatient), Part C (Medicare Advantage, which includes medical information for beneficiaries enrolled in managed care organizations), and Part D (prescription drugs). Part D is administered by private plans either as stand-alone Prescription Drug Plans (PDPs) that supplement traditional Medicare or Medicare Advantage Prescription Drug (MAPD) plans that bundle Part A, B, and D benefits. Part D is a voluntary benefit; in 2009 about 27 of 45 million Medicare beneficiaries were enrolled in a Part D plan, including 9.6 million low-income beneficiaries who received additional premium and cost-sharing subsidies from Medicare. The Centers for Medicare & Medicaid Services (CMS) randomly assigns low-income subsidy beneficiaries who have not chosen a Part D plan to qualified stand-alone drug plans.7

Beneficiaries choose their Part D plans; these plans have some autonomy in determining their benefit structures and formulary drug lists provided they meet basic Medicare requirements. For example, all plans must offer benefits at least as generous as the defined standard. Medicare also requires that plans cover at least 2 drugs within a therapeutic class. However, plans can determine coverage for specific drugs within a class, as well as tier placement and utilization management requirements.

The use of utilization management tools such as prior authorization has grown since Part D’s introduction; these tools are most often used for drugs that are newer, more expensive, or more risky, with greater potential for adverse effects or with less available evidence on the possible benefits or harms.8


Since the introduction of the drug benefits, Part D plans have submitted detailed information on prescription drug events for all Part D beneficiaries to CMS. Researchers can apply for access to Research Identifiable Files that include beneficiary-level information on Part A, B, and D claims. Proposals are reviewed by the CMS privacy board; if approved, researchers must sign a Data Use Agreement specifying the terms of use, including the destruction or return of data at the study’s end.9 The final rule permitting release of the newly available Part D data to researchers was issued in May 2008. In addition to usual protections for beneficiary privacy, this rule includes additional protections for commercially sensitive plan information. Since the initial release of Part D data, CMS has rolled out an increasing number of data elements and linkages, and is continuing to assemble supplemental data files.

Part D Event Data

Table 1 outlines the available Part D research data files.10 The primary data source for Part D drug utilization is the Part D Event (PDE) files, which are currently available for 2006-2008. These files contain detailed information on each drug event for PDP and MAPD plan beneficiaries, and encrypted beneficiary, pharmacy, prescriber, and plan identifiers that allow linkage with other files such as inpatient and outpatient claims data, and Part D plan characteristics files. The PDE contains information on each drug dispensed including the National Drug Code, the quantity dispensed, and days of supply, allowing for the examination of therapy adherence and persistence based on dispensing data, which have been previously validated.11,12
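As an illustration of how fill dates and days of supply can be turned into an adherence measure, the sketch below computes the proportion of days covered (PDC), one commonly used dispensing-based summary. The article does not specify a particular measure, so the choice of PDC is an assumption, as are the function name and the input layout (pairs of fill date and days of supply); these are not actual PDE field names. This simple version counts overlapping supply only once and does not carry early refills forward.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, period_start, period_end):
    """Fraction of days in [period_start, period_end] with drug on hand.

    fills: iterable of (fill_date, days_supply) pairs (hypothetical layout,
    not the real PDE record format).
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            # Count only days inside the observation window; a set ensures
            # that overlapping fills do not double-count the same day.
            if period_start <= day <= period_end:
                covered.add(day)
    total_days = (period_end - period_start).days + 1
    return len(covered) / total_days


# Illustrative, made-up fills for one beneficiary and one drug.
example_fills = [(date(2008, 1, 1), 30), (date(2008, 2, 15), 30)]
pdc = proportion_of_days_covered(example_fills, date(2008, 1, 1), date(2008, 3, 31))
```

In this made-up example the two fills cover 60 of the 91 days in the first quarter of 2008, with a gap between the end of the first supply and the second fill. Analyses on real data would also need to handle drug switching within a class and inpatient stays, which this sketch ignores.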

The PDE data also capture cost information such as total drug costs and patient payments. These data allow for examination of variation in spending patterns and cost-of-care analyses. The PDE data also specify the benefit phase during which each prescription was filled (eg, deductible or initial coverage phase) based on the benefit structure implemented by each beneficiary’s plan, which affects costs for both patients and payers. This file also includes plan-specific information on the formulary coverage for each drug dispensed, including the tier and utilization management requirements. Plan-level information on formulary and benefit structures can be valuable for identifying quasi-experiments or instrumental variables for statistical analyses.
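A minimal sketch of the kind of spending summary these fields support is shown below: it totals drug costs and patient payments within each benefit phase. The dictionary keys (`benefit_phase`, `total_cost`, `patient_pay`) and the phase labels are hypothetical placeholders chosen for illustration, not the actual PDE variable names or codes.

```python
from collections import defaultdict


def summarize_by_benefit_phase(events):
    """Sum total drug cost and patient payment within each benefit phase.

    events: iterable of dicts with hypothetical keys 'benefit_phase',
    'total_cost', and 'patient_pay'.
    """
    totals = defaultdict(lambda: {"total_cost": 0.0, "patient_pay": 0.0, "fills": 0})
    for event in events:
        phase = totals[event["benefit_phase"]]
        phase["total_cost"] += event["total_cost"]
        phase["patient_pay"] += event["patient_pay"]
        phase["fills"] += 1
    return dict(totals)


# Illustrative, made-up drug events.
example_events = [
    {"benefit_phase": "deductible", "total_cost": 80.0, "patient_pay": 80.0},
    {"benefit_phase": "initial_coverage", "total_cost": 120.0, "patient_pay": 30.0},
    {"benefit_phase": "initial_coverage", "total_cost": 40.0, "patient_pay": 10.0},
]
summary = summarize_by_benefit_phase(example_events)
```

Grouping by phase makes it easy to compare the share of costs borne by patients at different points in the benefit, which is one way the variation in spending described above might be examined.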

Beneficiary Summary Files
