The American Journal of Accountable Care September 2015

Applying Weighting Methodologies to a Commercial Database to Project US Census Demographic Data

Thomas Wasser, PhD, MEd; Bingcao Wu, MS; Joseph W. Yčas, PhD; and Ozgur Tunceli, PhD
This study tests the feasibility of projecting commercial insurance demographic information to the US Census population, and creating the framework for a simple weighting scheme.
Objectives: To investigate the viability of projecting demographic information from a large commercial managed care database to the entire US population, and to provide a simple, pertinent weighting scheme.

Methods: Data from the HealthCore Integrated Research Database (HIRD), a repository of enrollee administrative claims from 14 regionally dispersed US health plans, were compared with US Census data. Census-defined regions, gender, and age groups served as demographic standards. To guard against small differences between these large samples appearing statistically significant, an alternative version of goodness-of-fit statistics was used to assess the overall fit of characteristic group variables.

Results: This study compared 14.8 million HIRD enrollees with the 307.7 million individuals in the 2009 US Census. Gender distribution was similar in the 2 groups: females made up 49.8% of HIRD enrollees and 49.3% of the Census population. Relative to the US Census, HIRD enrollees were overrepresented in the midwest, underrepresented in the south, and comparable in the northeast and west, where the differences were 1% and 0.6%, respectively. HIRD enrollees were overrepresented in the 30-to-59 years category and underrepresented in the <5 years and ≥65 years groups; the 2 populations were similar in the 5-to-30 years age group.

Conclusions: In the absence of data on disease prevalence, treatment patterns, and outcomes, commercial health plan databases may provide a reasonable representation of the national population when appropriately weighted to reflect differential demographic characteristics. The ability to conduct and rely on the results of such projections could be of value to key stakeholders such as healthcare planners, policy makers, and payers.
Researchers are keenly interested in ascertaining the impact of disease on society. One of the central elements of this determination is knowledge about the number of individual patients with a disease or condition of interest within a specific region, age group, or gender.1-6 The exact count or even estimates of patients affected by a given disease may not always be available for a variety of reasons, including the absence of reporting requirements or a lack of organized and maintained disease registries or longitudinal patient databases.7,8 To obtain an understanding of the size of patient populations that are not well quantified and characterized, often the only workable option is to extrapolate from available data in repositories such as registries and health plan databases (among others).

Disease prevalence can be estimated in subpopulations with accessible data,9-16 but in extrapolating to the general population, systematic differences in demographic composition must be taken into account. In the United States, it is unlikely that data sets in existing commercial health insurance databases will be representative enough by themselves to present an accurate estimate of the national population.10,11,14-16 As a result, there is considerable interest in census decomposition methodologies or similar approaches that are capable of rendering the data in such nonrepresentative population samples in a form comparable to US Census data.

Commercial health plan databases are a vital and reliable source of data on disease prevalence, but they are limited in size. The objective of this study was therefore to develop a weighting framework for projecting data from commercial databases to a population matching the demographic composition of the US Census.

Study Design

This study compared data, demographic structures, and characteristics from a large commercial research database, the HealthCore Integrated Research Database (HIRD), which is notable for its size and geographic breadth, with data from the 2009 US Census. To create a basis for the approximation of counts relative to the US Census data, standard statistical procedures incorporating a suitable alternative to the goodness-of-fit method were used to establish weights for the HIRD. The weighting formulation was then tested with a sample of patients from the northeast region of the United States who were diagnosed with acute coronary syndrome (ACS).

Data Source
This study utilized a large commercial administrative claims database, the HIRD, which contains a broad spectrum of medical, pharmacy, and laboratory information on more than 46 million enrollees in 14 geographically dispersed managed care plans across the United States. The broad range of service models encompassed by these plans includes health maintenance organizations, point of service, preferred provider organizations, and indemnity plans. The data queried from the HIRD were categorized into geographic regions matching those used by the US Census Bureau.

US Census
The US Census Bureau publishes the American Community Survey results every year. The American Community Survey reports population numbers in categories including age, gender, race, and geographic region. It does not collect information on disease prevalence or other types of healthcare utilization. This study was conducted prior to the official release of the 2010 US Census data; as a result, population estimates from the US Census Bureau’s 2009 American Community Survey were used for the total count of individuals residing in the 50 US states.

Researchers had access to limited patient data in this study. Strict measures, in compliance with the 1996 Health Insurance Portability and Accountability Act (HIPAA), were observed to ensure the preservation of patient anonymity and confidentiality throughout. The study did not involve the collection, use, or transmittal of individually identifiable data. It was conducted under the Research Exception provisions of the Privacy Rule, 45 CFR 164.514(e); institutional review board sanction was not indicated.

Inclusion Criteria/Exclusion Criteria
Health plan members within the HIRD who had at least 1 day of health plan enrollment between January 1 and December 31, 2009, were eligible for inclusion in the study. This interval was selected because it corresponded to the most current US Census Bureau American Community Survey data release available at the time of the study. Patients with ACS were selected to perform the projection demonstration. The disease was identified in the claims database with International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes 410.x1 and 411.1x.
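The eligibility and case-identification rules above can be sketched in a few lines of Python. This is only an illustration, not the study's actual query: the function names are hypothetical, the reading of "410.x1" as any fourth digit followed by a fifth digit of 1 is an assumption, and so is the reading of "411.1x" as 411.1 with an optional fifth digit.

```python
import re
from datetime import date

# Study window taken from the inclusion criteria.
STUDY_START = date(2009, 1, 1)
STUDY_END = date(2009, 12, 31)

# Assumed interpretation of the code masks in the text:
#   410.x1 -> "410." + any digit + "1"
#   411.1x -> "411.1" + optional trailing digit
ACS_PATTERN = re.compile(r"^(410\.\d1|411\.1\d?)$")


def enrolled_in_study_year(enroll_start, enroll_end):
    """True if the enrollment span overlaps the study window by at least 1 day."""
    return enroll_start <= STUDY_END and enroll_end >= STUDY_START


def is_acs_code(code):
    """True if an ICD-9-CM diagnosis code matches the assumed ACS masks."""
    return ACS_PATTERN.match(code.strip()) is not None
```

For example, a member enrolled from June 1, 2008, through January 1, 2009, overlaps the window by 1 day and would be eligible under this sketch, while a claim coded 410.72 would not count as ACS.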

Statistical Analysis
Conventional goodness-of-fit statistics were not suitable for comparing the samples because, with sample sizes this large, even small differences would appear statistically significant. An alternative interpretation of the fit approach was used to examine the overall fit of the lines (census-defined regions, gender, and age groups) characterizing the HIRD and the US Census data. Statistical analyses were conducted with SAS version 9.2 (SAS Institute Inc, Cary, North Carolina).
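The sample-size problem can be seen with the gender shares reported in the Results. The sketch below computes a Pearson chi-square goodness-of-fit statistic for 14.8 million HIRD enrollees against the Census proportions, alongside Cohen's w, an effect size that does not grow with sample size. Cohen's w is used here purely as an illustration; the text does not specify which alternative the authors applied, and the function names are hypothetical.

```python
import math


def cohens_w(observed_props, expected_props):
    """Cohen's w: square root of the summed squared deviations scaled by the expected share."""
    return math.sqrt(
        sum((o - e) ** 2 / e for o, e in zip(observed_props, expected_props))
    )


def chi_square_statistic(n, observed_props, expected_props):
    """Pearson goodness-of-fit statistic, which equals n times w squared."""
    return n * cohens_w(observed_props, expected_props) ** 2


# Female and male shares; male shares are complements (assumes two categories).
hird = [0.498, 0.502]
census = [0.493, 0.507]
n_hird = 14_800_000

chi2 = chi_square_statistic(n_hird, hird, census)
w = cohens_w(hird, census)

# Critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1 = 3.841
significant = chi2 > CRITICAL_VALUE_DF1
```

Under these inputs the statistic far exceeds the critical value even though the two distributions differ by only half a percentage point, while w stays well below the 0.1 level that Cohen's convention labels a small effect.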

Standard statistical procedures, comprising an alternative version of goodness-of-fit statistics, were used to establish weights for the HIRD and so facilitate the approximation of counts relative to the US Census data. The linear weight was computed as the percentage of the overall population divided by the percentage within the HIRD. Weighting schemes enable projection from smaller known samples to larger populations in which the desired prevalence rate and other target information are not known. By using weights in a linear model along with specific variables, it is possible to make projections to the larger population by employing the relevant attributes of the smaller population.17 The weight calculation yielded a multiplication factor that was used to compute the weighted number of patients within a geographic region, age group, and gender for a specific disease type or drug classification (Table 1). On the basis of these distributions, weights were calculated to adjust for any differences in gender, geographic region, and age distributions observed between the HIRD population and the US Census population estimates.
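A minimal sketch of the weight calculation follows, using the gender shares reported in the Results (females: 49.8% in the HIRD, 49.3% in the Census). The function names and the stratum of 1000 patients are hypothetical. The study itself derived weights by region, age group, and gender in SAS (Table 1); that table is not reproduced here.

```python
def linear_weight(census_pct, hird_pct):
    """Weight = share of the overall (Census) population / share within the HIRD."""
    if hird_pct <= 0:
        raise ValueError("HIRD percentage must be positive")
    return census_pct / hird_pct


def weighted_count(raw_count, weight):
    """Apply the multiplication factor to a raw HIRD patient count."""
    return raw_count * weight


# Female shares come from the Results; male shares are complements
# (assumes two gender categories).
female_weight = linear_weight(census_pct=49.3, hird_pct=49.8)
male_weight = linear_weight(census_pct=100 - 49.3, hird_pct=100 - 49.8)

# Hypothetical stratum of 1000 female HIRD patients.
adjusted_females = weighted_count(1000, female_weight)
```

Because females are slightly overrepresented in the HIRD relative to the Census, their weight falls just below 1 and the male weight just above 1, so the adjusted counts shift toward the Census distribution.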

Copyright AJMC 2006-2019 Clinical Care Targeted Communications Group, LLC. All Rights Reserved.