The American Journal of Accountable Care September 2015
Applying Weighting Methodologies to a Commercial Database to Project US Census Demographic Data
Thomas Wasser, PhD, MEd; Bingcao Wu, MS; Joseph W. Yčas, PhD; and Ozgur Tunceli, PhD

This study tests the feasibility of projecting commercial insurance demographic information to the US Census population, and creating the framework for a simple weighting scheme.
Study Populations and Demographic Comparison

During the 2009 calendar year, the HIRD included 14.8 million enrollees, and the US Census Bureau’s 2009 American Community Survey data projected in excess of 307.7 million individuals, within an estimated accuracy of 0.1% (margin of error: 0.001).18 These 2 groups served as the base populations in this study. The HIRD population was similar to the US Census estimates in gender distribution, with females comprising 49.8% and 49.3% of their totals, respectively. Relative to the US Census estimates, the HIRD population appeared overrepresented in the midwest and underrepresented in the south. The HIRD population closely matched US Census estimates for the northeast and west regions, differing by only 1% in the northeast and 0.6% in the west (Table 2).

Age Distribution Comparison
The age group distributions of the HIRD and US Census populations are shown in Figure 1. The HIRD population had relatively higher representation in the age categories between 30 and 59 years; it was underrepresented in the age categories <18 years and ≥65 years relative to the US Census. Although there was close agreement between the 2 populations for ages 5 to 30 years and 55 to 70 years, the overall age group of 18 to 64 years was overrepresented in the HIRD.

Weight Computation Based in the Northeast Region
To demonstrate the weight computation model, weight calculations were applied to the northeast region. In 2009, approximately 0.70% of the US Census population was male, aged 45 to 49 years, and lived in the northeast, while around 0.72% of the HIRD population shared the same geographic region, gender, and age characteristics. Thus, the weight for the male population aged 45 to 49 years living in the northeast during that time period was 0.9690 (Table 3).
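The weight calculation for this stratum can be sketched in Python. This is a minimal illustration that assumes the weight is the ratio of the stratum's US Census share to its HIRD share; the function name `stratum_weight` is hypothetical and does not come from the study. Because the inputs are the rounded percentages quoted in the text (0.70% and 0.72%), the result is about 0.972 rather than the 0.9690 reported in Table 3, which was presumably computed from unrounded proportions.

```python
def stratum_weight(census_share: float, sample_share: float) -> float:
    """Return the census share divided by the sample share for one stratum.

    Assumption: the weight is a simple ratio of proportions. A value below 1
    means the stratum is overrepresented in the sample relative to the census.
    """
    if sample_share <= 0:
        raise ValueError("sample_share must be positive")
    return census_share / sample_share


# Rounded shares quoted in the text for northeast males aged 45 to 49 years.
weight = stratum_weight(census_share=0.0070, sample_share=0.0072)
print(round(weight, 4))
```

A weight slightly below 1 is consistent with this stratum being marginally overrepresented in the HIRD.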

Projection of ACS Patients in the Northeast
Table 4 reports the results of weighting the number of HIRD patients with ACS in the northeast region within each age and gender stratum and projecting to the northeast US Census Bureau population. The HIRD had a total of 452 male members from the northeast region, aged 45 to 49 years, who had at least 1 claim with a diagnosis for ACS from January 1 to December 31, 2009. On the basis of the weight for this population group (0.9690), the projection of ACS diagnosis in a representative sample of the same size as the HIRD repository (~4.82% of the US population) would be 438 patients (452 × 0.9690) and 9089 in the overall US Census Bureau population (438 ÷ 4.82%). Application of this weighting scheme results in a greater proportion of ACS patients in the ≥65 years age category and a smaller proportion of patients aged 30 to 64 years relative to the original HIRD estimate (Figure 2).
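The two-step projection arithmetic for this stratum can be reproduced with a short Python sketch. The function name `project_count` and its parameters are hypothetical. The coverage value is the rounded ~4.82% quoted in the text, so the national figure comes out near 9087 rather than exactly the 9089 reported, a difference that is presumably due to rounding of the coverage fraction.

```python
from typing import Tuple


def project_count(observed: int, weight: float, coverage: float) -> Tuple[int, int]:
    """Return (weighted count in a sample-sized population, national projection).

    observed: patients counted in the sample stratum
    weight:   stratum weight (census share / sample share)
    coverage: sample size as a fraction of the national population
    """
    weighted = round(observed * weight)    # 452 x 0.9690, rounded to whole patients
    national = round(weighted / coverage)  # scale up by the sample's coverage
    return weighted, national


weighted, national = project_count(observed=452, weight=0.9690, coverage=0.0482)
print(weighted, national)
```

Rounding the weighted count before scaling mirrors the order of operations shown in the text (438 ÷ 4.82%).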

Although healthcare data are not readily available for the entire US population, a considerable volume may be found in data silos such as institutional disease registries and the transactional databases of health plans, among other repositories. For healthcare planners and budget directors, access to plausible population estimates is crucial for decision making. One avenue for obtaining population-level figures for health budget projections and allocations is to extrapolate from smaller data collections. Such projections require robust and reliable weighting tools capable of meeting low margin-of-error requirements.19,20 Driven by this need, this study developed a simple weighting tool to project health plan data to estimate prevalence rates at the national level.

Although the HIRD represents slightly less than one-twentieth (~4.82%) of the US population, as represented by the 2009 US Census Bureau count, the data in the HIRD are remarkably representative of the entire US population. HIRD data trended in parallel with the US Census data on gender distribution and on regional distribution in the northeast and west regions. As expected, however, the HIRD was overweighted in the 30-to-59-years age category because the repository consists largely of employer-insured, working-age people.

Reflecting the source of the majority of people represented in the HIRD repository (enrollees of employer-sponsored commercial healthcare insurance), the population aged ≥65 years appears to be relatively underrepresented. Still, the HIRD contains a sizable sample of enrollees aged ≥65 years who may be receiving commercial employer-sponsored health benefits or Medicare Advantage, Medicare Supplement, or Part D benefits. This sample is large enough to allow the application of this weighting methodology to extrapolate the data to the overall US population with statistically acceptable variance.

The weighting methodology developed in this study was tested on the ACS patients from the northeast region as an illustration of how the weighting scheme may be applied in practice. While this example specifically addressed ACS patients, it demonstrated how the number of patients in the overall US population for any disease may be estimated from commercially derived healthcare data repositories like the HIRD. This study essentially demonstrated that, by using a linear weighting methodology that accounts for differences in geographic region, age, and gender between an accessible database and the US Census data, it was possible to estimate the prevalence of a number of important healthcare factors. Among areas that may be evaluated using this approach are disease prevalence, healthcare resource utilization, treatment patterns for therapies of interest, and current and potential use of pharmaceutical agents and other treatment modalities.
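A generalization of this approach to several strata at once might look like the following Python sketch. It uses simple cell-ratio weighting (census share divided by sample share within each region, gender, and age stratum), which is an assumption made for illustration and not necessarily the exact linear weighting procedure used in the study. All function names and all counts below are made up for the example and are not study data.

```python
from typing import Dict, Tuple

Stratum = Tuple[str, str, str]  # (region, gender, age band)


def compute_weights(census: Dict[Stratum, float],
                    sample: Dict[Stratum, float]) -> Dict[Stratum, float]:
    """Weight per stratum = (census share of total) / (sample share of total)."""
    census_total = sum(census.values())
    sample_total = sum(sample.values())
    weights = {}
    for stratum, sample_n in sample.items():
        census_share = census[stratum] / census_total
        sample_share = sample_n / sample_total
        weights[stratum] = census_share / sample_share
    return weights


def project_cases(cases: Dict[Stratum, float],
                  weights: Dict[Stratum, float],
                  coverage: float) -> float:
    """Weight each stratum's case count, sum, and scale up by sample coverage."""
    weighted_total = sum(n * weights[s] for s, n in cases.items())
    return weighted_total / coverage


# Toy, made-up counts for two strata (NOT study data).
census = {("northeast", "M", "45-49"): 600, ("northeast", "F", "45-49"): 400}
sample = {("northeast", "M", "45-49"): 50, ("northeast", "F", "45-49"): 50}
cases = {("northeast", "M", "45-49"): 10, ("northeast", "F", "45-49"): 20}

weights = compute_weights(census, sample)
coverage = sum(sample.values()) / sum(census.values())  # sample is 10% of census
projected = project_cases(cases, weights, coverage)
print(weights, projected)
```

With weights defined this way, the weighted sample counts sum back to the original sample size, so the weights redistribute the sample across strata without changing its total.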

One of the key objectives of this study was the development of a projection method and a weighting scheme that could be applied to a range of disease conditions and therapeutic categories for which data were available in a repository—such as the HIRD. An important strength of this approach is that it allows for adjustments in the variables or for the updating of estimates of interest with the most current or different data as needed.

Weighted estimates have important planning, resource allocation, and cost management implications for a variety of stakeholders, including patients, providers, and payers, who have to make decisions based on research results, disease prevalence, treatment availability, and drug utilization, among other factors.

The results of the weighting scheme and ACS projection example discussed in this study must be viewed against some important limitations. This study relied on secondary data from commercial health plans across the United States. These data may have some relevance to similar commercial health plans, but only limited external validity for different patient populations such as those in the US Medicaid and Medicare programs. In addition, administrative claims lack data on race, ethnicity, and risk factors capable of influencing outcomes. Administrative claims data are prone to over- and underestimations (eg, for patients, disease, medication use, other areas) because of basic assumptions about index events, inability to capture and account for all treatments received by patients, and basic coding and clerical errors. Furthermore, extrapolation was done beyond the point of observable data, contravening a standard requirement of statistical methodology, and likely impacting the robustness of the results. In addition, notable differences existed between the values in the HIRD commercial database and the US Census data. The weights were calculated on the basis of 2009 American Community Survey projections, not official US Census counts.

Consistent with its commercial employment origins and characteristics, the HIRD repository, while broadly representative of US Census data, overrepresented the 30-to-59-years age category. The age groups ≥65 years were underrepresented in the HIRD but still accounted for a substantial sample size. While extrapolations beyond observable data have statistical limitations, in the absence of data on disease prevalence and treatment for the US population as a whole, commercial databases could be viable for projecting patient counts within US Census parameters. This could be invaluable to key stakeholders such as healthcare planners, policy makers, and payers.

Acknowledgments: Bernard B. Tulsi, MSc, provided writing and other editorial support for this manuscript. The authors wish to thank Chaozheng Yang, MS, former research analyst at HealthCore, Inc, for contributions to the study’s design and data analysis.

Author Affiliations: HealthCore, Inc (TW, BW, OT), Wilmington, DE; AstraZeneca Pharmaceuticals LP (JWY), Wilmington, DE.

Funding Source: Funding for this research project was provided by AstraZeneca Pharmaceuticals LP.

Author Disclosures: Drs Wasser and Tunceli and Mr Wu are employees of HealthCore, Inc, a wholly owned research and consulting subsidiary of Anthem, a national health insurance company. Dr Yčas
1. Last JM, ed. A Dictionary of Epidemiology. 4th ed. New York, NY: Oxford University Press; 2000.

2. Thacker SB. Epidemiology and public health at CDC. MMWR. 2006;55(suppl 2):3-4.

3. McKenna MT, Zohrabian A. U.S. burden of disease--past, present and future. Ann Epidemiol. 2009;19(3):212-219.

4. Terris M. The Society for Epidemiologic Research (SER) and the future of epidemiology. Am J Epidemiol. 1992;136(8):909-915.

5. Terris M. The Society for Epidemiologic Research and the future of epidemiology. J Public Health Policy. 1993;14(2):137-148.

6. Thacker SB, Dannenberg AL, Hamilton DH. Epidemic intelligence service of the Centers for Disease Control and Prevention: 50 years of training and service in applied epidemiology. Am J Epidemiol. 2001;154(11):985-992.

7. Mehta P, Antao V, Kaye W, et al. Prevalence of amyotrophic lateral sclerosis - United States, 2010-2011. MMWR. 2014;63(7):1-13.

8. Adams DA, Jajosky RA, Ajani U, et al. Summary of notifiable diseases. MMWR. 2014;61(53):1-121.

9. Chini F, Pezzotti P, Orzella L, Borgia P, Guasticchi G. Can we use the pharmacy data to estimate the prevalence of chronic conditions? a comparison of multiple data sources. BMC Public Health. 2011;11:688.

10. Choy M, Switzer P, De Martel C, Parsonnet J. Estimating disease prevalence using census data. Epidemiol Infect. 2008;136(9):1253-1260.

11. Costa MA, Huang SS, Moore M, Kulldorff M, Finkelstein JA. New approaches to estimating national rates of invasive pneumococcal disease. Am J Epidemiol. 2011;174(2):234-242.

12. Guzmán Herrador BR, Aavitsland P, Feiring B, Riise Bergsaker MA, Borgen K. Usefulness of health registries when estimating vaccine effectiveness during the influenza A(H1N1)pdm09 pandemic in Norway. BMC Infect Dis. 2012;12:63.

13. Hanson LA, Zahn EA, Wild SR, Dopfer D, Scott J, Stein C. Estimating global mortality from potentially foodborne diseases: an analysis using vital registration data. Popul Health Metr. 2012;10(1):5.

14. Saaddine JB, Honeycutt AA, Narayan KM, Zhang X, Klein R, Boyle JP. Projection of diabetic retinopathy and other major eye diseases among people with diabetes mellitus: United States, 2005-2050. Arch Ophthalmol. 2008;126(12):1740-1747.

15. Wendt JK, Symanski E, Du XL. Estimation of asthma incidence among low-income children in Texas: a novel approach using Medicaid claims data [published online September 28, 2012]. Am J Epidemiol. 2012;176(8):744-750.

16. Zaher C, Goldberg GA, Kadlubek P. Estimating angina prevalence in a managed care population. Am J Manag Care. 2004;10(11 suppl):S339-S346.

17. Bethlehem JG, Keller WJ. Linear weighting of sample survey data. Journal of Official Statistics. 1987;3(2):141-153.

18. American Community Survey multiyear accuracy of the data (3-year 2008-2010 and 5-year 2006-2010). US Census Bureau website. Published 2011. Accessed August 13, 2015.

19. Merrill RM, Capocaccia R, Feuer EJ, Mariotto A. Cancer prevalence estimates based on tumour registry data in the Surveillance, Epidemiology, and End Results (SEER) Program. Int J Epidemiol. 2000;29(2):197-207.

20. Nacul LC, Soljak M, Meade T. Model for estimating the population prevalence of chronic obstructive pulmonary disease: cross sectional data from the Health Survey for England. Popul Health Metr. 2007;5:8.