Researchers implemented a pilot common informatics infrastructure for the Sickle Cell Data Collection (SCDC) program to help compensate for sparse population-level data on sickle cell disease (SCD). The infrastructure also provides a blueprint for similar initiatives in other rare diseases, according to a study published in the journal JAMIA Open.
The exact number of people with SCD in the United States is unknown, though estimates put it at around 100,000. These patients can have severe complications and associated comorbidities, and they rely heavily on acute healthcare services, researchers explained.
In addition, much of the existing health services research for SCD relies on administrative case definitions using ICD codes. This method could ultimately underestimate the actual SCD population. To address the lack of data on this disease, the CDC funded the SCDC, which aims to develop state-level, multi-source surveillance programs.
“Common informatics infrastructures such as common data models allow distributed data networks to standardize, integrate, and analyze data across multiple sources,” the study authors wrote. When the same data definitions are used, analyses spanning multiple sites can be conducted efficiently and with higher quality.
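To illustrate the idea of shared data definitions, here is a minimal Python sketch of mapping records from two sources onto one common record layout. Everything in it is an assumption made for illustration: the `CommonRecord` fields, the site names, and the per-site codes for sex are hypothetical and are not taken from the SCDC Common Data Model.

```python
from dataclasses import dataclass


# Hypothetical common record; field names are illustrative, not the SCDC schema.
@dataclass(frozen=True)
class CommonRecord:
    site: str
    sex: str         # shared vocabulary: "F", "M", or "U" (unknown)
    birth_year: int


# Hypothetical per-site codings of sex, mapped onto the shared vocabulary.
SEX_MAPS = {
    "site_a": {"1": "M", "2": "F"},
    "site_b": {"male": "M", "female": "F"},
}


def to_common(site, raw):
    """Translate one site-specific row into the common layout."""
    key = str(raw["sex"]).strip().lower()
    sex = SEX_MAPS[site].get(key, "U")       # unmapped codes become "U"
    birth_year = int(str(raw["dob"])[:4])    # assumes ISO dates (YYYY-MM-DD)
    return CommonRecord(site=site, sex=sex, birth_year=birth_year)


records = [
    to_common("site_a", {"sex": 1, "dob": "2015-03-02"}),
    to_common("site_b", {"sex": "Female", "dob": "2016-11-20"}),
    to_common("site_b", {"sex": "", "dob": "2014-01-05"}),
]

for record in records:
    print(record)
```

Once every source emits the same record type, a single query can run unchanged against each site's data, which is the property the authors describe.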
The CDC established the SCDC in 2015. The program includes 11 states and consists of state-level newborn screening and Medicaid claims data. The majority of programs also include electronic medical record data, state all-payer datasets and clinical cohorts.
In the current pilot study, researchers implemented the Core Surveillance Data Instrument of the Common Data Model in Tennessee, North Carolina, and Michigan.
They also identified key data instruments for public health SCD reporting. Using newborn screening records, investigators reported the number of SCD births in 1-year and 5-year increments, by sex, race, county, ethnicity and SCD type. State laboratory testing was used to identify SCD type.
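As a rough illustration of reporting births in 1-year and 5-year increments, the following Python sketch counts a handful of invented records. The rows, the starting year of the 5-year windows, and the tuple layout are assumptions for the example and do not come from the study.

```python
from collections import Counter


def five_year_bin(year, start=2000):
    """Return the label of the 5-year window containing `year`.

    `start` is an assumed anchor year for the windows.
    """
    lower = start + ((year - start) // 5) * 5
    return f"{lower}-{lower + 4}"


# Invented rows for illustration: (birth_year, sex, scd_type)
births = [
    (2011, "F", "HbSS"),
    (2012, "M", "HbSC"),
    (2014, "F", "HbSS"),
    (2015, "M", "HbSS"),
    (2016, "F", "HbSC"),
]

# 1-year increments: one count per birth year.
by_year = Counter(year for year, _, _ in births)

# 5-year increments, stratified by sex.
by_window_and_sex = Counter((five_year_bin(year), sex) for year, sex, _ in births)

print(dict(by_year))
print(dict(by_window_and_sex))
```

The same pattern extends to the other strata named in the study (race, county, ethnicity, SCD type) by adding the field to the grouping key.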
Healthcare claims datasets, newborn screening and clinical datasets were used to determine case number estimates across each state by county, sex, and age group. In addition, state vital records derived from death certificates allowed researchers to report death information by age at death, stratified by sex.
“Core Surveillance Data for healthcare utilization report acute care utilization including number of hospitalizations, hospital length-of-stay, and number of emergency department visits (without admission),” the authors explained.
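The three utilization measures named in the quote can be sketched in Python as follows. The encounter rows and field names are invented, and computing length of stay as discharge date minus admission date is an assumption; the study may define it differently.

```python
from datetime import date

# Invented encounters for illustration; field names are hypothetical.
encounters = [
    {"kind": "inpatient", "admit": date(2022, 1, 3), "discharge": date(2022, 1, 7)},
    {"kind": "ed", "admitted": False},
    {"kind": "ed", "admitted": True},
    {"kind": "inpatient", "admit": date(2022, 5, 10), "discharge": date(2022, 5, 12)},
]


def summarize(rows):
    """Summarize acute care utilization for a list of encounters."""
    # Length of stay in days, assumed here to be discharge minus admission.
    stays = [
        (row["discharge"] - row["admit"]).days
        for row in rows
        if row["kind"] == "inpatient"
    ]
    # Emergency department visits that did not end in an admission.
    ed_only = sum(1 for row in rows if row["kind"] == "ed" and not row["admitted"])
    return {
        "hospitalizations": len(stays),
        "mean_length_of_stay_days": sum(stays) / len(stays) if stays else None,
        "ed_visits_without_admission": ed_only,
    }


summary = summarize(encounters)
print(summary)
```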
Data compiled by state SCDC teams showed:
“Our approach builds one data instrument at a time in a model that is expandable and modifiable. These results will inform the design of the SCDC program in the future, and we hope it will be adopted by data holders beyond SCDC,” researchers said.
The common data model framework offers numerous benefits including standardization, which can be used to leverage cross-state knowledge. Adoption of the pilot model can also improve data query speed. This may lead to faster dissemination of SCDC study findings.
Implementing a common data model across multiple data sources can be challenging, the authors cautioned. For instance, discrepancies between sources can surface when variables are being defined. It’s also important to balance standardization and flexibility, researchers said.
“Limitations notwithstanding, active surveillance programs using administrative data are of great value in understanding rare diseases,” they concluded. “It is paramount to remember the purpose of surveillance, to answer questions about location of individuals with disease, access to health care, necessary resource allocation, and policy-impact questions; not to dictate individual care.”
Smeltzer MP, Reeves SL, Cooper WO, et al. Common data model for sickle cell disease surveillance: considerations and implications. JAMIA Open. Published online May 27, 2023. doi:10.1093/jamiaopen/ooad036