Novel Claims-Based Algorithm Identifies Patients With PAH 6 Months Before Diagnosis


The model has potential to assist integrated delivery network health care providers through early identification of patients with pulmonary arterial hypertension (PAH), a rare but fast-progressing condition.

Researchers developed a claims-based, machine-learning (ML) algorithm to identify patients with pulmonary arterial hypertension (PAH) 6 months before diagnosis. The model can also be applied at a population level to identify patients who may benefit from PAH-specific screening.

“Many patients with PAH experience substantial delays in diagnosis, which is associated with worse outcomes and higher costs,” the authors of the study, which was published in the journal Pulmonary Circulation, wrote.

Earlier diagnosis could lead to earlier treatment initiation, which could in turn delay disease progression and subsequent adverse outcomes. PAH progresses rapidly, but typical symptoms such as breathlessness and fatigue are nonspecific and can be mistaken for those of other conditions.

On average, the delay between symptom onset and confirmed PAH diagnosis is more than 2 years. One-year mortality estimates range from 8% to 17% and increase to between 25% and 44% at 3 years.

In an effort to distinguish individuals at risk for PAH early in their symptom journey from patients with similar early symptoms not at risk for developing PAH, investigators created an ML model using retrospective, de-identified data from the Optum Clinformatics Data Mart claims database.

Data recorded between January 2015 and December 2019 were included. The PAH cohort consisted of 1339 patients, and the non-PAH cohort included 4222 patients.

The analysis revealed that at 6 months pre-diagnosis, the model performed well in distinguishing PAH and non-PAH patients, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.84, recall (sensitivity) of 0.73, and precision of 0.50.
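For readers less familiar with these metrics, the short Python sketch below shows how each is defined. The counts and scores are hypothetical, chosen only so that the arithmetic reproduces a recall of 0.73 and a precision of 0.50; they are not data from the study, and the function names are illustrative.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def auc_roc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half)."""
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical example: of 100 patients who truly have PAH, the model
# flags 73 (27 are missed), and it also flags 73 patients without PAH.
tp, fn, fp = 73, 27, 73
precision, recall = precision_recall(tp, fp, fn)

# Hypothetical risk scores for three PAH and three non-PAH patients.
auc = auc_roc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.2f}")
```

Read this way, a precision of 0.50 means that about half of the patients flagged by a model would actually have the disease, while a recall of 0.73 means that about three-quarters of true cases would be flagged.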

Key features distinguishing PAH from non-PAH cohorts were a longer time between first symptom and the pre-diagnosis model date (i.e., 6 months before diagnosis); more diagnostic and prescription claims, circulatory claims, and imaging procedures, leading to higher overall health care resource utilization; and more hospitalizations.

Mean patient age, sex, and race/ethnicity ratios were similar between PAH and non-PAH patient cohorts. Around two-thirds of the population was female and a similar proportion was White.

The AUC-ROC of the early identification model indicated “very good discrimination between patients with PAH and non-PAH patients,” the authors wrote. Although precision was only 0.50, “the model was designed to prioritize sensitivity, as the risks related to a false positive (e.g., undergoing some additional, unnecessary noninvasive PAH screening) are greatly reduced compared with the risk associated with delayed diagnosis of PAH as it is a highly progressive, fatal disease,” they added.

In addition to the features listed above, worse overall health was another key characteristic of patients with PAH.

Patients with PAH also have a higher health care burden, even after adjusting for the additional costs of screening and procedures late in the diagnostic journey, the researchers said. This finding is in line with previous studies that showed an average of 25 interactions with a hospital within the 3 years preceding a PAH diagnosis.

The results indicate that shifting the diagnosis just 6 months earlier could have a considerable impact on hospitalization rates, the authors stressed.

Missing data and potential diagnosis inaccuracies in the database are limitations of the current study. Furthermore, the true prevalence of PAH in real-world populations is substantially lower than the proportion of patients with PAH in the study population.
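The prevalence limitation matters because precision depends on how common the disease is among the patients being scored. The Python sketch below is a rough illustration only, not an analysis from the study. It assumes that the reported recall and precision apply at the cohort ratio given above (1339 patients with PAH and 4222 without) and that sensitivity and specificity would carry over unchanged to another population. The lower prevalence values are hypothetical and are not estimates of the real-world prevalence of PAH.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Precision (positive predictive value) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Cohort sizes and metrics as reported in the article.
n_pah, n_non_pah = 1339, 4222
sensitivity, precision = 0.73, 0.50
study_prevalence = n_pah / (n_pah + n_non_pah)

# ASSUMPTION: back out the specificity implied by the reported figures.
# With precision 0.50, false positives equal true positives.
implied_true_pos = sensitivity * n_pah
implied_false_pos = implied_true_pos * (1 - precision) / precision
specificity = 1 - implied_false_pos / n_non_pah

# Hypothetical prevalences, for illustration only.
for prevalence in (study_prevalence, 0.01, 0.001):
    value = ppv(sensitivity, specificity, prevalence)
    print(f"prevalence={prevalence:.4f} -> precision={value:.3f}")
```

Under these assumptions, precision falls sharply as prevalence drops, which is why a model tuned on an enriched cohort would be expected to produce proportionally more false positives when applied to a general population.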

“The ideal use of this model is implementation by integrated delivery network health care providers for early identification of PAH patients; a rare but chronic disease with significant health care utilization, expensive medications, and high-cost specialty care,” the authors concluded.

“This is especially important as the health care providers and payers brace for waves of delayed and deferred care that have started with the COVID-19 pandemic.”


Hyde B, Paoli CJ, Panjabi S, Bettencourt KC, Lynum KSB, Selej M. A claims-based, machine-learning algorithm to identify patients with pulmonary arterial hypertension. Pulm Circ. Published online June 6, 2023. doi:10.1002/pul2.12237

© 2023 MJH Life Sciences
All rights reserved.