New AI Tool Identifies Undiagnosed PNH in Health Records

Key Takeaways

Machine learning models can identify undiagnosed PNH cases by analyzing EHR data, potentially transforming rare disease diagnostics.
PNH symptoms overlap with other conditions, leading to frequent misdiagnosis and delayed care, highlighting the need for improved diagnostic tools.
The model showed high specificity, correctly classifying 99.99% of non-PNH cases, but its positive predictive value was limited by PNH's rarity.
Key predictive features included aplastic anemia, pancytopenia, and Budd-Chiari syndrome, along with hematology and urology referrals.
Further validation across diverse datasets is essential before clinical deployment, aiming to reduce diagnostic delays and improve patient outcomes.

A machine learning model shows promise in detecting paroxysmal nocturnal hemoglobinuria (PNH) from electronic health record (EHR) data, potentially transforming the diagnostic landscape for rare diseases.

Image Credit: ipopba - stock.adobe.com

A recent study explored whether a machine learning model could help identify undiagnosed cases of paroxysmal nocturnal hemoglobinuria (PNH), a rare, life-threatening blood disorder that is often misdiagnosed or missed because of its broad range of symptoms and limited awareness among health care providers.¹ The findings support the ability of machine learning to detect rare conditions within electronic health record (EHR) data, potentially transforming the diagnostic landscape for conditions like PNH.

"Thirty-five percent of PNH patients in the UK report symptoms at least twelve months before receiving a diagnosis and 13% receive at least one misdiagnosis," the study stated. "In the present data, roughly 80% of PNH patients have relevant features coded in the structured primary care electronic health record before their diagnosis and could potentially be diagnosed earlier."

Each year, approximately 6 per 1 million people are diagnosed with PNH.² The condition typically affects men and women between the ages of 30 and 40; women are slightly more likely than men to develop it. PNH has a higher likelihood of emerging in individuals with bone marrow disorders like aplastic anemia or myelodysplastic syndrome.

Patients with PNH experience hemolysis, blood clots, and organ damage.¹ The researchers highlighted the difficulty in diagnosing the disease because its symptoms are diverse and can overlap with other common conditions. Therefore, these patients often experience delays in diagnosis or are misdiagnosed, contributing to prolonged periods of inadequate care and increased risks of complications.

To address this gap, the researchers developed a machine learning algorithm using data from the Optimum Patient Care Research Database (OPCRD), marking the first attempt to implement machine learning for distinguishing between PNH and non-PNH cases using structured EHR data.

The study included 131 patients with PNH and 593,838 patients without PNH as controls, all drawn from the general practitioner records held in the OPCRD. Researchers used a tree-based XGBoost model to classify patients based on clinical features such as symptoms, diagnoses, and health care utilization patterns. Key characteristics considered included conditions often associated with PNH, such as aplastic anemia, pancytopenia, and Budd-Chiari syndrome.

Data processing involved rigorous cleaning steps, including removing duplicate records and excluding patients without relevant clinical features. After refining the dataset, the final analysis was conducted on a sample reflecting a ratio of approximately 1 PNH case per 4533 controls.

Recall, Specificity, and Positive Predictive Value

The machine learning model demonstrated promising performance in identifying patients at risk of PNH. For recall, the model correctly identified 27% (95% CI, 15% to 39%) of PNH cases. Of the patients the model flagged, just over 60% had a PNH diagnosis on record, indicating that flagged patients exhibited characteristics similar to those with confirmed PNH. For specificity, the researchers found that 99.99% of the controls were correctly classified as negative, supporting the model’s accuracy in ruling out non-PNH cases.
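To make these definitions concrete, here is a minimal Python sketch of how recall, specificity, and positive predictive value are derived from a confusion matrix. The counts are hypothetical, chosen only to give rates close to the rounded figures quoted here; the study did not report exact counts, and the function names are illustrative.

```python
def recall(tp: int, fn: int) -> float:
    """Share of true cases that the model flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of controls that the model correctly leaves unflagged."""
    return tn / (tn + fp)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of flagged patients who are true cases."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only (not taken from the study).
TP, FN = 27, 73        # 100 true cases, 27 of them flagged
FP, TN = 10, 99_990    # 100,000 controls, 10 of them flagged

print(f"recall      = {recall(TP, FN):.2%}")
print(f"specificity = {specificity(TN, FP):.2%}")
print(f"PPV         = {positive_predictive_value(TP, FP):.2%}")
```

Even with very high specificity, a handful of false positives among a large control group can make up a noticeable share of all flagged patients, which is why PPV is reported separately.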

The initial positive predictive value (PPV) was 60.4% (95% CI, 33% to 82%), indicating that over half of the flagged patients already had a PNH diagnosis in their records. However, when researchers adjusted for PNH’s rarity (3.81 cases per 100,000 individuals), the PPV decreased to 19.59% (95% CI, 7.63% to 41.81%), suggesting that approximately 1 in 5 patients flagged for further investigation could have PNH.
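The effect of rarity on PPV follows from Bayes' theorem. The sketch below is an illustration under stated assumptions, not the researchers' calculation: it plugs in the rounded recall and prevalence quoted above and tries a few specificity values near the reported 99.99%, since the unrounded specificity is not given here. The function name and the candidate specificity values are assumptions.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # P(flagged and has PNH)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(flagged and no PNH)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 3.81 / 100_000   # population prevalence quoted in the article
RECALL = 0.27                 # reported recall (rounded)

# Specificity is quoted only as 99.99%, so nearby values are assumptions
# used to show how strongly PPV depends on the last decimal places.
for spec in (0.9999, 0.99995, 0.99996):
    print(f"specificity={spec:.5f} -> PPV={ppv(RECALL, spec, PREVALENCE):.1%}")
```

With these rounded inputs the result ranges from roughly 9% at a specificity of 99.99% to about 20% at 99.996%, so the rounded figures alone do not reproduce the reported 19.59% exactly. The point of the sketch is the sensitivity of PPV to small changes in the false-positive rate when a condition is this rare.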

The model’s top predictive features included aplastic anemia, pancytopenia, hemolytic anemia, myelodysplastic syndrome, and Budd-Chiari syndrome (P < .0001), with additional relevance for factors such as hematology and urology referrals and specific blood tests.

Addressing Underdiagnosis in PNH

The researchers noted that the PNH prevalence found in the OPCRD was lower than expected, potentially because of underdiagnosis or inaccurate coding in general practice records. Given this discrepancy, machine learning could prove invaluable in narrowing the gap, offering clinicians a new tool to identify patients at risk of PNH earlier in their clinical journey.

By integrating the clinical understanding of PNH with machine learning techniques, this model offers a new approach to screening for rare diseases in large health datasets, the study stated. If validated further, this algorithm could be applied to live health data, enabling clinicians to identify potential PNH cases more promptly, ultimately reducing diagnostic delays and improving patient outcomes.

Limitations and Future Investigation

The study’s reliance on cross-validation meant exact numbers of flagged cases could not be reported; the researchers instead provided averaged metrics to represent performance. They acknowledged that further validation across broader and more diverse datasets is essential before the algorithm can be deployed in clinical practice. However, the approach sets the stage for using machine learning to improve diagnostics in rare diseases, an area where conventional methods have often fallen short.

"Our results show that for every five cases flagged by the algorithm, one case could be a PNH patient. Further work is needed to validate and assess performance in independent samples, with the ultimate goal being real-world deployment," the researchers concluded. "If successful, this tool has the potential to reduce diagnostic delays for PNH patients."

References

1. Worker A, Mahon H, Sams J, et al. A machine learning algorithm for the detection of paroxysmal nocturnal haemoglobinuria (PNH) in UK primary care electronic health records. Orphanet J Rare Dis. 2024;19(1):378. Published 2024 Oct 13. doi:10.1186/s13023-024-03406-4

2. Paroxysmal Nocturnal Hemoglobinuria. Cleveland Clinic. Updated April 25, 2022. Accessed October 30, 2024. https://my.clevelandclinic.org/health/diseases/22871-paroxysmal-nocturnal-hemoglobinuria
