Machine Learning Model Could Predict Hidradenitis Suppurativa Diagnosis

A recent study applied machine learning to medical and pharmacy claims data to develop a model for predicting hidradenitis suppurativa (HS) diagnosis, highlighting the potential for improved understanding of HS underdiagnosis on a health care system level.

Using machine learning, researchers developed a clinical decision support model that can help providers predict hidradenitis suppurativa (HS) diagnoses and distinguish HS from other skin conditions that mimic it.¹

According to a study published in Frontiers in Medical Technology, the model could aid in faster and more accurate recognition of HS, potentially reducing diagnostic delays and associated costs to health care systems.¹ Further validation through testing on external data sets and in clinical settings, compared against dermatologist diagnoses, is advised to refine and optimize the model's performance.

HS is a chronic inflammatory follicular skin condition characterized by painful lesions in the intertriginous skin areas that can result in odor, drainage, and disfigurement, leading to psychosocial burdens and worsened quality of life for patients. Its prevalence varies globally and is more common among women, smokers, and those with metabolic syndrome. Additionally, Black and biracial patients are 2 to 3 times more likely to experience HS than White patients, according to a US study.²

Diagnosis relies on clinical criteria, and early recognition is crucial for better management.¹ Misdiagnosis and underdiagnosis lead to prolonged suffering and increased health care costs. Machine learning is increasingly being used to aid in disease recognition, including recognition of HS. Applied to electronic health records and claims databases, it has also successfully identified conditions such as depression, ankylosing spondylitis, cardiomyopathy, dementia, and hepatitis C.

Researchers used data from the IBM MarketScan Research Databases from 2000 to 2018 to train and test the machine learning models, and data from 2018 and 2019 to validate them. The databases contained adjudicated medical and pharmaceutical reimbursement claims for more than 225 million patients enrolled in commercial, Medicare, and Medicaid health plans throughout the US.

Six single machine learning algorithms and 2 ensemble methods were considered, with the final model chosen based on performance measures and consultation with dermatologists. Performance metrics like area under the curve, sensitivity, precision, and accuracy were used to assess and select the optimal model, with a precision/accuracy threshold of 0.7 deemed satisfactory.
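To make the selection criteria above concrete, the following is a minimal Python sketch of how sensitivity, precision, accuracy, and area under the curve can be computed for a binary classifier and checked against a 0.7 cutoff. It is an illustration only, not the study's code: the function names, the toy labels and scores, and the choice to require both precision and accuracy to reach 0.7 are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred):
    """Return sensitivity, precision, and accuracy for binary labels (1 = HS, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


def area_under_curve(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def meets_threshold(metrics, threshold=0.7):
    """Assumed rule for this sketch: precision and accuracy must both reach the threshold."""
    return metrics["precision"] >= threshold and metrics["accuracy"] >= threshold


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

metrics = classification_metrics(y_true, y_pred)
auc = area_under_curve(y_true, scores)
print(metrics, auc, meets_threshold(metrics))
```

The pairwise AUC calculation is quadratic in the number of patients, so it suits small examples like this one; on claims data with hundreds of thousands of records, a rank-based implementation from an established library would be the practical choice.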

Among the 411,061 patients with HS identified from January 2000 to March 2018, 55,989 were assessed for the study. Additionally, 278,483 patients with documented abscesses and 1,431,524 patients with documented cellulitis were included as controls.

The primary results revealed that high-performing machine learning models for predicting HS diagnosis can be built using claims data, with top models achieving diagnostic accuracies of 65% to 73% and an area under the curve of 81% to 82%. Models trained to differentiate HS from cellulitis performed better than those trained on abscesses, likely because abscesses resemble HS lesions more closely. The top 3 models identified were AdaBoost, LightGBM, and MaxVoting, with age, gender, and certain risk factors being strong predictive features. Additionally, diagnostic features and specific comorbidity diagnoses were important predictors across different algorithms and cohorts.
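For readers unfamiliar with max voting, the general idea is that several classifiers each predict a label for a patient and the ensemble returns the label chosen by the most models. The Python sketch below shows that idea under stated assumptions: the three prediction lists are invented, the function name is hypothetical, and ties are broken in favor of the lowest label. It does not reproduce the study's MaxVoting configuration, which the article does not describe.

```python
from collections import Counter


def max_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority (hard) voting.

    predictions_per_model: list of equal-length lists, one per model,
    where each entry is a predicted label (1 = HS, 0 = control).
    Tie-breaking toward the lowest label is an assumption of this sketch.
    """
    if not predictions_per_model:
        raise ValueError("at least one model's predictions are required")
    n_patients = len(predictions_per_model[0])
    if any(len(preds) != n_patients for preds in predictions_per_model):
        raise ValueError("all models must predict for the same patients")

    voted = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top_count = max(counts.values())
        # Labels sharing the highest vote count; pick the lowest on a tie.
        winners = sorted(label for label, c in counts.items() if c == top_count)
        voted.append(winners[0])
    return voted


# Hypothetical predictions from three base models for five patients.
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]

ensemble = max_vote([model_a, model_b, model_c])
print(ensemble)
```

With an odd number of binary classifiers, as here, ties cannot occur, so the tie-breaking rule only matters when an even number of models is combined.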

The sensitivity analysis and validation results show that shorter timeframes around the index date yield comparable performance metrics for predicting HS, suggesting that shorter data periods are reliable for model development in claims analyses. The validation results indicate consistent performance among the top 3 models, predicting 64% to 69% of patients with true HS, with models 1 and 2 showing stronger performance.

An exploratory application revealed significant underdiagnosis of HS among patients with abscess or cellulitis, varying by metropolitan statistical area and model used. This finding suggests that implementing machine learning models could help health systems identify patients with undiagnosed HS for further evaluation and research.

The study noted limitations in generalizability, data structure requirements for model application, potential algorithm variations across populations, and areas for model improvement like addressing medical coding errors and considering contextual factors such as temporal relations between patient claims.

References

1. Kirby J, Kim K, Zivkovic M, et al. Uncovering the burden of hidradenitis suppurativa misdiagnosis and underdiagnosis: a machine learning approach. Front Med Technol. Published online March 25, 2024. doi:10.3389/fmedt.2024.1200400

2. Garg A, Kirby JS, Lavian J, Lin G, Strunk A. Sex- and age-adjusted population analysis of prevalence estimates for hidradenitis suppurativa in the United States. JAMA Dermatol. 2017;153(8):760-764. doi:10.1001/jamadermatol.2017.0201