Machine Learning Effective in Predicting Hospital Stay, Prolonged Hospitalization in People Living With HIV


Separate machine learning models were able to effectively predict both the length of hospital stay and the risk of prolonged hospitalization in people living with HIV.

The length of stay in a hospital as well as prolonged hospitalization in people living with HIV (PLWH) could both be predicted using machine learning models, according to a study published in Frontiers in Public Health.

There were approximately 1.5 million new HIV diagnoses and 650,000 deaths due to HIV globally in 2021. HIV-associated comorbidities remain a major obstacle to survival for PLWH in China. Accurately predicting length of hospital stay and identifying risk factors for a longer stay can help in planning interventions for the diagnosis and management of HIV, and machine learning (ML) has the potential to make these predictions. This study aimed to use multiple ML models to predict length of stay and the risk of prolonged stay in PLWH.

Patient in hospital bed | Image credit: Koonsiri - stock.adobe.com

Patients were enrolled from January 2008 to June 2020 in Beijing. Two models were established: one to predict the risk of prolonged stay and another to predict individual length of stay. Patients aged 18 years and older were included in the study, whereas those who stayed in the hospital for less than 12 hours were excluded. Demographic and clinical data were collected, including the route of HIV transmission, type of transmission, and baseline highly active antiretroviral therapy (HAART) at admission.

The numeric length of hospital stay was the primary outcome of the study. The risk of prolonged hospital stay was the secondary outcome, with prolonged hospital stay defined as more than 25 days between admission and discharge of the patient.
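As a rough illustration of how the 2 outcomes relate, the sketch below derives a numeric length of stay from admission and discharge timestamps and then applies the 25-day cutoff described above. The function names and the example dates are hypothetical and are not taken from the study.

```python
from datetime import datetime

# Cutoff from the study's definition: prolonged means more than 25 days.
PROLONGED_THRESHOLD_DAYS = 25


def length_of_stay_days(admitted: datetime, discharged: datetime) -> float:
    """Primary outcome: time between admission and discharge, in days."""
    return (discharged - admitted).total_seconds() / 86400


def is_prolonged(los_days: float) -> bool:
    """Secondary outcome: True only when the stay exceeds the cutoff."""
    return los_days > PROLONGED_THRESHOLD_DAYS


# Made-up admission for illustration only.
example_los = length_of_stay_days(
    datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 31, 8, 0)
)
example_label = is_prolonged(example_los)
```

Under this reading, a stay of exactly 25 days would not count as prolonged, because the definition says "more than 25 days."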

PLWH have an increased risk of non–AIDS-defining events (NADEs). Multiple opportunistic infections (OIs) were defined as 2 or more diagnosed, co-existing pathogens. The extreme gradient boosting (XGB) model served as the basis for running all models. Ten-fold cross-validation and grid search were used to tune the hyperparameters of the 4 ML regression algorithms and 5 ML classification algorithms used to predict length of hospital stay and prolonged hospital stay, respectively.
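For readers unfamiliar with the tuning procedure, here is a minimal sketch of 10-fold cross-validation combined with a grid search. It assumes a toy 1-feature k-nearest neighbor regressor and synthetic data. The helper names, candidate values, and data are all hypothetical, and the study's actual features, libraries, and search grid are not described in this article.

```python
import math
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_rmse(xs, ys, k, folds):
    """Mean RMSE across folds, holding each fold out once."""
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        sq_err = [
            (knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
            for i in held_out
        ]
        fold_scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_scores) / len(fold_scores)


def grid_search(xs, ys, candidate_ks, n_folds=10):
    """Score every candidate k on the same folds and keep the lowest RMSE."""
    folds = k_fold_indices(len(xs), n_folds)
    scores = {k: cv_rmse(xs, ys, k, folds) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic data for illustration only; not patient data.
rng = random.Random(42)
xs = [rng.uniform(0, 500) for _ in range(100)]
ys = [30 - 0.04 * x + rng.gauss(0, 3) for x in xs]
best_k, scores = grid_search(xs, ys, candidate_ks=[1, 3, 5, 9])
```

Scoring every candidate on the same folds keeps the comparison fair, since each hyperparameter value is judged on identical held-out data.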

There were 1556 patients included in the study, of whom 91.1% were men; the mean age was 45 years. The average baseline CD4 count was 158 cells/μL, multiple OIs were diagnosed in 50.1% of the cases, and 3.3% were found to have had NADEs. Average length of hospital stay was 24.14 days. A total of 36% of the participants had a prolonged hospital stay (more than 25 days). To build the ML models for length of hospital stay and prolonged hospital stay, all participants were split into 2 groups: a training cohort and a validation cohort.

There were 4 regression models used to predict length of hospital stay: random forest (RF), k-nearest neighbor (KNN), support vector machine (SVM), and XGB. In the training cohort, the KNN model was found to have the best discriminative capability (root mean square error [RMSE], 12.72; mean absolute error [MAE], 7.23; mean absolute percentage error [MAPE], 0.60). In the validation cohort, XGB performed best of the models (RMSE, 16.81; MAE, 10.39; MAPE, 0.98), with KNN performing the worst (RMSE, 19.67; MAE, 11.61; MAPE, 0.99).
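The 3 error measures reported above can be computed as in the following sketch, which uses the standard textbook definitions and expresses MAPE as a fraction rather than a percentage. The function name and the example stays are made up for illustration.

```python
import math


def regression_errors(actual, predicted):
    """Return (RMSE, MAE, MAPE) for paired actual and predicted values.

    MAPE is returned as a fraction and requires nonzero actual values.
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Hypothetical stays in days: the observed values and a model's guesses.
actual_days = [10, 20, 40]
predicted_days = [12, 16, 40]
rmse, mae, mape = regression_errors(actual_days, predicted_days)
```

RMSE penalizes large misses more heavily than MAE, which is why a model can rank differently on the 2 measures. Lower values are better for all 3.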

There were 5 ML classification models used to evaluate the risk of prolonged hospital stay. In the training cohort, the KNN model had the best discriminative capability of all the models (accuracy, 0.9008; positive predictive value [PPV], 0.8982; negative predictive value [NPV], 0.9063; sensitivity, 0.9525; specificity, 0.8096). In the validation cohort, the neural network (NN) model was found to be the best overall (accuracy, 0.7623; PPV, 0.7853; NPV, 0.7092; sensitivity, 0.8620; specificity, 0.5882), and the KNN model was found to be the overall worst of the models (accuracy, 0.7281; PPV, 0.7607; NPV, 0.6525; sensitivity, 0.8350; specificity, 0.5647).
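All 5 classification measures listed above follow from the 4 cells of a confusion matrix. The sketch below uses the standard definitions. The function name and the counts are hypothetical, and the article does not say which outcome the study treated as the positive class.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the 5 reported measures from confusion-matrix counts.

    tp/fp are true/false positives; tn/fn are true/false negatives.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),          # share of positive calls that are right
        "npv": tn / (tn + fn),          # share of negative calls that are right
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }


# Made-up counts for 100 hypothetical patients.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

A model can pair high sensitivity with much lower specificity, as several of the figures above do, because the 2 measures are computed on different subsets of patients.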

There were some limitations to this study. Selection bias and information bias are possible due to the retrospective nature of the study. All of the data came from a single center. Prognosis and admission could have been affected by a patient’s social environment. External validation was not performed for this study.

The researchers concluded that ML models could help to predict the length of hospital stay and the risk of a prolonged stay in the hospital. The XGB model can be helpful in predicting length of stay, whereas the NN model could be used to predict prolonged hospitalization. An intelligent medical prediction system could help reduce waste of health care resources in the future.


Li J, Hao Y, Liu Y, et al. Supervised machine learning algorithms to predict the duration and risk of long-term hospitalization in HIV-infected individuals: a retrospective study. Front Public Health. 2024;11:1282324. doi:10.3389/fpubh.2023.1282324
© 2024 MJH Life Sciences
All rights reserved.