
Machine Learning May Enable Earlier Detection of CKD Risk Factors
Key Takeaways
- A combined pipeline of advanced feature selection plus supervised/ensemble classifiers improved CKD risk prediction compared with using raw variables without tuning.
- Hemoglobin ranked among the most informative features, with blood urea, sodium, RBC count, potassium, and hypertension status also strongly contributing to risk stratification.
Machine learning–based risk prediction models, particularly gradient boosting, can accurately identify early clinical indicators of CKD.
A new study published in
Based on their findings, the researchers suggest that integrating ML-based predictive analytics into clinical workflows could enhance decision-making, support earlier diagnosis, and potentially reduce the need for costly interventions such as dialysis or transplantation, particularly in resource-limited settings where access to screening is scarce.
CKD is strongly associated with conditions such as hypertension, cardiovascular disease (CVD), and diabetes. For example, it’s been estimated
With this in mind, researchers developed a predictive pipeline that combines 2 advanced feature selection techniques with 8 supervised and ensemble ML algorithms. The study used a CKD-focused clinical dataset containing 1032 patient records and 14 predictive laboratory and clinical variables, including blood pressure, hemoglobin, serum creatinine, potassium levels, and hypertension status.
Both approaches identified hemoglobin as one of the most significant predictors of CKD risk. Additional top-ranked features included blood urea, sodium levels, red blood cell count, potassium, and hypertension status, which are already known to be closely linked to renal dysfunction.
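To illustrate what ranking features can look like in practice, the sketch below scores each variable by its absolute correlation with the CKD label and sorts the results. This is a stand-in technique chosen for simplicity, not necessarily either of the 2 methods used in the study. The function names and all patient values are hypothetical.

```python
from statistics import mean, pstdev

def abs_correlation(values, labels):
    """Absolute Pearson correlation between one feature and a 0/1 label."""
    mean_x, mean_y = mean(values), mean(labels)
    sd_x, sd_y = pstdev(values), pstdev(labels)
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no ranking signal
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(values, labels)]
    return abs(mean(products) / (sd_x * sd_y))

def rank_features(table, labels):
    """Return (feature, score) pairs sorted from most to least informative."""
    scores = {name: abs_correlation(column, labels) for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy values for illustration only, not taken from the study's dataset.
toy_table = {
    "hemoglobin": [15.1, 14.2, 13.8, 9.6, 8.9, 10.4],
    "blood_urea": [30, 36, 28, 80, 95, 40],
    "sodium": [140, 138, 141, 139, 137, 140],
}
toy_labels = [0, 0, 0, 1, 1, 1]  # 1 = CKD, 0 = non-CKD

ranking = rank_features(toy_table, toy_labels)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A correlation score only captures linear, one-variable-at-a-time relationships, which is one reason a real pipeline might compare more than one selection method.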
“Prior identification of all these mentioned important risk factors may subsequently help medical experts to make precise early identification of CKD,” detailed the researchers.
The research team then trained 8 ML classifiers using these selected risk factors, including support vector machines (SVM), random forest (RF), decision tree (DT), logistic regression, Naïve Bayes, K-nearest neighbors (K-NN), gradient boosting (GB), and an ensemble soft voting classifier.
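Soft voting generally means averaging the class probabilities produced by several base models and then applying a decision threshold. The sketch below shows that idea under simple assumptions: equal model weights, a 0.5 threshold, and made-up probabilities. The study's actual ensemble configuration is not described here, and `soft_vote` is a hypothetical helper.

```python
def soft_vote(probabilities_by_model, threshold=0.5):
    """Average each model's predicted CKD probability per patient, then threshold."""
    model_outputs = list(probabilities_by_model.values())
    n_patients = len(model_outputs[0])
    if any(len(output) != n_patients for output in model_outputs):
        raise ValueError("every model must score the same patients")
    averaged = [
        sum(output[i] for output in model_outputs) / len(model_outputs)
        for i in range(n_patients)
    ]
    predictions = [1 if p >= threshold else 0 for p in averaged]
    return averaged, predictions

# Hypothetical probabilities for three patients from three of the listed model types.
toy_scores = {
    "random_forest": [0.90, 0.40, 0.10],
    "logistic_regression": [0.80, 0.55, 0.20],
    "gradient_boosting": [0.95, 0.35, 0.05],
}

averaged, predictions = soft_vote(toy_scores)
print(predictions)
```

For the second patient, one model leans positive on its own, but the averaged probability stays below the threshold, so the ensemble predicts non-CKD.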
Performance metrics demonstrated strong predictive capability across all models. Accuracy ranged from 91% to 98%, while precision, recall, and F1 scores consistently exceeded 90%. GB emerged as the best-performing model, achieving 98% accuracy, 99% recall, and an area under the curve (AUC) of 0.99, indicating high reliability in distinguishing between CKD and non-CKD cases.
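For readers less familiar with these metrics, the following sketch computes accuracy, precision, recall, F1, and AUC from a small set of labels and scores. The data are hypothetical and do not reproduce the study's results. AUC is computed here as the share of positive-negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = CKD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and model scores for eight patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Unlike the other four metrics, AUC depends on the ranking of scores rather than on any single threshold, which is why it is reported separately.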
“It is evident that gradient boosting outperformed the other seven models, achieving the highest accuracy, precision, recall, F1 score, and specificity,” wrote the researchers. “It also achieved low bias because of its property of building trees sequentially, where each new tree corrects the errors of the previous ones.”
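The sequential error correction the researchers describe can be sketched in a heavily simplified form. The example below assumes a single feature, one-split trees (stumps), and squared-error loss, so each new stump is fit to the residuals left by the previous ones. Production gradient boosting classifiers typically use deeper trees and a classification loss. All names and values are hypothetical.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Find the one-threshold split on x that best fits the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Add stumps one at a time, each fit to the current residuals."""
    predictions = [mean(ys)] * len(ys)  # start from the average label
    losses = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return predictions, losses

# Hypothetical hemoglobin values (g/dL) and CKD labels; not from the study.
hemoglobin = [8.9, 9.6, 10.4, 11.0, 12.5, 13.8, 14.2, 15.1]
has_ckd = [1, 1, 0, 1, 0, 0, 0, 0]

predictions, losses = boost(hemoglobin, has_ckd)
print(f"training error after first tree: {losses[0]:.4f}, after last: {losses[-1]:.4f}")
```

The training error shrinks as stumps are added, which reflects the low-bias behavior mentioned in the quote. It says nothing by itself about performance on unseen patients.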
Specificity analyses showed similarly promising results. GB and DT models achieved specificity rates of 96%, which limits the likelihood of false-positive diagnoses. Notably, model performance declined when all raw features were used without prior feature selection or hyperparameter tuning. For example, K-NN accuracy dropped from 93% to 85%, underscoring the importance of identifying the most relevant clinical indicators before predictive modeling.
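As a quick worked illustration of what a 96% specificity implies, the sketch below applies the standard definition (true negatives divided by all patients without the disease) to a hypothetical group of 100 people without CKD. The group size is an assumption chosen for easy arithmetic.

```python
def specificity(true_negatives, false_positives):
    """Share of patients without the disease who are correctly classified."""
    return true_negatives / (true_negatives + false_positives)

def expected_false_positives(specificity_rate, n_without_disease):
    """Expected number of disease-free patients flagged in error."""
    return (1 - specificity_rate) * n_without_disease

# 96% is the reported specificity; the 100-patient group is hypothetical.
rate = specificity(true_negatives=96, false_positives=4)
flagged_in_error = expected_false_positives(rate, n_without_disease=100)
print(f"specificity = {rate:.2f}, expected false positives = {flagged_in_error:.0f}")
```

At that rate, roughly 4 of every 100 people without CKD would still be flagged, so false positives are reduced rather than eliminated.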
Several of the highest-ranked predictors, including low hemoglobin levels and reduced red blood cell count, were associated with anemia, a common complication of CKD resulting from impaired erythropoietin production by the kidneys. Hypertension and elevated blood urea levels were also highlighted as key contributors to disease progression.
By explicitly ranking these risk factors rather than using them solely to improve algorithm performance, the proposed methodology may offer clinicians actionable insights for early intervention, wrote the researchers.
References
1. Prima CNE, Juhola M. Early risk factor prediction in chronic kidney disease diagnosis using feature selection and machine learning algorithms. Methods Inf Med. Published online February 13, 2026. doi:10.1055/a-2797-4380
2. Saeed D, Reza T, Shahzad MW, et al. Navigating the crossroads: understanding the link between chronic kidney disease and cardiovascular disease. Cureus. 2023;15(12):e51362. doi:10.7759/cureus.51362




