
Novel Machine Learning Model Improves MASLD Detection in Type 2 Diabetes
Key Takeaways
- MASLD affects approximately 65% of patients with T2DM and carries a higher risk of progression to severe liver disease.
- The study analyzed 3,836 T2DM patients, identifying key predictors like BMI, triglycerides, and HbA1c for MASLD.
A machine learning–driven web tool based on 13 standard patient metrics demonstrates strong predictive performance for MASLD, supporting early clinical intervention.
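An interpretable machine learning (ML) model can accurately predict metabolic dysfunction-associated steatotic liver disease (MASLD) in patients with type 2 diabetes mellitus (T2DM), according to a study published in Diabetes, Obesity and Metabolism.1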
MASLD is highly prevalent among adults with T2DM, affecting approximately 65%, and carries a substantially greater risk for progression to more severe liver diseases, including cirrhosis and hepatocellular carcinoma, compared with MASLD alone.2 In this single-center study, investigators analyzed electronic health records of 3,836 hospitalized patients with T2DM from January 2018 to May 2025, excluding those with other chronic liver diseases.1 MASLD was confirmed in 55.9% of participants using liver ultrasound in conjunction with established clinical criteria.
Eight machine learning algorithms were constructed and compared, including random forest, logistic regression, support vector machine, extreme gradient boosting (XGB), multilayer perceptron, k-nearest neighbors, naive Bayes, and light gradient boosting machine. From 62 baseline variables, 13 were ultimately selected via LASSO regression and multivariable analysis, including demographic, metabolic, and treatment-related factors. Key predictors included BMI, triglycerides, ALT, HbA1c, fasting C-peptide, HDL, albumin, sex, metformin use, and hyperlipidemia.
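The LASSO selection step described above can be illustrated with a minimal sketch. It assumes a linear LASSO with squared-error loss solved by coordinate descent, which is a simplification: the article does not describe the authors' exact implementation, and a binary outcome such as MASLD would more typically use an L1-penalized logistic model. The toy data, the penalty `lam`, and all function names are hypothetical.

```python
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize(col: List[float]) -> List[float]:
    """Center to mean 0 and scale to unit variance (assumes a non-constant column)."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]


def lasso_cd(columns: List[List[float]], y: List[float],
             lam: float, n_iter: int = 200) -> List[float]:
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 on standardized columns."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]          # residual for beta = 0
    X = [standardize(c) for c in columns]
    beta = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(xj, resid)) / n
            new = soft_threshold(rho, lam)   # column mean square is 1
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(xj, resid)]
                beta[j] = new
    return beta


# Hypothetical toy data: a strong predictor and a weak one.
strong = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
weak = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
outcome = [2.0 * s + 0.1 * w for s, w in zip(strong, weak)]

coefs = lasso_cd([strong, weak], outcome, lam=0.5)
selected = [name for name, b in zip(["strong", "weak"], coefs) if b != 0.0]
print(selected)
```

The soft-threshold step is what makes LASSO a selection method: a variable whose association with the residual falls below the penalty gets a coefficient of exactly zero and drops out, which is how a large candidate set can be narrowed to a short list.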
SHapley Additive exPlanations (SHAP) interpretability analysis revealed that higher BMI, triglycerides, ALT, HbA1c, and fasting C-peptide levels increased the risk of MASLD, whereas higher HDL levels and regular medication use were associated with a reduced risk. Female sex, metformin use, and comorbid hyperlipidemia were also associated with an elevated risk. The researchers noted that the higher risk among female patients with T2DM was potentially attributable to metabolic changes associated with menopause in the study population, which had a mean age of 59 years. A U-shaped association was observed for albumin, with both relatively low and relatively high levels linked to an increased risk of MASLD.
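To show what a SHAP-style attribution means, the sketch below computes exact Shapley values for a single patient by enumerating feature subsets. It is an illustration under stated assumptions, not the study's analysis: `toy_risk` is a hypothetical three-variable logistic function with made-up coefficients, "absent" features are replaced by a single reference patient rather than averaged over a background dataset, and the SHAP library itself uses faster tree-specific algorithms.

```python
from itertools import combinations
from math import exp, factorial
from typing import Callable, Dict, Set


def shapley_values(model: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)

    def value(coalition: Set[str]) -> float:
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for feature in names:
        others = [k for k in names if k != feature]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {feature}) - value(s))
        phi[feature] = total
    return phi


def toy_risk(x: Dict[str, float]) -> float:
    """Hypothetical risk model; coefficients are illustrative only."""
    z = (-1.0
         + 0.15 * (x["bmi"] - 25.0)
         + 0.40 * (x["triglycerides"] - 1.7)
         - 1.20 * (x["hdl"] - 1.2))
    return 1.0 / (1.0 + exp(-z))


# Hypothetical reference patient and patient of interest.
reference = {"bmi": 25.0, "triglycerides": 1.7, "hdl": 1.2}
patient = {"bmi": 31.0, "triglycerides": 2.8, "hdl": 1.5}

contributions = shapley_values(toy_risk, patient, reference)
for name, phi in contributions.items():
    print(f"{name}: {phi:+.3f}")
```

By construction the contributions sum to the difference between the patient's predicted risk and the reference patient's predicted risk. In this toy setup, a positive value (BMI, triglycerides) pushes the prediction up and a negative value (higher HDL) pushes it down, which is how the directions reported above are read.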
Unsupervised cluster analysis further identified distinct metabolic phenotypes within the T2DM population, with risk ranging from 26.1% to 78.0%. High-risk clusters commonly exhibited elevated HbA1c, BMI, ALT, and triglycerides, as well as irregular medication adherence.
Among the models tested, the XGB model achieved the best overall performance, with an area under the receiver operating characteristic curve (AUROC) of 0.873 and an area under the precision-recall curve (AUPRC) of 0.904. The model achieved a recall of 0.819 and an F1 score of 0.809, and it showed a stable balance between sensitivity and accuracy. All algorithms exceeded an AUROC of 0.85; however, pairwise comparisons revealed that XGB significantly outperformed most others (P < .05), with stability confirmed by bootstrap analysis (P < .01). Compared with earlier risk models in diabetic cohorts, which reported AUROCs of approximately 0.79 to 0.82, the present approach demonstrated superior accuracy and broader clinical applicability.
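For readers less familiar with these metrics, the following sketch shows how AUROC, recall, and F1 are computed from predicted probabilities, along with a percentile bootstrap interval for AUROC. It uses only the standard library and hypothetical toy data. The 0.5 threshold, the function names, and the percentile bootstrap are assumptions for illustration, since the article does not detail the authors' procedures.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def recall_and_f1(y_true: Sequence[int], scores: Sequence[float],
                  threshold: float = 0.5) -> Tuple[float, float]:
    """Recall and F1 after classifying scores >= threshold as positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


def implied_precision(f1: float, recall: float) -> float:
    """Rearranges F1 = 2PR / (P + R) to solve for precision P."""
    return f1 * recall / (2 * recall - f1)


def bootstrap_auroc_ci(y_true: Sequence[int], scores: Sequence[float],
                       n_boot: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """95% percentile bootstrap interval; resamples lacking a class are redrawn."""
    auroc(y_true, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Hypothetical labels (1 = MASLD) and predicted probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

print(auroc(labels, probs))  # 0.9375
print(recall_and_f1(labels, probs))
print(bootstrap_auroc_ci(labels, probs))
print(round(implied_precision(0.809, 0.819), 3))
```

Applying `implied_precision` to the reported recall of 0.819 and F1 of 0.809 gives a precision of roughly 0.80, assuming both figures come from the same decision threshold and ignoring rounding. The article does not report precision directly, so this is a derived estimate only.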
Importantly, the variables required are commonly available in standard inpatient and outpatient workflows. To facilitate real-world use, the authors developed a web-based risk calculator, accessible online at
“During long-term follow-up, the developed tool can also serve as a valuable resource for facilitating risk communication in patient education and health management, while enabling low-cost dynamic risk monitoring and validation of intervention efficacy,” the authors explain.
Current screening methods, such as ultrasound and transient elastography, require specialized equipment and trained personnel, which makes widespread screening challenging to implement. Simpler, noninvasive tests are limited in sensitivity and practicality for large-scale screening in T2DM.3 “In this way, MASLD management may shift from reactive detection to proactive risk-stratified care, thereby improving outcomes while conserving healthcare resources,” the authors suggest.1
Study limitations include the reliance on ultrasound, incomplete capture of lifestyle factors and genetic susceptibility in the EHR dataset, and the lack of external validation across diverse cohorts. Still, the findings provide strong evidence that interpretable ML, particularly the XGB framework, can enhance MASLD prediction in high-risk T2DM populations.
References
- Zhou Z, Gao N, Liu J, Ma X, Ge Z, Ji C. An interpretable machine learning model for predicting metabolic dysfunction-associated steatotic liver disease in patients with type 2 diabetes. Diabetes Obes Metab. Published online September 29, 2025. doi:10.1111/dom.70168
- En Li Cho E, Ang CZ, Quek J, et al. Global prevalence of non-alcoholic fatty liver disease in type 2 diabetes mellitus: an updated systematic review and meta-analysis. Gut. 2023;72(11):2138-2148. doi:10.1136/gutjnl-2023-330110
- Boursier J, Canivet CM, Costentin C, et al. Impact of type 2 diabetes on the accuracy of noninvasive tests of liver fibrosis with resulting clinical implications. Clin Gastroenterol Hepatol. 2023;21(5):1243-1251.e12. doi:10.1016/j.cgh.2022.02.059