
Predicting Infection Risk in CLL Using Domain Adaptation
Key Takeaways
- Domain-adapted models trained on CLL plus lymphoma data outperformed CLL-only models and the CLL-IPI: with the best classifier (random forest), the Matthews correlation coefficient was 0.342 vs 0.290 and 0.166, respectively.
- Severe infection was operationalized as IV antimicrobials with a blood culture within two days, occurring within one year after treatment initiation.
Domain adaptation leverages lymphoma EHRs to better predict infection risk from treatment-related immune suppression in CLL, boosting the Matthews correlation coefficient.
Domain adaptation may be a novel creative solution to predict infection risk in patients with
The results of a new analysis published in
Researchers at Copenhagen University Hospital drew on the DALY-CARE data resource, a Danish registry linking electronic health records from patients with
The investigators defined severe infection as receiving intravenous antimicrobials and a concurrent blood culture draw, within 2 days of one another, within 1 year of treatment initiation. Three prediction strategies were compared: a domain-specific model (CLL data only), a domain adaptation model (trained on both CLL and lymphoma data, tested on CLL only), and the standard CLL International Prognostic Index (CLL-IPI). Logistic regression, random forest, and gradient boosting were the classifier types tested across the 3 strategies.
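As a rough illustration of how an outcome definition like this could be turned into a label, the sketch below flags a patient when an intravenous antimicrobial date and a blood culture date fall within 2 days of each other during the first year after treatment initiation. It is not the investigators' code. The function name, the inclusive boundaries, the 365-day follow-up, and the requirement that both events fall inside the follow-up window are all assumptions made for the example.

```python
from datetime import date, timedelta


def has_severe_infection(treatment_start, iv_antimicrobial_dates, blood_culture_dates,
                         pair_window_days=2, followup_days=365):
    """Hypothetical labeling rule: True if any IV antimicrobial date and any
    blood culture date are at most pair_window_days apart, with both dates
    inside the follow-up window after treatment initiation (assumed inclusive)."""
    followup_end = treatment_start + timedelta(days=followup_days)

    def in_followup(d):
        return treatment_start <= d <= followup_end

    # Keep only events that occur during follow-up.
    iv_dates = [d for d in iv_antimicrobial_dates if in_followup(d)]
    culture_dates = [d for d in blood_culture_dates if in_followup(d)]

    # Look for at least one IV antimicrobial / blood culture pair close in time.
    return any(abs((iv - bc).days) <= pair_window_days
               for iv in iv_dates for bc in culture_dates)


# Made-up dates, for illustration only.
start = date(2020, 1, 1)
print(has_severe_infection(start, [date(2020, 3, 10)], [date(2020, 3, 12)]))
```

How the boundaries are handled (for example, whether day 0 counts, or whether only one of the two events must fall within the year) would change which patients are labeled, so those choices would need to be checked against the published methods.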
Key Findings
The domain adaptation approach outperformed the others across every model tested. Using the best-performing classifier (random forest), the domain adaptation model achieved a Matthews correlation coefficient of 0.342 vs 0.290 for the CLL-only model and 0.166 for the CLL-IPI score alone. In terms of clinical stratification, domain adaptation also separated high-risk patients from low-risk patients with an odds ratio (OR) of 4.43, compared with 3.69 for the domain-specific model and 2.27 for CLL-IPI.
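For readers less familiar with these two metrics, the sketch below shows how a Matthews correlation coefficient and a high-risk vs low-risk odds ratio can be computed from a 2x2 table of predicted risk group against observed infection. The counts are invented for illustration and are not taken from the study. The function names are hypothetical, and the study may have derived its odds ratios differently (for example, from a regression model).

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC from confusion-matrix counts; returns 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def odds_ratio(tp, fp, fn, tn):
    """Odds of infection in the predicted high-risk group divided by the
    odds of infection in the predicted low-risk group."""
    if fp == 0 or fn == 0:
        raise ValueError("odds ratio is undefined when fp or fn is zero")
    return (tp * tn) / (fp * fn)


# Hypothetical counts (NOT study data):
# tp = high-risk with infection, fp = high-risk without infection,
# fn = low-risk with infection,  tn = low-risk without infection.
tp, fp, fn, tn = 30, 20, 10, 40
print(round(matthews_corrcoef(tp, fp, fn, tn), 3))
print(odds_ratio(tp, fp, fn, tn))
```

The Matthews correlation coefficient uses all 4 cells of the table, which is one reason it is often preferred over plain accuracy when the outcome classes are imbalanced.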
The domain adaptation model relied heavily on medication-derived features, including antibacterial drugs, alimentary tract medications, and nontherapeutic products, while the CLL-only model leaned more on biochemistry results such as albumin and C-reactive protein (CRP). Elevated CRP was identified as a predictive feature by both approaches, consistent with its role as a marker of infection or systemic inflammation.4
Predictive accuracy also was notably stronger for patients with a prior hospitalization history, since those records provide richer data on comorbidities and frailty, the authors explained. For patients without hospitalization history, all models performed worse.
Real-World Implications
The findings carry 2 important messages for oncology and health informatics. First, when disease-specific data are scarce, borrowing from a biologically adjacent condition can meaningfully improve predictive performance. CLL and lymphoma are distinct, but they share enough clinical overlap (eg, similar treatment pathways, overlapping comorbidity profiles, comparable infection dynamics) that lymphoma data have potential to add a genuine signal.
Second, domain adaptation highlights that infection risk in CLL is embedded in the broader clinical picture before treatment begins, as captured in medication lists, lab values, and hospitalization patterns rather than genomic variables alone. This suggests that rich electronic health record data, already routinely collected, could be harnessed to flag high-risk patients before treatment decisions are made.
The authors underscore that their present results echo those from a previous analysis: both the domain adaptation and domain-specific strategies had better predictive performance among recently hospitalized patients.
“From a modeling perspective,” they conclude, “employing [domain adaptation] techniques that learn the shifts in data across diseases, along with the use of deep learning models, may be a promising approach to explore.”
The authors note that future work incorporating data from
Their study has limitations. Their patient cohort was small, and the domain adaptation approach was tested only with lymphoma as the source domain. Also, the model performed poorly for patients without hospitalization histories, leaving a gap for healthier, less data-rich patients.
References
- Ruppert AS, Booth AM, Ding W, et al. Adverse event burden in older patients with CLL receiving bendamustine plus rituximab or ibrutinib regimens: Alliance A041202. Leukemia. 2021;35(10):2854-2861. doi:10.1038/S41375-021-01342-X
- Goyal RK, Nagar SP, Kabadi SM, Le H, Davis KL, Kaye JA. Overall survival, adverse events, and economic burden in patients with chronic lymphocytic leukemia receiving systemic therapy: real‐world evidence from the Medicare population. Cancer Med. 2021;10(8):2690-2702. doi:10.1002/CAM4.3855
- Parviz M, Brieghel C, Werling M, et al. Post-treatment infection prediction in CLL using domain adaptation of lymphoma electronic health records. Acta Oncol. 2026;65:109-118. doi:10.2340/1651-226X.2026.44569
- Levinson T, Wasserman A. C-reactive protein velocity (CRPv) as a new biomarker for the early detection of acute infection/inflammation. Int J Mol Sci. 2022;23(15):1-10. doi:10.3390/IJMS23158100




