Federated Learning Models Show Potential in Melanoma-Nevus Classification, but Improvements Needed


A multicentric, single-arm diagnostic study created a decentralized federated learning model for the classification of invasive melanomas and nevi, with results comparable to those of centralized models.

A federated learning (FL) model demonstrated great promise in the binary classification of nevi and invasive melanomas while showcasing the benefits that artificial intelligence (AI) can provide regarding privacy, global collaboration, and image classification in melanoma diagnostics, according to a recent study published in JAMA Dermatology.

The earlier melanoma is detected, the better a patient's outcome typically is; however, early detection is not without its challenges. Atypical nevi share some morphological features with melanoma, which can complicate diagnosis. Prior research, the present authors add, has shown the potential, and in some cases the superiority, of convolutional neural networks (a class of AI models engineered for image and pattern recognition) compared with human specialists in histopathological and dermatological classification tasks.



Implementing AI in the clinical setting comes with challenges of its own. AI models rely on diverse data to learn and perform to their potential. The authors of the current study explain that improving an AI algorithm in a medical setting would ideally involve storing patient data centrally and transferring it to external sites for development, which raises numerous concerns about the privacy of patient information. To avoid this, institutions could use their own resources to develop an algorithm on their own data; however, many lack the computational capacity to do so in a meaningful way. These conflicts pose major obstacles to data collection and to collaboration among studies, institutions, and researchers, even though such sharing, if done correctly, could bring a wide range of benefits to medical knowledge and to the field as a whole.

FL is an emerging approach that addresses some of these problems. It is designed around decentralized data and, furthermore, requires less computing power. Each institution trains the model on its own data locally, and only the resulting model parameters, rather than the patient data themselves, are shared and combined into a common model.
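To make the idea concrete, below is a minimal Python sketch of federated averaging (FedAvg), a widely used aggregation scheme in which each site trains locally and a coordinator averages the resulting parameters, weighting each site by its number of samples. It is an illustration under stated assumptions, not the study's code: it uses a toy one-feature linear model instead of the convolutional networks applied to whole-slide images, the site data are invented, the helper names (local_update, fedavg, federated_round) are hypothetical, and the article does not say which aggregation rule the study used.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Dataset = Sequence[Tuple[float, float]]  # (feature, label) pairs held privately by one site


def local_update(weights: Vector, data: Dataset, lr: float = 0.1, epochs: int = 20) -> Vector:
    """Run gradient descent on one site's own data for a toy model y ~ w0 + w1 * x."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of half the mean squared error with respect to w0 and w1.
        g0 = sum((w0 + w1 * x) - y for x, y in data) / n
        g1 = sum(((w0 + w1 * x) - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def fedavg(client_weights: Sequence[Vector], client_sizes: Sequence[int]) -> Vector:
    """Average the sites' parameters, weighting each site by its number of samples."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one site and one sample count per site")
    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, n in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


def federated_round(global_weights: Vector, sites: Sequence[Dataset]) -> Vector:
    """One FL round: every site trains locally; only parameters leave the site."""
    updates = [local_update(list(global_weights), data) for data in sites]
    return fedavg(updates, [len(data) for data in sites])


if __name__ == "__main__":
    # Hypothetical sites whose noise-free data all follow y = 2x + 1.
    sites = [
        [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)],
        [(x, 2 * x + 1) for x in (1.5, 2.0)],
        [(x, 2 * x + 1) for x in (0.25, 0.75, 1.25, 1.75)],
    ]
    weights = [0.0, 0.0]
    for _ in range(100):
        weights = federated_round(weights, sites)
    print(weights)  # moves toward [1.0, 2.0] without pooling any site's data
```

In this sketch the three hypothetical sites never exchange their (x, y) pairs; only two numbers per site travel to the coordinator each round. With a real neural network, the same weighted average would be applied to every layer's weights.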

The current authors saw an opportunity to explore the functionality and effectiveness of FL. To date, FL has been applied to retrospective melanoma data, but no study had investigated FL with prospectively gathered melanoma data. The authors therefore developed a decentralized FL model for the binary classification of invasive melanomas (IMs) and nevi using histopathological whole-slide images (WSIs).

Data were prospectively gathered from 6 German universities between April 2021 and February 2023. The FL model was then compared with centralized and ensemble learning models on a holdout test dataset and on an external test dataset of retrospectively gathered slides.

In total, 923 patients and 1025 slides were included: 388 IMs and 637 nevi. The median age at diagnosis was 58 years in the training set, 61 years in the external test dataset, and 57 years in the holdout test dataset. Median Breslow thickness across these datasets was 0.70 mm, 0.80 mm, and 0.70 mm, respectively.

In the holdout test dataset, the FL model had the worst performance, with an average area under the receiver operating characteristic curve (AUROC) of 0.8579 (95% CI, 0.7693-0.9299). The ensemble approach performed slightly better (AUROC, 0.8867; 95% CI, 0.8103-0.9481), and the classical centralized model performed best (AUROC, 0.9024; 95% CI, 0.8379-0.9565). The difference in performance between the centralized and FL models was statistically significant (P < .001).

In the external test dataset, the centralized model performed worst (AUROC, 0.9045; 95% CI, 0.8701-0.9331), followed by FL (AUROC, 0.9126; 95% CI, 0.8810-0.9412), with the ensemble approach performing best (AUROC, 0.9227; 95% CI, 0.8941-0.9479). Here, the FL model significantly outperformed the centralized model (P < .001).
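For readers less familiar with the metric: AUROC equals the probability that a randomly chosen melanoma slide receives a higher model score than a randomly chosen nevus slide, with ties counting as one-half. An AUROC of 0.5 is chance level and 1.0 is perfect separation. The short Python sketch below computes it directly from that pairwise definition (the Mann-Whitney formulation); the function name and the scores are hypothetical, and the study's own evaluation code is not described in the article.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC by pairwise comparison; label 1 = melanoma, 0 = nevus."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one melanoma and one nevus")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0  # melanoma ranked above nevus
            elif p == q:
                correct += 0.5  # ties count as half a correct ordering
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores for 3 melanomas and 4 nevi.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
    # 11 of the 12 melanoma-nevus pairs are ordered correctly, so this is 11/12.
    print(auroc(labels, scores))
```

The double loop is chosen for readability; library implementations such as scikit-learn's roc_auc_score use a rank-based computation that scales better to large test sets.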

“While the observed differences between FL and the centralized approach may not be large in absolute terms, they are consistent over 1000 iterations of bootstrapping . . . thereby demonstrating a sustained outperformance of the centralized approach,” the authors wrote.

Despite the centralized model's advantage on the holdout dataset, the FL model produced results comparable to those of the centralized and ensemble approaches, which the authors believe demonstrates its potential as a reliable alternative for classifying IMs and nevi.

“Additionally, FL empowers institutions to contribute to the development of AI models, even with relatively small datasets or strict data protection rules, thereby fostering collaboration across institutions and countries,” they added.
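The authors' reference to 1000 bootstrap iterations can be illustrated with a paired bootstrap: resample the test slides with replacement, compute both models' AUROC on the same resample, and record the difference. The Python sketch below is a generic version of that idea under stated assumptions: the scores are invented, the function names are hypothetical, the 95% interval is a simple percentile interval, and the article does not describe exactly how the study derived its P values. The "win fraction" it reports is a rough measure of consistency, not a formal P value.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC; label 1 = melanoma, 0 = nevus; ties count one-half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return correct / (len(pos) * len(neg))


def paired_bootstrap(
    labels: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    iterations: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (share of resamples where A beats B, 2.5th and 97.5th percentiles of AUROC_A - AUROC_B)."""
    if set(labels) != {0, 1}:
        raise ValueError("labels must contain both classes (0 and 1)")
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    while len(diffs) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]  # same slides for both models
        ys = [labels[i] for i in idx]
        if 0 not in ys or 1 not in ys:
            continue  # AUROC is undefined without both classes; draw again
        a = auroc(ys, [scores_a[i] for i in idx])
        b = auroc(ys, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    win_fraction = sum(d > 0 for d in diffs) / iterations
    lower = diffs[round(0.025 * (iterations - 1))]
    upper = diffs[round(0.975 * (iterations - 1))]
    return win_fraction, lower, upper


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [1] * 20 + [0] * 30
    # Hypothetical scores: model A separates the classes slightly better than model B.
    scores_a = [rng.gauss(0.70 if y else 0.35, 0.15) for y in labels]
    scores_b = [rng.gauss(0.65 if y else 0.40, 0.15) for y in labels]
    print(paired_bootstrap(labels, scores_a, scores_b))
```

Resampling the same slides for both models is what makes the comparison paired: a small but consistent gap, like the one the authors describe, shows up as a difference interval that stays on one side of zero.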

As research continues to establish the efficacy of FL and improve its performance, the authors concluded by encouraging future studies to apply FL to other classification tasks and other types of medical images.


Haggenmüller S, Schmitt M, Krieghoff-Henning E, et al. Federated learning for decentralized artificial intelligence in melanoma diagnostics. JAMA Dermatol. Published online February 7, 2024. doi:10.1001/jamadermatol.2023.5550
