RA2-DREAM Challenge Demonstrates Successful Crowdsourcing Approach in Developing Joint Damage Algorithms

Machine learning models developed through a crowdsourcing approach, which provide feasible, quick, and accurate methods to quantify joint damage in rheumatoid arthritis (RA), could be incorporated into electronic health records.

The Rheumatoid Arthritis 2–Dialogue for Reverse Engineering Assessment and Methods (RA2-DREAM Challenge) used a crowdsourcing approach to successfully develop machine learning models that provide feasible, quick, and accurate methods of quantifying joint damage in RA.

“These findings suggest that after refining and validating with larger cohorts, these algorithms alone or in combination could be incorporated into electronic health records, contributing to more informed and precise management of RA,” the study authors said.

The study was published in JAMA Network Open.

The RA2-DREAM Challenge used 674 sets of radiographs (X-rays) of the hands, wrists, and feet, along with expert-curated Sharp-van der Heijde (SvH) scores from 2 clinical studies, split into training (367 sets), leaderboard (119 sets), and final evaluation (188 sets).

The Challenge included 3 subchallenges, tasking participants with developing methods to automatically quantify overall damage (subchallenge 1), joint space narrowing (subchallenge 2), and erosions (subchallenge 3).

A total of 173 submissions from 26 participants or teams from 7 countries were entered, and 13 submissions were included in the final evaluation. These submissions came from biomedical, computer science, and engineering experts.

Each model’s performance and reproducibility were evaluated by comparing each submission’s score for each joint with the ground truth SvH scores, using a patient-weighted root mean square error (RMSE) approach.
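To make the metric concrete, here is a minimal sketch of one way a patient-weighted RMSE could be computed. The article does not describe the Challenge's actual weighting scheme, so the function name, the dictionary layout, and the weights below are assumptions for illustration only and are not the organizers' scoring code.

```python
import math


def patient_weighted_rmse(predicted, truth, weights):
    """Illustrative patient-weighted RMSE (hypothetical helper).

    predicted, truth: dict mapping patient ID -> list of per-joint scores.
    weights: dict mapping patient ID -> non-negative weight (assumed scheme).
    """
    weighted_sq_error = 0.0
    total_weight = 0.0
    for patient_id, true_scores in truth.items():
        pred_scores = predicted[patient_id]
        if len(pred_scores) != len(true_scores):
            raise ValueError(f"joint count mismatch for patient {patient_id}")
        # Mean squared error across this patient's joints.
        mse = sum((p - t) ** 2 for p, t in zip(pred_scores, true_scores)) / len(true_scores)
        weighted_sq_error += weights[patient_id] * mse
        total_weight += weights[patient_id]
    if total_weight == 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of per-patient errors, then the square root.
    return math.sqrt(weighted_sq_error / total_weight)


# Toy example with made-up scores (not data from the study).
truth = {"patient_a": [0, 2], "patient_b": [1, 1]}
predicted = {"patient_a": [0, 4], "patient_b": [1, 1]}
equal_weights = {"patient_a": 1.0, "patient_b": 1.0}
score = patient_weighted_rmse(predicted, truth, equal_weights)
```

A lower value means the predicted joint scores sit closer to the expert-curated scores. Raising a patient's weight makes errors on that patient count more toward the final number.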

According to the authors, the weighted RMSE assessments showed the winning algorithms produced scores very close to the expert-curated SvH scores.

“Although there is complexity and variability inherent in the images from patients, the top-performing algorithms achieved relatively high accuracy and were reproducible,” they said.

Two major observations that lined up with the authors’ expectations were also made.

First, the scoring of the metacarpophalangeal and proximal interphalangeal joints in the hands and forefoot was more accurate than the scoring of the joints in the wrist.

“This can likely be explained by the anatomic complexity of the articulations of the 8 carpal bones,” the authors noted, adding, “posteroanterior images lead to difficulty in visualizing all components of the joints.”

Additionally, scores for joint space narrowing were more concordant with SvH scores compared with scores for joint erosion. This may be due to joint space narrowing being a more direct measurement of distance, while measuring joint erosion depends on bone morphological characteristics and bone disruption.

Most of the submitted methods used deep learning-based approaches, which reflects a trend in image analysis research and the replicability of pretrained models such as DenseNet, ResNet, and U-Net.

Models that are further refined, optimized, and validated in real-world studies may eventually be adopted.

“The findings of this prognostic study of the RA2-DREAM Challenge suggest that international, award-incentivized, and crowdsourced collaboration could create robust and reproducible algorithms to interpret radiographic images of bones and joints,” the authors concluded. “Such algorithms have great potential for improving outcomes in patients with RA and other chronic forms of arthritis.”

Reference

Sun D, Nguyen TM, Allaway RJ, et al. A crowdsourcing approach to develop machine learning models to quantify radiographic joint damage in rheumatoid arthritis. JAMA Netw Open. 2022;5(8):e2227423. doi:10.1001/jamanetworkopen.2022.27423