In this new study from Norway, an artificial intelligence (AI) system was used to score screening mammograms for breast cancer risk, and its performance was compared with independent double reading by radiologists.
An artificial intelligence (AI) system applied to screening mammograms missed fewer than 20% of screen-detected breast cancers, according to new study findings published online today in Radiology.
According to the study’s investigative team, this could indicate that AI has high value as a new diagnostic tool in breast cancer screening. They evaluated a commercially available AI system against independent double reading of mammography results by 2 radiologists.
Their retrospective analysis utilized 122,969 screening mammograms performed at 4 BreastScreen Norway screening units, from October 2009 through December 2018, with the initial patient cohort including 47,877 women (mean [SD] age, 60 [6] years). The final analysis comprised 752 screen-detected cancers and 205 interval cancers, or 6.1 and 1.7 cases, respectively, per 1000 exams.
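The per-1000 figures follow directly from the counts reported above. As a minimal sketch of that arithmetic (the function and variable names here are illustrative, not taken from the study):

```python
def rate_per_1000(cases: int, exams: int) -> float:
    """Return the number of cases per 1000 screening examinations."""
    return 1000 * cases / exams


# Counts as reported in the article.
TOTAL_EXAMS = 122_969
SCREEN_DETECTED = 752
INTERVAL = 205

screen_rate = rate_per_1000(SCREEN_DETECTED, TOTAL_EXAMS)
interval_rate = rate_per_1000(INTERVAL, TOTAL_EXAMS)

# Rounded to one decimal place, as in the article.
print(f"Screen-detected: {screen_rate:.1f} per 1000 exams")
print(f"Interval: {interval_rate:.1f} per 1000 exams")
```

Rounded to 1 decimal place, these reproduce the 6.1 and 1.7 cases per 1000 exams cited in the analysis.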
“Mammograms acquired through population-based breast cancer screening programs produce a significant workload for radiologists. AI has been proposed as an automated second reader for mammograms that could help reduce this workload,” noted a statement on the findings. “The technology has shown encouraging results for cancer detection, but evidence related to its use in real screening settings is limited.”
AI prediction scores ranged from 1 (low breast cancer risk) to 10 (high breast cancer risk), and the AI system was also evaluated at 3 thresholds, each defined by a minimum raw score.
Approximately 78.0% of all cancers in this study were scored a 10 by the AI system: 86.8% of screening-detected cancers compared with 44.9% of interval cancers. The median (IQR) tumor diameter was 9 (9-18) mm.
A total of 86.8% of the screening-detected cancers and 44.9% of the interval cancers in the present study were scored a 10 by the AI system (threshold 1), indicating a high risk of breast cancer. For threshold 1, this equated to a raw score above 9 and indicated a suspicious finding by the AI system; raw scores were carried out to 4 decimal places before being rounded up. The median (IQR) tumor diameter was 13 (9-19) mm among the screening-detected cancers selected at this threshold compared with 10 (7-17) mm among cancers not selected.
Threshold 2 required a raw score above 9.13 and was set so that the total number of examinations the AI system flagged as suspicious closely matched the radiologists’ findings. At this threshold, the AI system detected 85.1% of screening-detected cancers and 41.5% of interval cancers. Either or both of the radiologists positively interpreted fewer than half of the screen-detected cancers (42.9%).
The results were similar using threshold 3, which required a raw score above 9.43 and was “set to yield a selection rate similar to an average individual radiologist (5.8%),” the authors noted. Again, the AI system selected more screening-detected cancers than interval cancers: 80.1% vs 30.7%. The cancers the AI system selected had a larger median (IQR) tumor diameter than those it did not select: 13 (9-20) mm vs 9 (7-15) mm.
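Because each threshold is a stricter raw-score cutoff than the one before it, any examination selected at threshold 3 is also selected at thresholds 2 and 1, which is consistent with the detection rate falling from 86.8% to 85.1% to 80.1%. The following is a minimal sketch of that nesting, using only the cutoffs reported above; the function name and the example scores are hypothetical and do not come from the AI system or the study.

```python
from typing import List

# Raw-score cutoffs as reported in the article; a score must be strictly
# above the cutoff to count as selected at that threshold.
THRESHOLD_CUTOFFS = {1: 9.0, 2: 9.13, 3: 9.43}


def thresholds_met(raw_score: float) -> List[int]:
    """Return the threshold numbers at which a raw score counts as selected."""
    return [
        number
        for number, cutoff in sorted(THRESHOLD_CUTOFFS.items())
        if raw_score > cutoff
    ]


# Hypothetical raw scores, for illustration only.
for score in (8.75, 9.05, 9.2, 9.5):
    print(score, thresholds_met(score))
```

A hypothetical raw score of 9.2, for example, would be selected at thresholds 1 and 2 but not at threshold 3.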
“To our knowledge, this is the largest AI evaluation study to date, including more than 120 000 examinations from a real screening setting,” the authors wrote. “However, more research is needed to find the optimal combination of radiologists and AI systems.”
According to the authors, areas to evaluate further are optimal settings for the timing and format of AI scores, how rates of recall and false-positive results can be influenced by negative examinations, mammographic features identified by AI, multiple AI algorithms in a comparative manner, use of AI in more diverse populations, and the cost-effectiveness of AI.
Reference
Larsen M, Aglen CF, Lee CI, et al. Artificial intelligence evaluation of 122969 mammography examinations from a population-based screening program. Radiology. Published online March 29, 2022. doi:10.1148/radiol.212381