August 13, 2024

AI Improves Breast Cancer Detection but Requires Careful Monitoring

Artificial intelligence (AI) can aid in breast cancer detection, but radiologists need training to ensure accuracy and prevent missed cancers.

Radiologists in a retrospective reader study with high breast cancer prevalence performed at higher sensitivity and lower specificity than the original screen readers, but adding information from an artificial intelligence (AI) system, which was calibrated to perform similarly to radiologists in a screening setting, decreased sensitivity and increased specificity, according to findings published in European Radiology.¹

Early breast cancer detection through screening improves treatment outcomes, but mammograms have some limitations. Detection can be hindered by such factors as dense breast tissue, subtle cancer growth, and human error.² Mammogram screenings, while helpful, are not 100% accurate.³ Limitations include false-positives, false-negatives, overdiagnosis and overtreatment, and radiation exposure.

As technology advances, so does the use of AI for computer-aided detection, which has become an option to help alleviate the radiologist shortage and facilitate cancer detection.¹ Deep learning algorithms can outperform traditional feature-based approaches in digital mammogram analysis, and numerous AI-powered mammographic systems are now commercially available.⁴

However, many radiologists have not begun to use AI to interpret mammograms because the billing codes that would let them charge health plans for these processes do not yet exist.⁵ Usually, CMS develops new billing codes and private health plans follow its payment policies, but that has yet to happen.

It is important that radiologists understand how to interpret AI information as AI becomes increasingly involved in cancer detection and is used to extract image biomarkers for density assessment, radiological-pathological correlation, and prediction of therapy response.¹

Methods

The present study was conducted using a selection of screening mammograms, which encompassed the original radiologist assessment from 2010 to 2013 at the Karolinska University Hospital in Sweden.¹ The authors performed the full double reading and consensus discussion first without AI decision support and then with AI decision support. Participant age was categorized as either “younger” (40-55 years) or “older” (56-75 years).

Results

There were 758 women with mammogram exams included in the study, and 50% (n = 379) had some form of breast cancer. Women in the younger group (40 to 55 years) made up 52% of the data set overall, accounting for about 40% of the women diagnosed with cancer and 64% of the women considered healthy. Of the 379 mammograms with cancer, 10% were in situ diagnoses, 42% showed invasive cancer of up to 15 mm, and 45% showed invasive cancer that exceeded 15 mm.

Reader Flagging Rate

Radiologists reviewed breast exams with and without AI and flagged any exam with abnormalities in either of the 2 reads. The flagging rate was calculated as the number of exams flagged by any reader divided by all exams. The recall rate was calculated as the number of exams with a recall decision divided by all exams.
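The two rates defined above can be illustrated with a minimal Python sketch. The record layout (`Exam`, `reader_flags`, `recalled`) and the toy data are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    reader_flags: tuple  # hypothetical: one True/False flag per read (2 reads)
    recalled: bool       # hypothetical: recall decision after consensus discussion


def flagging_rate(exams):
    """Exams flagged by any reader divided by all exams."""
    return sum(any(e.reader_flags) for e in exams) / len(exams)


def recall_rate(exams):
    """Exams with a recall decision divided by all exams."""
    return sum(e.recalled for e in exams) / len(exams)


# Toy data, not from the study.
exams = [
    Exam((True, False), True),
    Exam((True, True), True),
    Exam((False, True), False),
    Exam((False, False), False),
]

print(flagging_rate(exams))  # 3 of the 4 toy exams were flagged by a reader
print(recall_rate(exams))    # 2 of the 4 toy exams ended in a recall decision
```

Because an exam counts as flagged when either reader flags it, the flagging rate can never be lower than the rate for a single reader, and the recall rate after consensus discussion can be lower than the flagging rate.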

Without AI, radiologists had a 34% flagging rate in the nonenriched, population-wide original setting of patients with breast cancer subtypes that are not human epidermal growth factor receptor 2 related. In the enriched reader study population, by contrast, radiologists flagged 59% of exams without AI and 46% with AI support.

The flagging rate was 36% in the precalibrated nonenriched setting for standalone AI.

Sensitivity, Specificity After Consensus Discussion

Investigators examined sensitivity (the proportion of positive cases detected) and specificity (the proportion of negative cases correctly identified) with and without AI support. Reader study sensitivity was 81% without AI and 75% with AI support, a 7% relative decrease (P < .001). Additionally, the reader study specificity without AI (67%) was lower than the original setting specificity (98%). With AI decision support, the specificity increased to 86%, a relative increase of 28% (P < .001).
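The relative changes reported above can be checked with a short Python sketch. It assumes that "relative" means the change divided by the without-AI value, and it takes the 67% without-AI figure as the starting specificity, which is the reading consistent with the reported 28% increase to 86%. The function name `relative_change` is illustrative.

```python
def relative_change(before, after):
    """Change from `before` to `after`, expressed as a fraction of `before`."""
    return (after - before) / before


# Percentages as reported in the article.
sens = relative_change(81, 75)  # sensitivity without AI -> with AI
spec = relative_change(67, 86)  # specificity without AI -> with AI

print(f"Sensitivity: {sens:+.1%}")  # -7.4%
print(f"Specificity: {spec:+.1%}")  # +28.4%
```

Rounded to whole percentages, these match the 7% relative decrease and 28% relative increase quoted in the article.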

With AI decision support, 39 cancers initially detected by radiologists were missed, while 16 previously undetected cancers were identified. This resulted in a net decrease of 23 detected cancers when using AI decision support.

Tendency to Change Assessment

The likelihood of radiologists changing their assessments based on AI information varied by individual and image characteristics. While some radiologists were more likely to change from negative to positive assessments, others were more likely to change from positive to negative.

Radiologists more commonly changed from a positive to a negative assessment based on AI information for exams with more indirect signs of possible malignancy, such as architectural distortion and asymmetrical density, than for exams with pronounced signs like microcalcifications.

Radiologists were less likely to change their assessment from positive to negative after reviewing AI information when the BI-RADS score was higher and multiple image signs of potential malignancy were present.

Cancer Characteristics

For cancers larger than 15 mm, the study detected 134 invasive cancers without AI and 128 with AI. For cancers smaller than or equal to 15 mm, the numbers of invasive cancers were 134 and 119, respectively. The AI model demonstrated greater accuracy in detecting smaller, rather than larger, invasive cancers.

Reading Time

Exam reading time decreased by 38% with AI assistance, from an average of 21 seconds to 13 seconds.

Limitations

The reliability of the study was limited because the investigators used different radiologists for initial mammograms and the reader study. Additionally, the high enrichment ratio may not reflect typical clinical practice. Also, when AI is integrated into screening, radiologists' performance may fluctuate based on perceived cancer prevalence in triaged cases. Consequently, ongoing monitoring of radiologist performance is crucial for successful AI implementation.

“These interaction effects may not be possible to estimate beforehand, calling for careful monitoring of radiologist performance, overall and individually, in real-world implementations of AI for screening mammography,” the study authors concluded.

References

1. Al-Bazzaz H, Janicijevic M, Strand F. Reader bias in breast cancer screening related to cancer prevalence and artificial intelligence decision support—a reader study. Eur Radiol. 2024;34(8):5415-5424. doi:10.1007/s00330-023-10514-5

2. Lång K, Hofvind S, Rodríguez-Ruiz A, et al. Can artificial intelligence reduce the interval cancer rate in mammography screening? Eur Radiol. 2021;31(8):5940-5947. doi:10.1007/s00330-021-07686-3

3. Limitations of mammograms. American Cancer Society. January 14, 2022. Accessed August 13, 2024. https://www.cancer.org/cancer/types/breast-cancer/screening-tests-and-early-detection/mammograms/limitations-of-mammograms.html

4. Yoon JH, Strand F, Baltzer P, et al. Standalone AI for breast cancer detection at screening digital mammography and digital breast tomosynthesis: a systematic review and meta-analysis. Radiology. 2023;307(5):1-10. doi:10.1148/radiol.222639

5. Andrews M. Mammography AI can cost patients extra. Is it worth it? CBS News. January 9, 2024. Accessed August 12, 2024. https://www.cbsnews.com/news/mammogram-ai-cost-patients-is-it-worth-it/
