AI boosts quality of breast cancer screenings when paired with human radiologists

AI algorithms can improve the overall quality of breast cancer screenings, according to a new study published in JAMA Network Open. The key is for those algorithms to be used in tandem with evaluations by human radiologists.

These findings are the product of a year-long international competition, the Digital Mammography (DM) Dialogue on Reverse Engineering Assessment and Methods (DREAM) challenge. The challenge involved more than 310,000 screening mammograms and was conducted by representatives from Sage Bionetworks, IBM Research, the Kaiser Permanente Washington Health Research Institute and the University of Washington School of Medicine.

“This DREAM Challenge allowed for a rigorous, apples-to-apples assessment of dozens of state-of-the-art deep learning algorithms in two independent datasets,” Justin Guinney, PhD, vice president of computational oncology at Sage Bionetworks and DREAM chair, said in a prepared statement. “This is a much-needed comparison effort given the importance and activity of AI research in this field.”

From September 2016 to November 2017, teams from 44 countries participated in the DM DREAM challenge, using more than 144,000 screening mammograms from the United States to train and validate their algorithms. A second dataset of more than 166,000 screening mammograms from Sweden served as an independent validation cohort so that the teams could confirm their findings.

Sensitive patient information was protected through a model-to-data approach: participants submit their algorithms to be run on the data rather than receiving the data themselves, so researchers can test their algorithms without putting anyone’s privacy at risk.

“The concerns that patients feel about the use of medical images is always first in our minds,” co-author Diana Buist, PhD, MPH, Kaiser Permanente Washington Health Research Institute, said in the same statement. “The novel model-to-data approach for data sharing is particularly innovative and essential to preserving privacy, because it allows participants to contribute innovations which might actually improve the standard of care, without receiving access to the underlying data.”
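
As a rough illustration of the model-to-data idea, the sketch below has the data holder run a submitted model and return only aggregate results. It is a minimal sketch under stated assumptions, not the challenge's actual infrastructure: the names `_PRIVATE_CASES`, `run_submission` and `mean_intensity_model`, the toy feature vectors, and the 0.5 decision threshold are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical private dataset held by the organizer: (image features, cancer label).
# Participants never see these records.
_PRIVATE_CASES: List[Tuple[List[float], int]] = [
    ([0.8, 0.7], 1),
    ([0.2, 0.1], 0),
    ([0.6, 0.9], 1),
    ([0.3, 0.2], 0),
]


def run_submission(model: Callable[[List[float]], float]) -> Dict[str, float]:
    """Run a submitted model where the data lives and return only aggregate results."""
    correct = 0
    for features, label in _PRIVATE_CASES:
        # Assumed decision rule: a score of 0.5 or higher counts as a positive call.
        predicted = 1 if model(features) >= 0.5 else 0
        correct += predicted == label
    # Only summary statistics leave the organizer's environment, never the cases.
    return {"n_cases": len(_PRIVATE_CASES), "accuracy": correct / len(_PRIVATE_CASES)}


def mean_intensity_model(features: List[float]) -> float:
    """A toy participant model: score a case by its mean feature value."""
    return sum(features) / len(features)


report = run_submission(mean_intensity_model)
```

The participant receives `report` and nothing else, which is the property the model-to-data approach relies on.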

Overall, the top-performing AI model achieved an area under the ROC curve (AUC) of 0.858 and a specificity of 66.2% on the U.S. dataset. On the Swedish dataset, the model’s AUC was 0.903 and its specificity was 81.2%. When combined with radiologists’ assessments, however, the model achieved an AUC of 0.942 and a specificity of 92%.
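
For readers unfamiliar with these metrics, the sketch below shows how AUC and specificity can be computed from a set of scores. The six cases, the scores and the 0.5 threshold are invented for illustration. The combination rule, a simple average of the AI score and the radiologist's recall decision, is an assumption made for this example and is not the method used in the study.

```python
def auc(labels, scores):
    """AUC as the probability that a cancer case outscores a non-cancer case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity(labels, scores, threshold):
    """Share of non-cancer cases scoring below the recall threshold (true negatives)."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return sum(s < threshold for s in neg) / len(neg)


# Invented toy data: 1 = cancer, 0 = no cancer.
labels = [1, 1, 0, 0, 0, 0]
ai_scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
radiologist_recall = [1, 1, 0, 1, 0, 0]  # 1 = radiologist recalled the patient

# Assumed combination rule: average the AI score with the radiologist's decision.
combined = [(a + r) / 2 for a, r in zip(ai_scores, radiologist_recall)]

ai_auc = auc(labels, ai_scores)
combined_auc = auc(labels, combined)
ai_specificity = specificity(labels, ai_scores, 0.5)
```

In this toy example the AI-only scores rank one healthy case above one cancer case, and adding the radiologist's decision corrects that ordering, so the combined AUC is higher than the AI-only AUC.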

“The DM DREAM challenge represents the largest objective deep learning benchmarking effort in screening mammography interpretation to date,” the study’s authors concluded. “An AI algorithm combined with the single-radiologist assessment was associated with a higher overall mammography interpretive accuracy in independent screening programs compared with a single-radiologist interpretation alone.”