Google deep-learning system grades prostate cancer with 70% accuracy

A deep-learning system (DLS) designed by Google researchers outperformed several top general pathologists when grading prostate cancer on a standard grading scale, according to newly published research.

The system could help ease the shortage of specialty-trained pathologists needed to meet global demand for prostate cancer pathology, and could reduce disagreement among pathologists on prostate cancer risks and treatments.

Google researchers recently published a paper exploring whether deep learning could be used to improve the accuracy and objectivity of Gleason grading of prostate cancer in prostate biopsies. The Gleason grading system is used to evaluate and classify cancer cells based on how closely they resemble normal prostate glands, according to the researchers. However, pathologists don't always agree on the Gleason grade for a given case of prostate cancer, which contributes to several challenges with the grading system.

“The DLS was also more accurate than the average pathologist at Gleason pattern quantitation,” Martin Stumpe, technical lead, and Craig Mermel, product manager, healthcare, Google AI, wrote in a blog post. “These improvements in Gleason grading translated into better clinical risk stratification: the DLS better identified patients at higher risk for disease recurrence after surgery than the average general pathologist, potentially enabling doctors to use this information to better match patients to therapy.”

For the study, researchers developed and validated their deep-learning system by collecting de-identified images of prostatectomy samples. For the training data, 32 pathologists provided annotations of Gleason patterns, which resulted in more than 112 million annotated image patches. They also provided an overall Gleason grade group for each image.

Google’s deep-learning system achieved an overall accuracy of 70 percent, outperforming the U.S. board-certified general pathologists who took part in the study. Those pathologists achieved an average accuracy of 61 percent. Additionally, the system was more accurate than eight of the 10 high-performing individual general pathologists who graded each slide in the validation set.

The company described the initial results as “encouraging,” while also stating that more work needs to be done to refine the system’s accuracy.

“Further work will be needed to assess how to best integrate our DLS into the pathologist’s diagnostic workflow and the impact of such artificial-intelligence based assistance on the overall efficiency, accuracy, and prognostic ability of Gleason grading in clinical practice,” Stumpe and Mermel wrote. “Nonetheless, we are excited about the potential of technologies like this to significantly improve cancer diagnostics and patient care.”