Advanced AI models improve data extraction from free-text pathology reports

Researchers have developed two AI-powered tools for automatically extracting key information from free-text pathology reports. The team, from the government-funded Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee, shared its findings in the Journal of the American Medical Informatics Association.

ORNL is one of the U.S. Department of Energy’s national laboratories, and its scientists apply advanced technologies such as AI and natural language processing to health data problems, with the aim of improving patient outcomes.

“Population-level cancer surveillance is critical for monitoring the effectiveness of public health initiatives aimed at preventing, detecting, and treating cancer,” corresponding author Gina Tourassi, director of the Health Data Sciences Institute and the National Center for Computational Sciences at ORNL, said in a prepared statement. “Collaborating with the National Cancer Institute, my team is developing advanced AI solutions to modernize the national cancer surveillance program by automating the time-consuming data capture effort and providing near real-time cancer reporting.”

For this study, lead author Mohammed Alawad and colleagues trained multitask convolutional neural networks (MTCNNs) to extract cancer-related data from free-text pathology reports. The two MTCNNs (one “hard parameter sharing” model and one “cross-stitch” model) each performed five separate extraction tasks. Their performance was compared with that of single-task CNNs and a selection of other machine learning techniques.
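To make the “hard parameter sharing” idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the layer sizes, the task names (`task_1` through `task_5`), the class counts, and the random weights are all hypothetical placeholders. It only shows the general pattern, in which one shared convolutional encoder reads a report and a separate classification head per task reads the shared features.

```python
import numpy as np

# All sizes and task definitions below are hypothetical, chosen for illustration.
VOCAB_SIZE = 1000
EMBED_DIM = 16
NUM_FILTERS = 8
KERNEL_WIDTH = 3
TASK_CLASSES = {"task_1": 4, "task_2": 3, "task_3": 2, "task_4": 5, "task_5": 3}


def init_params(rng):
    """Create random (untrained) weights: one shared encoder, one head per task."""
    return {
        "embedding": rng.normal(0.0, 0.1, (VOCAB_SIZE, EMBED_DIM)),
        "conv_w": rng.normal(0.0, 0.1, (KERNEL_WIDTH * EMBED_DIM, NUM_FILTERS)),
        "conv_b": np.zeros(NUM_FILTERS),
        "heads": {
            task: (rng.normal(0.0, 0.1, (NUM_FILTERS, n)), np.zeros(n))
            for task, n in TASK_CLASSES.items()
        },
    }


def shared_encoder(token_ids, params):
    """Embed tokens, apply a 1-D convolution with ReLU, then max-pool over time."""
    emb = params["embedding"][token_ids]  # (seq_len, EMBED_DIM)
    n_windows = len(token_ids) - KERNEL_WIDTH + 1
    windows = np.stack(
        [emb[i:i + KERNEL_WIDTH].ravel() for i in range(n_windows)]
    )  # (n_windows, KERNEL_WIDTH * EMBED_DIM)
    feats = np.maximum(windows @ params["conv_w"] + params["conv_b"], 0.0)
    return feats.max(axis=0)  # (NUM_FILTERS,)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def predict(token_ids, params):
    """Run the shared encoder once, then every task-specific head on its output."""
    doc_vector = shared_encoder(token_ids, params)
    return {
        task: softmax(doc_vector @ w + b)
        for task, (w, b) in params["heads"].items()
    }


rng = np.random.default_rng(0)
params = init_params(rng)
report = rng.integers(0, VOCAB_SIZE, size=50)  # stand-in for a tokenized report
probs = predict(report, params)
for task, p in probs.items():
    print(task, "predicted class:", int(p.argmax()))
```

In training, the per-task losses would typically be summed so that every task updates the shared encoder, which is what distinguishes this setup from five independent single-task CNNs. A cross-stitch model, by contrast, generally keeps a separate network per task and learns how much of each network's activations to mix into the others; that variant is not shown here.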

Overall, the MTCNNs outperformed all other AI models. In a retrospective analysis, the hard parameter sharing model (59.04%) and the cross-stitch model (57.93%) correctly classified a higher percentage of pathology reports than the other models, which ranged from 36.75% to 53.68%. In a prospective analysis, the two MTCNNs again outperformed the other models (60.11% for the hard parameter sharing model, 58.13% for the cross-stitch model).

So what’s next for these researchers?

“The next step is to launch a large-scale user study where the technology will be deployed across cancer registries to identify the most effective ways of integration in the registries’ workflows,” Tourassi said in the same ORNL statement. “The goal is not to replace the human but rather augment the human.”