AI continues to be one of the hottest topics in all of healthcare, especially radiology, and more academic researchers are exploring the subject than ever before. So what separates a good AI study from a bad one? That’s exactly what the editorial board of RSNA’s Radiology journal hoped to cover with its new commentary.
The team noted that guidelines focused specifically on AI research in diagnostic imaging will likely be developed in the near future. In the meantime, however, the editorial board wanted to share a guide to help researchers ensure they are on the path to success.
The board provided a list of several issues would-be authors must keep in mind while developing their research. These are five of the most important considerations from that list:
1. Use an external test set for your final assessment:
AI models often produce impressive findings until they are paired with outside data, at which point unexpected bias or inconsistencies are revealed. Researchers should test their algorithms on external images, meaning images from a completely different institution, to show that their study has real potential to make an impact on patient care.
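For readers who want to see what an institution-level holdout could look like in practice, here is a minimal sketch in Python. It is not taken from the Radiology commentary. The case records, the institution names and the `split_by_institution` helper are all hypothetical and exist only to illustrate the idea that no institution should contribute images to both the development data and the final test set.

```python
# Hypothetical case records: (case_id, institution, label).
# None of these names come from the commentary; they are placeholders.
cases = [
    ("c1", "hospital_a", 1),
    ("c2", "hospital_a", 0),
    ("c3", "hospital_b", 1),
    ("c4", "hospital_b", 0),
    ("c5", "hospital_c", 1),
    ("c6", "hospital_c", 0),
]


def split_by_institution(cases, external_institutions):
    """Hold out every case from the named institutions as the external test set."""
    development, external = [], []
    for case in cases:
        _, institution, _ = case
        if institution in external_institutions:
            external.append(case)
        else:
            development.append(case)
    return development, external


development, external = split_by_institution(cases, {"hospital_c"})

# Sanity check: no institution appears on both sides of the split.
dev_sites = {institution for _, institution, _ in development}
ext_sites = {institution for _, institution, _ in external}
assert dev_sites.isdisjoint(ext_sites)
```

The point of splitting on the institution rather than on individual cases is that a random case-level split would let images from the same site leak into both sets, which is the situation the editorial board warns against.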
2. Use images from a variety of vendors:
For an algorithm to be clinically useful, it has to work with imaging equipment manufactured by a wide variety of vendors.
“Radiologists are aware that MRI scans from one vendor do not look like those from another vendor,” wrote David A. Bluemke, MD, PhD, editor of Radiology and a radiologist at the University of Wisconsin Madison School of Medicine and Public Health, and colleagues. “Such differences are detected by radiomics and AI algorithms. Vendor-specific algorithms are of much less interest than multivendor AI algorithms.”
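One way a research team might check for the vendor effect described above is to report performance separately for each vendor instead of as a single pooled number. The sketch below is an illustration only. The vendor names, the toy results and the `per_vendor_metrics` helper are assumptions made for this example and do not come from the commentary.

```python
from collections import defaultdict

# Hypothetical results: (vendor, true_label, predicted_label), 1 = disease present.
results = [
    ("vendor_x", 1, 1), ("vendor_x", 1, 1), ("vendor_x", 0, 0), ("vendor_x", 0, 0),
    ("vendor_y", 1, 0), ("vendor_y", 1, 1), ("vendor_y", 0, 0), ("vendor_y", 0, 1),
]


def per_vendor_metrics(results):
    """Compute sensitivity and specificity separately for each vendor."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for vendor, truth, predicted in results:
        if truth == 1:
            key = "tp" if predicted == 1 else "fn"
        else:
            key = "tn" if predicted == 0 else "fp"
        counts[vendor][key] += 1

    metrics = {}
    for vendor, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        metrics[vendor] = {
            # None marks a metric that is undefined for lack of cases.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return metrics


vendor_metrics = per_vendor_metrics(results)
for vendor, m in sorted(vendor_metrics.items()):
    print(vendor, m)
```

In this toy data the pooled figures would hide the fact that the algorithm does noticeably worse on one vendor's scans, which is the kind of gap a multivendor evaluation is meant to surface.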
3. Train your algorithm with a widely accepted reference standard:
If researchers don’t use a reference standard that the field already trusts, interested parties are unlikely to take the research seriously. For example, Bluemke et al. noted that the Radiology editorial board does not consider clinical reports to be a sufficient reference standard for any radiology research.
“Given the frequent requirement of AI for massive training sets (thousands of cases), the research team may find the use of clinical reports to be unavoidable,” the authors wrote. “In that scenario, the research team should assess methods to mitigate the known lower quality of the clinical report when compared with dedicated research interpretations.”
4. Compare your algorithm’s performance to experienced radiologists:
It’s much more important to see how AI models compare to experienced radiologist readers than to nonradiologist readers or other algorithms. Researchers may want to compare their work to radiology trainees or nonradiologists to provide context, the authors added, but such comparisons shouldn’t be used to evaluate the algorithm’s “peak performance.”
5. Make your algorithm available to the public:
Think your algorithm could make a real impact? Let other specialists try it out for themselves.
“Just like MRI or CT scanners, AI algorithms need independent validation,” the authors wrote. “Commercial AI products may work in the computer laboratory but have poor function in the reading room. ‘Trust but verify’ is essential for AI that may ultimately be used to help prescribe therapy for our patients.”