WHO, ITU establish benchmarking process for AI in medicine

Two United Nations agencies have joined forces to create the Focus Group on Artificial Intelligence for Health (FG-AI4H)—a group of global representatives the UN hopes will help shape a streamlined, transparent process for vetting AI technologies in the healthcare space.

FG-AI4H, a product of the World Health Organization and International Telecommunication Union (ITU), was established last July in response to the rapidly evolving and expanding field of AI in medicine, focus group members wrote in a Lancet article published March 29. Though the sector faces its share of hurdles, including physician shortages, sky-high burnout rates and growing life expectancies, it also has more access than ever to massive amounts of digital data.

“AI models that learn from these large datasets are in development and have the potential to assist with pattern recognition and classification problems in medicine—for example, early detection, diagnosis and medical decision-making,” Thomas Wiegand et al. wrote in the journal. “These advances promise to improve healthcare for patients and provide much-needed support for medical practitioners.”

FG-AI4H has held three workshops and meetings to date and is in the process of recruiting health and policy experts to form topic groups. Topic use cases will make up the focus group’s basic building blocks, each representing a major, relevant health problem that’s difficult or costly to solve. Once topics are reviewed and agreed upon, Wiegand and colleagues said, the topic groups will act as “a forum for open collaboration among stakeholders who agree on a pragmatic, best-practice approach for benchmarking each use case.”

The authors said topic outlines will ideally define each application scenario and the desired output of AI models in that use case, as well as identify adequate sources of training and testing data. It’s also important to make multisource, heterogeneous data accessible to scientists training AI models, since the optimal model would be generalizable across multiple populations.

“For many use cases it would, at least initially, be meaningful to compare model performance against human performance, or human performance with AI assistance in the same task, whereas for other tasks, comparative performance of algorithms would be more meaningful,” Wiegand and co-authors said. “Once these requirements are met, AI models can be submitted via an online platform to be evaluated with the test data.

“Established this way, the benchmarking process will not only provide a reliable, robust and independent evaluation system that can demonstrate the quality of AI models, but will also provide an independent test dataset for model validation consistent with best-practice recommendations for reporting multivariable prediction models in health.”

So far, the focus group has developed 11 topic groups in areas including cardiovascular disease risk prediction, ophthalmology and AI-based symptom checkers.

FG-AI4H is encouraging members of the academic, technology and regulatory communities to contribute to the process by sharing their ideas about topics, use cases, data and algorithms. The focus group is slated to meet again April 2 in Shanghai, with meetings in Geneva, Tanzania and India planned for later this year.