Stepping outside its usual field of materials science, an unlikely team at Berkeley Lab has created a text-mining tool that may help scientists of all disciplines tame and tap the blinding blizzard of COVID-19 papers.
The NLP-powered tool scans and searches these papers, quickly highlighting connections it might take a human many hours to uncover.
The team is calling the tool COVIDScholar.
In an item published April 28 by the news division of the lab—more formally the Lawrence Berkeley National Laboratory—the team says its innovation arose out of a call to action issued by the White House in March.
Specifically, the executive branch’s Office of Science and Technology Policy asked AI experts for help developing new text- and data-mining techniques that might answer pressing questions about COVID-19.
Just a few weeks later, here’s materials scientist and engineer Gerbrand Ceder, PhD:
“On Google and other search engines people search for what they think is relevant. Our objective is to do information extraction so that people can find nonobvious information and relationships. That’s the whole idea of machine learning and natural language processing that will be applied on these datasets.”
To this, project co-leader Kristin Persson, PhD, adds that every field of scientific inquiry produces mountains of scholarly material. However, the volume of scholarly writing accumulating in the COVID-19 storm is especially daunting.
“There’s no doubt we can’t keep up with the literature, as scientists,” Persson says. “We need help to find the relevant papers quickly and to build correlations between papers that may not, on the surface, look like they’re talking about the same thing.”
A third source quoted in Berkeley’s own coverage, graduate student John Dagdelen, says the accessibility of the millions of articles in the Google Scholar database is powerful in its own right.
“However, when you search for ‘spleen’ or ‘spleen damage’—and there’s research coming out now that the spleen may be attacked by the virus—you’ll get 100,000 papers on spleens, but they’re not really relevant to what you need for COVID-19. We have the largest single-topic literature collection on COVID-19.”