Can social media’s ‘digital breadcrumbs’ help surveil COVID-19?

When Windy City residents send out tweets containing the term “food poisoning,” a machine learning algorithm offers the sender a form for sharing details with Chicago’s public-health officials. The system helps identify problem establishments that may need inspection.

Why not do the same—across not just social media but also blogs, chatrooms and local online news—with keywords like “cough,” “fever” and “trouble breathing” to help track COVID-19?

Knowable magazine asks and answers the question from various viewpoints in an article posted March 27.

En route to fleshing out the pros and cons, Pulitzer-winning reporter Katherine Ellison quotes Harvard’s John Brownstein, PhD, chief innovation officer at Boston Children’s.

“There’s incredible amounts of data,” Brownstein says, “that give us clues about disease outbreaks happening on a daily basis.”

Brownstein, who co-founded the influential and widely consulted HealthMap back in 2006, suggests these “digital breadcrumbs” of data could fuel the emerging field of digital epidemiology.

Ellison notes that social-media posts are only a small part of the data sources feeding HealthMap, which launched Covid Near You in late March. Still, Brownstein and other experts tell Knowable, the medium’s speed and volume hold the promise of helping health officials “spot outbreaks quickly and cheaply.”

But before the field blossoms, a number of hurdles will need to be cleared.

“It’s actually really hard to get useful prospective data from social media,” says Northeastern University computer scientist Clark Freifeld, PhD.

One of the tough challenges is that, after a story breaks, most posts are reactions to that news; the posts tend not to add anything new.

For other problems, AI may help build a bridge from the possible to the doable.

One such problem is red herrings: Google searches for “cholera,” for instance, spiked following Oprah Winfrey’s recommendation of the novel Love in the Time of Cholera. To guard against them, HealthMap uses machine learning to filter out repetition and irrelevancies.

“We have a database of millions of articles and pieces of content relating to disease outbreaks,” says Freifeld, also a co-founder of HealthMap. “We’ll hand-label, say, 100,000 examples of actual outbreaks and contrast them with things that aren’t related, like an ‘outbreak’ of home runs in the seventh inning. That’s how the system learns what’s useful and what’s not.”
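To make Freifeld's description concrete, here is a minimal sketch of learning from hand-labeled examples. It is not HealthMap's actual code: the article does not say which algorithm the system uses, so a simple naive Bayes text classifier stands in, and the six training snippets, the labels "outbreak" and "unrelated," and the names `NaiveBayes`, `LABELED` and `model` are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training examples
        self.vocab = set()           # every word seen during training

    def train(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the log prior, then add a log likelihood per word.
            score = math.log(n_docs / total_docs)
            for token in tokenize(text):
                score += math.log(
                    (counts[token] + 1) / (total_words + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up stand-ins for the hand-labeled examples Freifeld describes.
LABELED = [
    ("officials confirm measles outbreak at local school", "outbreak"),
    ("hospital reports cluster of flu cases and fever", "outbreak"),
    ("cholera cases rise after flooding health ministry says", "outbreak"),
    ("an outbreak of home runs in the seventh inning", "unrelated"),
    ("book club picks love in the time of cholera novel", "unrelated"),
    ("team catches fever pitch as fans cheer home win", "unrelated"),
]

model = NaiveBayes()
model.train(LABELED)

print(model.predict("health officials report fever cases at school"))
print(model.predict("home runs in the ninth inning"))
```

With only six examples this is a toy, but it shows the principle: words like “outbreak,” “fever” and “cholera” appear in both classes, so the classifier has to lean on the surrounding words to tell a disease report from a baseball recap. A production system would presumably need far more labeled data, which is why the 100,000 hand-labeled examples matter.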

Ellison also notes Kinsa, a San Francisco startup that collects real-time health data and has distributed more than 1 million internet-connected thermometers.

Collaborating on research with the company is Oregon State University scientist Benjamin Dalziel, PhD, who says the system can track the flu two weeks faster than the CDC and may do the same for COVID-19.

“This is the future, however grand that sounds,” Dalziel tells Knowable. “[W]hile I think there has been stunning work done to extract information from Twitter, a thermometer reading has clearly got an advantage over a tweet.”
