Researchers have demonstrated two machine-learning techniques that, when combined to analyze social-media posts, can improve early detection of clinical depression by more than 10% over current state-of-the-art models.
The team members, all faculty at the University of A Coruña in Spain, published their work online June 14 in the Journal of Medical Internet Research.
Senior author Victor Carneiro, PhD, and colleagues identified 887 individuals whose language on a depression subreddit at Reddit.com suggested possible major depressive disorder.
Of these, 135 indicated they’d indeed been diagnosed with the condition. These self-reported diagnoses served as the definitive outcome against which the researchers compared the performance of their automated tools.
Characterizing the challenge of diagnosing depression via social media as a classification problem, the researchers developed two algorithms to comb through more than 500,000 posts and comments for textual, semantic and writing features.
They first tried a model based on a single binary classifier with two threshold functions, one positive and one negative. This approach yielded mediocre results and proved slow, because the classifier needed a large amount of evidence before it could confirm or rule out depression.
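To make that trade-off concrete, here is a minimal Python sketch of a single classifier with two decision thresholds. It assumes fixed cutoffs (0.75 and 0.25), a post-by-post reading order, and a hypothetical keyword-based `toy_score` standing in for a trained model. None of these details come from the study. The function returns a verdict only once the one score crosses a threshold, so a user whose posts stay ambiguous keeps the classifier waiting until their history runs out.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical scorer type: maps a user's post history to an estimated P(depression).
Scorer = Callable[[Sequence[str]], float]


def single_model_decision(
    posts: Sequence[str],
    score: Scorer,
    positive_threshold: float = 0.75,  # assumed cutoff for "depression"
    negative_threshold: float = 0.25,  # assumed cutoff for "no depression"
) -> Tuple[Optional[bool], int]:
    """Read posts in order and stop once the single score crosses either threshold.

    Returns (decision, posts_read), where decision is True (depression),
    False (no depression), or None if the evidence never became conclusive.
    """
    for n in range(1, len(posts) + 1):
        p = score(posts[:n])
        if p >= positive_threshold:
            return True, n
        if p <= negative_threshold:
            return False, n
    return None, len(posts)


def toy_score(history: Sequence[str]) -> float:
    """Hypothetical stand-in for a trained classifier: starts at 0.5, moves 0.1 per post."""
    hits = sum("hopeless" in post.lower() for post in history)
    misses = len(history) - hits
    return min(1.0, max(0.0, 0.5 + 0.1 * (hits - misses)))


print(single_model_decision(["I feel hopeless", "hopeless again", "still hopeless"], toy_score))
# → (True, 3)
print(single_model_decision(["hopeless", "went to lunch"], toy_score))
# → (None, 2)
```

In the second call, one suggestive post followed by one neutral post leaves the score at 0.5. That is between the two cutoffs, so no decision is reached.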
The better-performing model used a dual approach, running one algorithm to detect depression and another to rule it out.
“Interestingly, writing features become crucial for the positive model (in charge of detecting depression cases), along with semantic similarity and textual similarity, although limited to the post text field,” the authors commented. “On the contrary, the negative model (predicting nondepression cases) can follow a much simpler approach based on semantic or textual similarity.”
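A minimal Python sketch of such a dual arrangement follows, under several assumptions. Each model is a hypothetical scoring function over a user's post history, and each has its own cutoff (0.75 here). If both models fire on the same post, the positive model takes precedence, which is an illustrative choice rather than something the study specifies. The toy scorers use a single keyword cue purely for illustration. In the study, the positive model drew on writing, semantic and textual-similarity features, while the negative model relied on semantic or textual similarity.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical scorer type: maps a user's post history to a confidence in [0, 1].
Scorer = Callable[[Sequence[str]], float]


def dual_model_decision(
    posts: Sequence[str],
    positive_model: Scorer,
    negative_model: Scorer,
    positive_threshold: float = 0.75,  # assumed cutoff for the detector
    negative_threshold: float = 0.75,  # assumed cutoff for the rule-out model
) -> Tuple[Optional[bool], int]:
    """Run a detector and a rule-out model side by side on a growing post history.

    The positive model only ever answers "depression"; the negative model only
    ever answers "no depression". If both fire on the same post, the positive
    model wins here (an assumption, not taken from the study).
    """
    for n in range(1, len(posts) + 1):
        history = posts[:n]
        if positive_model(history) >= positive_threshold:
            return True, n
        if negative_model(history) >= negative_threshold:
            return False, n
    return None, len(posts)


def toy_positive(history: Sequence[str]) -> float:
    """Hypothetical detector: 0.25 confidence per post mentioning 'hopeless'."""
    return min(1.0, 0.25 * sum("hopeless" in p.lower() for p in history))


def toy_negative(history: Sequence[str]) -> float:
    """Hypothetical rule-out model: 0.25 confidence per post without that cue."""
    return min(1.0, 0.25 * sum("hopeless" not in p.lower() for p in history))


print(dual_model_decision(["hopeless", "hopeless", "still hopeless"], toy_positive, toy_negative))
# → (True, 3)
print(dual_model_decision(["lunch", "a movie", "a walk"], toy_positive, toy_negative))
# → (False, 3)
```

Splitting the decision this way lets each model be tuned and given features for only one kind of answer. This matches the authors' observation that the rule-out side can work with a simpler feature set than the detector.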
The study’s control group consisted of a large, randomly selected subset of “Redditors,” including some who self-reported no depression diagnosis yet were active on the depression subreddit.
“In comparison with [current] state-of-the-art detection models, our results showed how the dual model is able to improve performance up to more than 10%,” Carneiro et al. concluded. “We consider that these results can help in the development of new tools to identify at-risk individuals, enabling those people suffering from depression to be detected and receive treatment as soon as possible.”
The authors plan to study different model combinations for their dual approach “with an intense focus on new machine learning algorithms and feature sets.”