Machine learning charts disease symptoms from patient-physician conversations

A machine learning model developed by scientists at Google successfully documented and charted disease symptoms from patient-physician conversations in early tests, but the tech still has a long way to go, according to research published March 25 in JAMA Internal Medicine.

“Automating clerical aspects of medical record-keeping through speech recognition during a patient’s visit could allow physicians to dedicate more time directly with patients,” first author Alvin Rajkomar, MD, a senior research scientist at Google and an assistant professor at the University of California, San Francisco, et al. wrote in the journal. “We considered the feasibility of using machine learning to automatically populate a review of systems (ROS) of all symptoms discussed in an encounter.”

Offering another opportunity to provide more hands-on care and focus on patients, the team's previously developed recurrent neural network is able to discern between relevant and irrelevant symptoms as they pertain to a patient's condition. The researchers randomly selected 2,547 medical encounter transcripts from a pool of 90,000 previously hand-transcribed encounters, using 2,091 to train the model and 456 to test it. The remaining transcripts in the pool were used for unsupervised training.

Human scribes labeled the more than 2,500 transcripts with 185 symptoms, assigning each symptom mention a relevance to the ROS as it related to a patient’s experience. Input to the machine learning model was a sliding window of five conversation turns, or snippets, and output was each symptom mentioned, its relevance to the patient, and whether the patient experienced that symptom.
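The sliding-window input described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the transcript turns, the window size of five, and the structured output fields (symptom label, ROS relevance, experienced or not) follow the paper's description, while everything else, including the example dialogue, is hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SymptomPrediction:
    """One model output per symptom mention, per the paper's description."""
    symptom: str          # one of the 185 symptom labels
    relevant_to_ros: bool # should it appear in the review of systems?
    experienced: bool     # did the patient actually experience it?

def sliding_windows(turns: List[str], size: int = 5) -> List[List[str]]:
    """Break a conversation into overlapping windows of consecutive turns."""
    return [turns[i:i + size] for i in range(len(turns) - size + 1)]

# Hypothetical patient-physician exchange (not from the study data)
turns = [
    "Doctor: What brings you in today?",
    "Patient: I've had a headache for three days.",
    "Doctor: Any nausea or vomiting?",
    "Patient: Some nausea, no vomiting.",
    "Doctor: Does anything make it better?",
    "Patient: It eases a little when I lie down.",
]

windows = sliding_windows(turns, size=5)
print(len(windows))  # 2 overlapping five-turn windows
```

Each window would be fed to the model, which emits a `SymptomPrediction` for every symptom it detects in that span of conversation.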

In the test set of 456 transcripts, the team reported 5,970 symptom mentions, 79.3 percent of which were relevant to the ROS and 74.2 percent of which were experienced by patients. Across the full test set, the model achieved a sensitivity of 67.7 percent and a positive predictive value of a predicted symptom of 80.6 percent.
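The two metrics above are standard and easy to compute from a confusion count. The sketch below shows how; the counts are illustrative values chosen to reproduce the reported percentages, not the paper's actual tallies.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true symptom mentions the model recovered (recall)."""
    return tp / (tp + fn)

def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of the model's predicted symptoms that were correct (precision)."""
    return tp / (tp + fp)

# Illustrative counts only: 677 true positives, 323 missed, 163 false alarms
tp, fn, fp = 677, 323, 163
print(round(sensitivity(tp, fn), 3))                # 0.677
print(round(positive_predictive_value(tp, fp), 3))  # 0.806
```

In plain terms: the model caught roughly two-thirds of the symptoms scribes found, and when it did flag a symptom, it was right about four times out of five.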

Broken down further, the sensitivity of the model was 67.8 percent for unclear symptoms, the authors said, and 92.2 percent for clearly mentioned symptoms. A symptom was considered “clearly mentioned” if two randomly selected scribes both independently assessed the likelihood of including any given symptom in the ROS as “extremely likely.”

“The model would accurately document—meaning correct identification of a symptom, correct classification of relevance to the note and assignment of experienced or not—in 87.9 percent of symptoms mentioned clearly and 60 percent in ones mentioned unclearly,” Rajkomar and colleagues wrote. “By going through the process of adapting such technology to a simple ROS autocharting task, we report a key challenge not previously considered: a substantial proportion of symptoms are mentioned vaguely, such that even human scribes do not agree how to document them.”

The authors said the model's strong performance on clearly mentioned symptoms is encouraging, but noted it is still far from perfect.

“Solving this problem will require precise, though not jargon-heavy, communication,” they wrote. “Further research will be needed to assist clinicians with more meaningful tasks such as documenting the history of present illness.”