Machine learning sees diseases obscured by info overload

Harvard researchers have demonstrated a way to cut through tangles of irrelevant information in electronic health records (EHRs) while applying machine learning to spot patterns indicative of specific disease markers.

The team, led by Hossein Estiri, PhD, of the Mass General Laboratory of Computer Science, details the work in a study published in Patterns, an open-access journal from Cell Press.

In their project overview, the authors point out that billions of dollars have been spent trying to wring value out of EHRs. Yet the systems remain too complex to mine for modeling diseases and outcomes without human involvement.

The approach developed by Estiri and colleagues combines a sequential pattern-mining algorithm with a machine learning pipeline. The combination “can be rapidly deployed to develop computational models for identifying and validating novel disease markers and advancing medical knowledge discovery,” they write.

In press materials released by Mass General, the team offers an example: their system predicted heart failure in patients whose EHRs recorded coronary artery disease followed by chest pain. That ordered sequence proved a better predictor of heart failure than either condition on its own, or than the two conditions recorded in the reverse order.
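To make the idea of an ordered sequence as a predictor more concrete, here is a minimal Python sketch. It is not the authors' algorithm or code. It assumes each patient's record is a list of (date, diagnosis code) tuples, turns every pair of codes recorded in time order into a feature such as `CAD->CHEST_PAIN`, and builds a binary patient-by-pattern matrix. The function names and the code labels `CAD` and `CHEST_PAIN` are hypothetical.

```python
from datetime import date
from itertools import combinations


def sequence_features(events):
    """Return the set of ordered code pairs 'A->B' where A was recorded strictly before B.

    events: iterable of (record_date, code) tuples for one patient (hypothetical format).
    """
    ordered = sorted(events)  # chronological order; same-day ties are broken by code
    pairs = set()
    # combinations() preserves list order, so the first item never comes after the second
    for (d1, c1), (d2, c2) in combinations(ordered, 2):
        if d1 < d2 and c1 != c2:  # skip same-day events and repeats of one code
            pairs.add(f"{c1}->{c2}")
    return pairs


def build_matrix(patients):
    """patients: dict of patient_id -> list of (date, code).

    Returns (ids, vocab, rows): sorted patient ids, sorted pattern names,
    and one 0/1 row per patient marking which patterns appear in that record.
    """
    feats = {pid: sequence_features(ev) for pid, ev in patients.items()}
    vocab = sorted(set().union(*feats.values()))
    ids = sorted(feats)
    rows = [[1 if v in feats[pid] else 0 for v in vocab] for pid in ids]
    return ids, vocab, rows


# Hypothetical toy records: the same two conditions, recorded in opposite orders.
patients = {
    "p1": [(date(2019, 1, 5), "CAD"), (date(2019, 6, 2), "CHEST_PAIN")],
    "p2": [(date(2019, 3, 1), "CHEST_PAIN"), (date(2019, 9, 9), "CAD")],
}
ids, vocab, rows = build_matrix(patients)
print(vocab)  # ['CAD->CHEST_PAIN', 'CHEST_PAIN->CAD']
print(rows)   # [[1, 0], [0, 1]]
```

Because `A->B` and `B->A` become separate columns, the matrix keeps the order of events, which is what distinguishes coronary artery disease followed by chest pain from the reverse. A matrix like this could then be passed to any standard classifier; a real pipeline would also need to prune the very large number of candidate pairs.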

“The computer sorts through thousands of patients and can find sequences that a physician would likely never identify on their own as relevant but actually are associated with the disease,” Estiri explains.

Mass General adds that the system might help identify patients at risk of developing any number of other diseases and then recommend evaluation by the appropriate specialist.