How MIT researchers used AI to better predict drug approvals

Machine learning is being rapidly adopted across healthcare to develop precision medicine. According to MIT researchers, it can also support the development of new drug treatments and devices by improving the randomized clinical trial process.

To better predict whether drugs are likely to be approved, the researchers used machine learning and statistical techniques to enhance data on clinical trial outcomes. They drew on the largest dataset to date, compiled from two proprietary pharmaceutical pipeline databases. The findings were published in the debut issue of the Harvard Data Science Review.

Limiting the risks of clinical trials can allow resources to be used more efficiently, with fewer failures, faster drug approval times, lower cost of capital and more funding for developing other new therapies.

“Everyone is affected by the risk of a drug failing in its clinical trial process,” lead study author Andrew Lo, director of the MIT Laboratory for Financial Engineering, said in a statement. “With more accurate measures of the risk of drug and device development, we hope to encourage greater investment at this unique inflection point in biomedicine.”

Beyond guiding investors, scientists, clinicians and biopharma professionals on a drug trial's likelihood of success, machine-learning predictions can also benefit policymakers.

“Policymakers and regulators would also benefit from machine-learning predictions, particularly for drug-indication pairs that are predicted to fail with high likelihood––these cases highlight the most difficult challenges in biomedicine and underscore the need for greater government and philanthropic support,” Lo et al. wrote.

Lo and his team used machine learning and statistical techniques to account for missing data, estimating missing values and other model parameters to produce more accurate forecasts. Datasets frequently have gaps because companies want to protect trade secrets and because investigators have no incentive to report additional data, the authors wrote. Although every historical drug development dataset is missing data, most studies do not report how much is missing.
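The practical difference between discarding incomplete records and estimating the missing values can be sketched in a few lines of Python. This is a minimal illustration, assuming simple per-column mean imputation on invented toy data. It is not the statistical imputation method Lo and his team used, and the record fields, values and function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy records for drug-indication pairs:
# (trial enrollment, trial duration in months). None marks an unreported value.
# All numbers are invented for illustration only.
records = [
    (120, 18.0),
    (None, 24.0),
    (80, None),
    (200, 30.0),
    (None, None),
]


def complete_case(rows):
    """Complete-case analysis: keep only rows with no missing values."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if row[j] is not None) for j in range(n_cols)
    ]
    return [
        tuple(col_means[j] if row[j] is None else row[j] for j in range(n_cols))
        for row in rows
    ]


if __name__ == "__main__":
    print(len(complete_case(records)))  # rows that survive when gaps are dropped
    print(len(mean_impute(records)))    # every row is kept after imputation
```

Complete-case analysis keeps only two of the five toy records. Imputation keeps all five, at the cost of relying on estimated values. More careful approaches, such as model-based or multiple imputation, also account for the uncertainty those estimates introduce.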

“It’s the difference between looking back at historical wins and losses to predict the outcome of a horse race versus handicapping the likely winner based on multiple factors like the horse’s pedigree, track record, temperament, the training regimen, the condition of the track, the jockey’s skill and so on,” Lo said in a statement.

For all six machine learning algorithms the researchers tested, the models that used imputation on a gold-standard dataset outperformed their complete-case analysis and imputation counterparts. They achieved an AUC (area under the receiver operating characteristic curve) of 0.78 for predicting transitions from phase 2 trials to approval and 0.81 for predicting transitions from phase 3 trials to approval.
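AUC is the probability that a model ranks a randomly chosen success above a randomly chosen failure. A score of 0.5 is no better than chance, and 1.0 is a perfect ranking. The minimal Python sketch below computes AUC from that pairwise definition. The labels, scores and the `auc` helper are hypothetical and do not come from the study's data or code.

```python
def auc(labels, scores):
    """Area under the ROC curve via its pairwise definition: the fraction of
    (positive, negative) pairs in which the positive example (e.g. an approved
    drug) receives the higher score. Ties count as half a win.
    O(P * N) comparisons, which is fine for small examples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical outcomes (1 = approved, 0 = failed) and predicted probabilities.
    labels = [1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
    print(auc(labels, scores))  # 8 of 9 pairs ranked correctly
```

In this toy example, one approved drug (score 0.6) is ranked below one failed drug (score 0.7). Eight of the nine pairs are ordered correctly, giving an AUC of about 0.89. An AUC of 0.81 likewise means the model orders roughly 81% of such success/failure pairs correctly.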

“These results are promising and raise the possibility of even more powerful drug development prediction models with access to better quality data,” Lo and colleagues concluded.