BioAutoMATED, an open-source, automated machine-learning platform, aims to help democratize artificial intelligence for research labs.
Copyright: news.mit.edu – “MIT Scientists Build a System That Can Generate AI Models for Biology Research”
Is it possible to build machine-learning models without machine-learning expertise?
Jim Collins, the Termeer Professor of Medical Engineering and Science in the Department of Biological Engineering at MIT and the life sciences faculty lead at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic), along with a number of colleagues, decided to tackle this problem when facing a similar conundrum. An open-access paper on their proposed solution, called BioAutoMATED, was published on June 21 in Cell Systems.
Recruiting machine-learning researchers can be a time-consuming and financially costly process for science and engineering labs. Even with a machine-learning expert on hand, selecting the appropriate model, formatting the dataset for it, and then fine-tuning it takes a lot of work, and each of these choices can dramatically change how the model performs.
“In your machine-learning project, how much time will you typically spend on data preparation and transformation?” asks a 2022 Google course on the Foundations of Machine Learning (ML). The two choices offered are either “Less than half the project time” or “More than half the project time.” If you guessed the latter, you would be correct; Google states that it takes over 80 percent of project time to format the data, and that’s not even taking into account the time needed to frame the problem in machine-learning terms.
“It would take many weeks of effort to figure out the appropriate model for our dataset, and this is a really prohibitive step for a lot of folks that want to use machine learning or biology,” says Jacqueline Valeri, a fifth-year PhD student in biological engineering in Collins’s lab who is co-first author of the paper.
BioAutoMATED is an automated machine-learning system that can select and build an appropriate model for a given dataset and even take care of the laborious task of data preprocessing, whittling down a months-long process to just a few hours. Automated machine-learning (AutoML) systems are still at a relatively nascent stage of development: current usage focuses primarily on image and text recognition, and the systems remain largely unused in subfields of biology, points out co-first author and Jameel Clinic postdoc Luis Soenksen PhD ’20.[…]
Read more: www.news.mit.edu