Proteins are essential to the life of cells, carrying out complex tasks and catalyzing chemical reactions. Scientists and engineers have long sought to harness this power by designing artificial proteins that can perform new tasks, such as treating disease, capturing carbon, or harvesting energy. But many of the processes used to create such proteins are slow and complex, with a high failure rate, Phys reported.
In a breakthrough that could have implications across the healthcare, agriculture, and energy sectors, a team led by researchers in the Pritzker School of Molecular Engineering (PME) at the University of Chicago has developed an AI-led process that uses big data to design new proteins.
By developing machine-learning models that can review protein information culled from genome databases, the researchers found relatively simple design rules for building artificial proteins. When the team constructed these artificial proteins in the lab, they found that they performed chemistries so well that they rivaled those found in nature.
“We have all wondered how a simple process like evolution can lead to such a high-performance material as a protein,” said Rama Ranganathan, Joseph Regenstein Professor in the Department of Biochemistry and Molecular Biology, Pritzker Molecular Engineering, and the College. “We found that genome data contains enormous amounts of information about the basic rules of protein structure and function, and now we’ve been able to bottle nature’s rules to create proteins ourselves.”
Using AI to learn design rules
Proteins are made up of hundreds or thousands of amino acids, and these amino acid sequences specify the protein’s structure and function. But understanding just how to build these sequences to create novel proteins has been challenging. Past work has resulted in methods that can specify structure, but function has been more elusive.
What Ranganathan and his collaborators realized over the past 15 years is that genome databases—which are growing exponentially—contain enormous amounts of information about the basic rules of protein structure and function. His group developed mathematical models based on this data and then began using machine-learning methods to reveal new information about proteins’ basic design rules.
For this research, they studied the chorismate mutase family of metabolic enzymes, a type of protein that is important for life in many bacteria, fungi, and plants. Using machine-learning models, the researchers were able to reveal the simple design rules behind these proteins.
The model shows that just conservation at amino acid positions and correlations in the evolution of pairs of amino acids are sufficient to predict new artificial sequences that would have the properties of the protein family.
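To make those two ingredients concrete, here is a minimal Python sketch of how per-position conservation and pairwise correlation can be measured from a set of aligned sequences. It is an illustration only, not the researchers' model: the four-letter toy alignment, the function names, and the choice of Shannon entropy and a simple frequency covariance as the two measures are all assumptions made for this example.

```python
from collections import Counter
from math import log2

# Toy alignment: hypothetical data, not real chorismate mutase sequences.
ALIGNMENT = [
    "MKAL",
    "MKAL",
    "MRGL",
    "MRGL",
]


def site_frequencies(alignment, i):
    """Fraction of sequences carrying each amino acid at position i."""
    counts = Counter(seq[i] for seq in alignment)
    return {aa: c / len(alignment) for aa, c in counts.items()}


def site_entropy(alignment, i):
    """Shannon entropy in bits at position i; 0 means fully conserved."""
    return sum(-p * log2(p) for p in site_frequencies(alignment, i).values())


def pair_covariance(alignment, i, j, a, b):
    """Joint frequency of (a at i, b at j) minus the product of the
    single-site frequencies. Zero means the two positions look independent
    for this pair of amino acids; nonzero values indicate co-variation."""
    joint = sum(1 for seq in alignment if seq[i] == a and seq[j] == b)
    joint /= len(alignment)
    f_i = site_frequencies(alignment, i).get(a, 0.0)
    f_j = site_frequencies(alignment, j).get(b, 0.0)
    return joint - f_i * f_j


if __name__ == "__main__":
    for i in range(len(ALIGNMENT[0])):
        print(f"position {i}: entropy = {site_entropy(ALIGNMENT, i):.2f} bits")
    print("K at 1 with A at 2:", pair_covariance(ALIGNMENT, 1, 2, "K", "A"))
    print("K at 1 with G at 2:", pair_covariance(ALIGNMENT, 1, 2, "K", "G"))
```

In this toy alignment, positions 0 and 3 never change (fully conserved), while positions 1 and 2 always change together, which shows up as a positive covariance for the K/A pairing and a negative one for K/G. A generative model along the lines described would be fitted to statistics like these across thousands of natural sequences; how that fitting is done is not covered here.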