Neural networks have been taught to quickly read the surfaces of proteins, molecules critical to many biological processes. The advance is already being used to create defenses against the virus responsible for COVID-19.
copyright by www.quantamagazine.org
The computational biologist Bruno Correia used to have a rule in his lab: No machine learning allowed. He didn’t consider it real science. Now Correia has used it to detect potential interactions between proteins (the complex folded molecules responsible for many biological processes) 40,000 times faster than conventional methods. The journal Nature Methods featured his system on its cover in February 2020. Correia said of his early reluctance to embrace machine learning, “I was wrong, and I’m glad I was wrong.”
What changed his mind? Geometric deep learning: an emerging subfield of artificial intelligence that can learn patterns on curved surfaces.
Proteins interact by fitting their bumpy, irregular shapes together like three-dimensional puzzle pieces. Researchers have spent decades trying to figure out how they do so. The well-known protein folding problem, which has challenged scientists since the mid-20th century, attempts to understand protein interaction by decoding the link between a protein’s constituent amino acids and its final 3D shape. In 1999, IBM began developing its line of Blue Gene supercomputers to tackle the folding problem; 20 years later, DeepMind applied state-of-the-art deep learning algorithms to it.
Correia’s system, called MaSIF (short for molecular surface interaction fingerprinting), avoids the inherent complexity of a protein’s 3D shape by ignoring the molecules’ internal structure. Instead, the system scans the protein’s 2D surface for what the researchers call interaction fingerprints: features learned by a neural network that indicate that another protein could bind there. “The idea [is that when] any two molecules come together, what they’re essentially presenting to one another is that surface. So that’s all you need,” said Mohammed AlQuraishi, a protein researcher at Harvard Medical School who also uses deep learning. “It’s very, very innovative.”
MaSIF’s surface-focused framework for predicting protein interactions could help accelerate so-called de novo protein design, which tries to synthesize useful proteins from scratch rather than relying on the naturally occurring variety. But it could also be used for basic biology, said Michael Bronstein, a geometric deep learning expert at Imperial College London who helped develop the system. “How does cancer affect protein properties?” he said. “You can ask whether mutations as a result of cancer destroy something in the protein that makes them work in a different way, by not binding to what they are supposed to. [MaSIF] could answer fundamental questions.”
If you want to understand how deep learning can create protein fingerprints, Bronstein suggests looking at digital cameras from the early 2000s. Those models had face detection algorithms that did a relatively simple job. “You just need to detect that there is a face” — eyes, a nose, a mouth — “regardless of whether it has a long nose or a short nose, fat lips or thin lips,” he explained.
Modern cameras are more versatile. They can identify a particular person, allowing you to quickly search through your photo library to find all the photos they’re in. […]