Vaccines are among the most powerful weapons we have for preventing infectious disease. In the 1950s, hundreds of thousands of Americans were infected with measles every year. But by 2015, after decades of vaccination, a mere 191 cases were reported.
Unfortunately, most vaccines take years to develop, and in the midst of a pandemic, society can’t wait. One promising approach to accelerating this process is to use machine learning, a form of artificial intelligence, to guide vaccine design.
What does it mean to design a vaccine? Vaccines work by exposing you to parts of a pathogen so that your immune system will recognize it more easily in the future and mount a quicker, more robust response. The oldest vaccines were composed either of dead viruses, which are relatively safe but sometimes ineffective, or of live, weakened viruses, which pose greater safety risks. More recent vaccines tend to contain specific viral components (such as the surface protein used in hepatitis B vaccines) that are judged to be safe and effective. Future vaccines might even include specific viral protein fragments. However a vaccine is composed, the design goal is always the same: include viral components that are highly immunogenic, meaning they are visible to your immune system and elicit an immune response.
In recent years, researchers in immunology and machine learning have studied and modeled many of the properties that make viruses immunogenic. One key property is which parts of a virus can be targeted by antibodies, proteins produced by B cells that can block viral entry into cells and inhibit the spread of a virus throughout your body. Another is which viral protein fragments will be presented on a human cell’s surface, marking the cell as infected so that T cells can kill it. We and other researchers have trained models to predict the strength of these properties for any viral fragment. Using such models, we can better choose which parts of a virus are most likely to be immunogenic and should therefore be included in a vaccine.
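To make the selection step concrete, here is a minimal sketch, assuming a viral protein is scanned in overlapping fixed-length windows and each window is scored by some trained model. The trained model is replaced by `placeholder_score`, a hypothetical stand-in with no biological meaning; `sliding_fragments`, `rank_fragments`, the window length, and `top_k` are likewise illustrative choices, not part of any published tool.

```python
from typing import Callable, List, Tuple


def sliding_fragments(protein: str, length: int) -> List[Tuple[int, str]]:
    """Return (start index, fragment) for every overlapping window of `length` residues."""
    return [(i, protein[i:i + length]) for i in range(len(protein) - length + 1)]


def rank_fragments(
    protein: str,
    length: int,
    score: Callable[[str], float],
    top_k: int,
) -> List[Tuple[int, str, float]]:
    """Score every window and return the top_k highest-scoring ones (ties keep sequence order)."""
    scored = [(i, frag, score(frag)) for i, frag in sliding_fragments(protein, length)]
    # Python's sort is stable even with reverse=True, so equal scores stay in sequence order.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


def placeholder_score(fragment: str) -> float:
    """HYPOTHETICAL stand-in for a trained predictor (e.g. of presentation or antibody binding).

    It returns the fraction of residues drawn from an arbitrary set and has no biological meaning.
    """
    arbitrary = set("FWY")
    return sum(aa in arbitrary for aa in fragment) / len(fragment)


if __name__ == "__main__":
    toy_protein = "AAFWYAA"  # arbitrary toy string, not a real viral protein
    for start, frag, s in rank_fragments(toy_protein, length=3, score=placeholder_score, top_k=2):
        print(start, frag, round(s, 2))
```

Passing the scorer in as a function keeps the scanning and ranking logic independent of whichever model is plugged in; swapping `placeholder_score` for a real predictor would not change the rest of the code.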
Machine learning models learn to recognize patterns from large numbers of training examples, often in ways that humans would find very difficult to replicate. For example, immunologists have identified nearly one million protein fragments that are presented on a cell’s surface and visible to T cells. However, no human could look at SYGFQPTNGVGYQPY, a fragment from the novel coronavirus, and tell you whether it is presented as well. A model, on the other hand, can learn to answer this question from those million other examples, building an understanding of which patterns in the amino-acid letters lead to a high likelihood of presentation. Last year, in the journal Nature Biotechnology, we published a model dubbed MARIA that is trained to make these kinds of predictions. Many research labs have created similar models for other kinds of immune response, including antibody binding.[…]
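As a rough illustration of a model learning patterns among amino-acid letters from labeled examples, here is a minimal sketch of a far simpler model than MARIA: logistic regression over one-hot encoded peptides, trained with stochastic gradient descent in plain Python. It is not MARIA's architecture, and the data are synthetic: random 9-residue peptides labeled by an arbitrary, made-up rule (a "Y" at the second position) rather than real presentation measurements. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide: str) -> List[float]:
    """Flatten a fixed-length peptide into a position-by-amino-acid indicator vector."""
    vec = [0.0] * (len(peptide) * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_data(n: int, length: int = 9, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Random peptides labeled by a MADE-UP rule: 'presented' iff position 1 is 'Y'."""
    rng = random.Random(seed)
    others = AMINO_ACIDS.replace("Y", "")
    peptides, labels = [], []
    for i in range(n):
        residues = [rng.choice(AMINO_ACIDS) for _ in range(length)]
        residues[1] = "Y" if i % 2 == 0 else rng.choice(others)  # balance the two classes
        peptides.append("".join(residues))
        labels.append(1 if residues[1] == "Y" else 0)
    return peptides, labels


def train(peptides: List[str], labels: List[int], epochs: int = 20, lr: float = 0.5, seed: int = 0):
    """Fit logistic regression on one-hot peptides with plain stochastic gradient descent."""
    rng = random.Random(seed)
    data = [(one_hot(p), y) for p, y in zip(peptides, labels)]
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            prob = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = prob - y  # derivative of the log-loss with respect to the logit
            bias -= lr * err
            for j, xj in enumerate(x):
                if xj:
                    weights[j] -= lr * err * xj
    return weights, bias


def predict(weights: List[float], bias: float, peptide: str) -> float:
    """Predicted probability that the peptide follows the learned pattern."""
    x = one_hot(peptide)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


if __name__ == "__main__":
    train_peps, train_labels = make_synthetic_data(400, seed=1)
    weights, bias = train(train_peps, train_labels)
    test_peps, test_labels = make_synthetic_data(200, seed=2)
    correct = sum((predict(weights, bias, p) > 0.5) == bool(y) for p, y in zip(test_peps, test_labels))
    print(f"held-out accuracy: {correct / len(test_peps):.2f}")
```

The model is never told the rule; it has to discover from examples that the weight for "Y" at the second position matters. Real presentation predictors are trained on far more data and typically handle variable-length peptides and the specific HLA molecule doing the presenting, none of which this toy attempts.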