An unsupervised, automated machine learning algorithm successfully identified glioblastoma tumor cells and stratified survival outcomes.
“A goal of cancer research is to reveal cell subsets linked to continuous clinical outcomes to generate new therapeutic and biomarker hypotheses,” Rebecca Ihrie, PhD, and Jonathan Irish, PhD, associate professors in the department of cell and developmental biology at Vanderbilt University, and colleagues wrote. “We introduce a machine learning algorithm, Risk Assessment Population IDentification (RAPID), that is unsupervised and automated, identifies phenotypically distinct cell populations, and determines whether these populations stratify patient survival.”
Ihrie and Irish told Healio what prompted this research, the implications of the findings and what future research should entail.
Question: What prompted this research?
Ihrie: Cancers are now being studied using single-cell approaches, through which we can learn about the presence and abundance of different subsets of cells within a sample. This project aimed to identify tumor cell subsets that are associated with poor outcomes. Over the last 5 years, our team has built specific expertise for this project, including:
- cryopreserving cells from brain tumor resections;
- measuring phosphorylated signaling molecules in individual cells; and
- machine learning analysis of the associated data — in our case, approximately 40 readouts for each of more than 2 million cells.
We chose to study glioblastoma because of the importance of cell signaling to the disease and because there is a great need for new treatments. Despite many years of research, it has been extremely challenging to find biological features that are correlated with large differences in patient survival, or that help researchers identify new avenues for treatment. Ultimately, we aim to address these gaps and to develop precision medicine strategies for glioblastoma based on cell signaling biology. Our study differs from others in the field because we chose to measure features at the protein level, rather than at the DNA or RNA level. This meant we could identify cells based on features like post-translational modifications of these proteins, which are important to their function.
Q: What is unique about this algorithm?
Ihrie: Other algorithms usually do one of two things — identify subgroups of cells that are similar to known normal types or divide patients into “good” and “bad” outcomes and look for features that are more abundant in one group vs. the other. We designed RAPID to take the user all the way from the start of analysis (unprocessed single-cell data on cohorts of about 25 patients) to the finish (features that identify especially good or bad cells). RAPID is unique because it does not require prior knowledge about expected cell types or classification of patients in advance — instead, identification of biologically similar cell clusters across patients and testing of whether those clusters predict outcomes is done in an automated fashion. In other words, RAPID is fully unsupervised and uses statistical rules to reveal cells and determine their identity and significance. RAPID also creates human- and computer-readable descriptions of cell populations that can be used to design simpler tests, such as immunohistochemical stains, which are used more regularly in clinical practice and can be applied to large patient cohorts.
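The workflow Ihrie describes (group similar cells across patients without labels, then test whether each group's abundance tracks outcome) can be illustrated with a toy sketch. This is not the RAPID implementation: it assumes plain k-means as a stand-in for the clustering step and a simple rank correlation as a stand-in for survival analysis, and it ignores censoring. All function names, the synthetic two-marker data and the survival values are hypothetical.

```python
import numpy as np


def kmeans(cells, k, n_iter=50, seed=0):
    """Plain k-means; a stand-in for whatever clustering a real pipeline uses."""
    rng = np.random.default_rng(seed)
    centers = cells[rng.choice(len(cells), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every cell to every center.
        dists = ((cells[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = cells[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


def cluster_abundance(labels, patient_ids, k):
    """Fraction of each patient's cells that falls in each cluster."""
    patients = np.unique(patient_ids)
    frac = np.zeros((len(patients), k))
    for i, p in enumerate(patients):
        mask = patient_ids == p
        frac[i] = np.bincount(labels[mask], minlength=k) / mask.sum()
    return patients, frac


def rank_correlation(x, y):
    """Spearman-style correlation on ranks (ties are not averaged)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])


def run_demo():
    # Synthetic cohort: 6 patients, 200 cells each, 2 made-up markers.
    # Phenotype B is more abundant in patients with shorter survival.
    rng = np.random.default_rng(1)
    fractions = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
    survival_months = np.array([30.0, 24.0, 18.0, 14.0, 10.0, 6.0])
    cells_per_patient = 200
    cells, patient_ids = [], []
    for pid, frac in enumerate(fractions):
        n_b = int(round(frac * cells_per_patient))
        cells.append(rng.normal(0.0, 0.5, size=(cells_per_patient - n_b, 2)))
        cells.append(rng.normal(5.0, 0.5, size=(n_b, 2)))
        patient_ids.append(np.full(cells_per_patient, pid))
    cells = np.vstack(cells)
    patient_ids = np.concatenate(patient_ids)

    # No cell-type labels and no good/bad patient groups are supplied.
    labels = kmeans(cells, k=2)
    _, abundance = cluster_abundance(labels, patient_ids, k=2)
    return [rank_correlation(abundance[:, j], survival_months) for j in range(2)]


if __name__ == "__main__":
    for j, rho in enumerate(run_demo()):
        print(f"cluster {j}: rank correlation with survival = {rho:+.2f}")
```

Because the two synthetic phenotypes are well separated, one cluster's abundance should correlate negatively with survival and the other positively. Real data would need a proper survival model that handles censored follow-up, and a principled way to choose the number of clusters.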
Q: What did you find?
Irish: RAPID identified tumor cells whose abundance independently and continuously stratified patient survival in a pilot mass cytometry data set of 2 million cells from 28 glioblastomas. We confirmed the pilot findings using an orthogonal platform for biological validation (immunohistochemistry) and a larger cohort of 73 patients with glioblastoma. We also validated RAPID on published data from blood cancer, where it found known risk-stratifying cells and features.
Q: What are the clinical implications of your findings?
Irish: In glioblastoma, our findings suggest that patients whose tumors have a high fraction of the positive or negative phenotypes we identified may respond differently to investigational treatments. Patients whose tumors primarily have “positive-prognostic” cells also have a higher percentage of immune cells within their tumors, suggesting that they might benefit from immunotherapy more than patients with “negative-prognostic” tumors. […]