UNIGE researchers have developed an approach that combines genomics and machine learning tools to explore the microbial biodiversity of ecosystems.
Microorganisms perform key functions in ecosystems, and their diversity reflects the health of their environment. However, they remain largely underused in current biomonitoring programs because they are difficult to identify. Researchers from the University of Geneva (UNIGE), Switzerland, have recently developed an approach combining two cutting-edge technologies to fill this gap. They use genomic tools to sequence the DNA of microorganisms in samples, and then exploit this considerable amount of data with artificial intelligence. The resulting predictive models can diagnose the health of ecosystems on a large scale and identify species that perform important functions. This new approach, published in the journal Trends in Microbiology, will significantly increase the capacity to observe large ecosystems and reduce analysis time, making routine biomonitoring programs far more efficient.
Monitoring the health of ecosystems is crucial in a context of sustainable development and increasing human pressure on the environment. Different species of microorganisms that are sensitive to changes in their surroundings are used as bioindicators for monitoring environmental quality. However, identifying them by their morphology requires a great deal of time and expertise. “A year ago, we were able to establish a water quality index based solely on the DNA sequences of unicellular algae present in the samples, without needing to visually identify each species”, explains Jan Pawlowski, Professor at the Department of Genetics and Evolution of the UNIGE Faculty of Science.
Using DNA sequences without having to identify them
Genomic tools make it possible to describe the biological communities inhabiting an environment quickly and very accurately. However, a large proportion of the data cannot be used for environmental health diagnoses because many DNA sequences are not referenced in existing databases. The species that carry these sequences, and their ecological roles, are therefore unknown. “In order to exploit all environmental genomics data, namely all the biodiversity of the samples, we used a machine learning algorithm”, notes Tristan Cordier, a member of the Geneva group and first author of the study.
The biologists sequenced the DNA of samples whose ecological quality status was already known, ranging from good to bad. Combining these two kinds of information allowed them to build a reference system from the data of each sample. “A predictive model was then developed with this algorithm, based on our training data. These include data from reference diagnoses and data from the sequencing of unknown species”, says Jan Pawlowski. The model is refined and validated over time by adding new reference samples to the existing training dataset.[…]
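The training step described above can be illustrated with a small sketch. The article does not name the machine learning algorithm the researchers used, so the code below substitutes a simple nearest-centroid classifier purely for illustration; it is not the authors' method. The sequence identifiers (`seq_A`, `seq_B`, `seq_C`), the read counts, and the function names are all hypothetical. The point it tries to show is that a model can learn from sequence abundances and reference diagnoses alone, without any sequence being matched to a named species.

```python
from collections import defaultdict
from math import sqrt


def relative_abundance(counts):
    """Convert raw read counts per sequence into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}


def train_centroids(samples):
    """Average the abundance profiles of all training samples sharing a status.

    `samples` is a list of (read_counts, status) pairs, where `status` is the
    reference diagnosis of the sample (hypothetical labels: "good", "bad").
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_per_status = defaultdict(int)
    for counts, status in samples:
        n_per_status[status] += 1
        for seq, proportion in relative_abundance(counts).items():
            sums[status][seq] += proportion
    return {
        status: {seq: s / n_per_status[status] for seq, s in profile.items()}
        for status, profile in sums.items()
    }


def distance(a, b):
    """Euclidean distance between two sparse abundance profiles."""
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def predict_status(centroids, counts):
    """Return the status whose average profile is closest to the new sample."""
    profile = relative_abundance(counts)
    return min(centroids, key=lambda status: distance(centroids[status], profile))


# Hypothetical training data: sequences are anonymous identifiers, so no
# taxonomic assignment is needed to use them as predictors.
training = [
    ({"seq_A": 80, "seq_B": 15, "seq_C": 5}, "good"),
    ({"seq_A": 70, "seq_B": 25, "seq_C": 5}, "good"),
    ({"seq_A": 10, "seq_B": 20, "seq_C": 70}, "bad"),
    ({"seq_A": 5, "seq_B": 15, "seq_C": 80}, "bad"),
]

centroids = train_centroids(training)
new_sample = {"seq_A": 60, "seq_B": 30, "seq_C": 10}
predicted = predict_status(centroids, new_sample)
print(predicted)
```

Adding new reference samples, as the article describes, would amount to appending them to `training` and calling `train_centroids` again. A real study would likely use a more expressive model and validate it on samples held out from training.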