In this profile series, we interview AI innovators on the front-lines – those who have dedicated their life’s work to improving the human condition through technology advancements. This time, meet Damian Borth.
Damian Borth holds the chair for Artificial Intelligence & Machine Learning at the University of St. Gallen (HSG) in Switzerland and is a past director of the Deep Learning Competence Center at the German Research Center for Artificial Intelligence (DFKI). He is also a founding co-director of Sociovestix Labs, a social enterprise in the area of financial data science. Damian's research focuses on large-scale multimedia opinion mining, applying machine learning and in particular deep learning to mine insights (trends, sentiment) from online media streams.
Damian talks about how he came to believe in deep learning and shares why his work with deep learning can play an important part in responding to future natural disasters.
What has your journey been like in deep learning? How did you end up at DFKI?
I spent two years in Taiwan, went to the University of Kaiserslautern, Germany for my PhD with a stopover at Columbia University, and did my post-doctoral work at UC Berkeley and the International Computer Science Institute in Berkeley. In Berkeley, I spent my time on deep learning network architectures and got really into it. That was a really great time. After my stay in the US, I went back to DFKI to found the Deep Learning Competence Center. Now, I am helping the University of St. Gallen to establish a lab in Artificial Intelligence and Machine Learning and, hopefully soon, to build up a new computer science faculty.
What made you become a DL believer?
I was actually a “non-believer” in deep learning until I started my post-doc at UC Berkeley. It’s very hard to train a neural network efficiently without sufficient data, and at the time that I started my PhD, neural networks were not trusted as the go-to method. Instead, we looked at support vector machines for classification. But then AlexNet came along and showed that neural networks do, in fact, work consistently. Then people began to download the Caffe framework, use it, improve it, and outperform other architectures.
What did you do in Berkeley?
I continued the work we had started at Columbia on sentiment analysis for pictures. A classifier could recognize objects such as a dog or a cat; we attached adjectives to those nouns so the analysis could differentiate between a scary dog and a cute dog. The vocabulary was roughly 2,000 adjective-noun pairs (ANPs). By conditioning the noun with an adjective, we were able to move a very objective judgement to a subjective assessment, and from this mid-level representation we could derive a link to a higher level of sentiment representation. The positive sentiment of a cute dog or a laughing baby could flip to negative for a dark street or a bloody accident. This mid-level representation proved very successful beyond sentiment analysis and was also applied to aesthetics and emotion detection. It created a bridge between the objective world and the subjective world of visual content. In Berkeley I was also part of the team creating the YFCC100m dataset, the largest curated image dataset at that time. Having such a dataset, with 100 million Creative Commons images and videos from Flickr, helps if you want to train a very deep neural network architecture.
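The idea of going from ANP detections to image sentiment can be sketched in a few lines. This is a minimal illustration, not the actual system: the ANP names, their sentiment values, and the averaging rule are all hypothetical stand-ins for the real 2,000-ANP vocabulary and its learned models.

```python
# Hypothetical sketch: turning adjective-noun pair (ANP) detector outputs
# into an image-level sentiment score. ANP names and sentiment values are
# illustrative only, not the real ANP vocabulary.

ANP_SENTIMENT = {
    "cute dog": 0.9,
    "scary dog": -0.7,
    "laughing baby": 0.95,
    "dark street": -0.6,
    "bloody accident": -0.9,
}

def image_sentiment(anp_probs: dict) -> float:
    """Weight each detected ANP's sentiment by its predicted probability."""
    total = sum(anp_probs.values())
    if total == 0:
        return 0.0
    return sum(p * ANP_SENTIMENT.get(anp, 0.0)
               for anp, p in anp_probs.items()) / total

# A detector confident about "cute dog" yields a positive score,
# while one dominated by "dark street" flips the same pipeline negative.
positive = image_sentiment({"cute dog": 0.8, "scary dog": 0.1})
negative = image_sentiment({"dark street": 0.7, "cute dog": 0.1})
```

The point of the sketch is the bridge the interview describes: the ANP layer stays close to detectable visual content, while the final score lives in the subjective space of sentiment.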
Did you continue your sentiment analysis work with DFKI?
We call it Multimedia Opinion Mining (MOM), because we want it to consider different modalities such as video and audio. Currently we’re extending deep learning architectures towards multi-modal signal processing. The goal is to take different modalities as input and move them all into one architecture. If you have a self-driving car, you’re not only processing the visual signal of the camera, but also radar data, audio signals, and others in one network. Working with different architectures such as late fusion, intermediate fusion, and in some work early fusion has been demonstrated to improve system performance. Early fusion in particular has been used successfully in satellite image analysis for remote sensing, where a lot of the information is multi-modal. This is really a game changer for disaster recovery. Using this information, we can help with flood and wildfire disasters, where emergency response teams on the ground can get immediate information from satellites about where the fire is, what the flooding looks like, how many buildings may be affected, and whether an area is accessible by road or by boat.
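The difference between late and early fusion can be sketched with two toy modalities. This is a minimal sketch under stated assumptions: the "models" here are stand-in linear scorers with made-up weights, not the deep networks the interview describes; its only purpose is to show where in the pipeline the modalities are combined.

```python
# Minimal sketch of late vs. early fusion for two modalities,
# e.g. optical and radar channels of a satellite scene.
# The linear scorers and their weights are illustrative stand-ins.

def linear_score(features, weights):
    """A stand-in 'model': a plain weighted sum of the features."""
    return sum(f * w for f, w in zip(features, weights))

def late_fusion(optical, radar):
    """Each modality is scored by its own model; outputs are combined
    only at the very end (here, by averaging)."""
    s_opt = linear_score(optical, [0.5, 0.5])
    s_rad = linear_score(radar, [1.0, -0.2])
    return (s_opt + s_rad) / 2

def early_fusion(optical, radar):
    """Raw features are concatenated before a single joint model sees
    them, so the model can learn cross-modal interactions."""
    joint = list(optical) + list(radar)
    return linear_score(joint, [0.5, 0.5, 1.0, -0.2])
```

With purely linear scorers the two variants coincide up to a constant factor; the practical benefit of early fusion shows up once the joint model is nonlinear and can exploit interactions between modalities, which is what makes it attractive for multi-modal remote sensing.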
Can you elaborate on the disaster response case? How can your work help these first responders?
We were analyzing data collected from a wildfire at Fort McMurray. When we looked at the data, we initially saw that the area around the fire, in particular the vegetation and the already burned area, was a strong indicator of the direction of the fire spread. Once the wind changed, the fire changed its course as well, which caused more damage. This kind of analysis could have predicted that change in the fire’s development much earlier. Such information is very valuable to first responders and their work on the ground. Another case we’re currently working on is flooding. We started a benchmark challenge, the MediaEval Satellite Task, to foster collaboration and build up a community. In the first year, 16 teams from around the world participated. The teams submit their neural networks’ results and we compare the performance on the test data set to figure out which one provides the best predictions. This way we know very quickly which approaches work and which do not.