Many great ideas in artificial intelligence (AI) languish in textbooks for decades because we don’t have the computational power to apply them. That’s what happened with neural networks, a technique inspired by our brains’ wiring that has recently succeeded in translating languages and driving cars.
Now, another old idea—improving neural networks not through teaching, but through evolution—is revealing its potential. Five new papers from Uber, in San Francisco, demonstrate the power of so-called neuroevolution to play video games, solve mazes, and even make a simulated robot walk.
Neuroevolution, a process of mutating and selecting the best neural networks, has previously led to networks that can compose music, control robots, and play the video game Super Mario World. But these were mostly simple neural nets that performed relatively easy tasks or relied on programming tricks to simplify the problems they were trying to solve. “The new results show that—surprisingly—you may actually not need any tricks at all,” says Kenneth Stanley, a computer scientist at Uber and a co-author on all five studies. “That means that complex problems requiring a large network are now accessible to neuroevolution, vastly expanding its potential scope of application.”
Old solutions for new applications
At Uber, such applications might include driving autonomous cars, setting customer prices, or routing vehicles to passengers. But the team, part of a broad research effort, had no specific uses in mind when doing the work. In part, they merely wanted to challenge what Jeff Clune, another Uber co-author, calls “the modern darlings” of machine learning: algorithms that use something called “gradient descent,” a procedure that gradually improves a single solution by repeatedly adjusting it in the direction that reduces its error. Nearly all methods of training neural networks to perform tasks rely on gradient descent.
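To make the idea of gradient descent concrete, here is a minimal sketch in Python. It is not taken from the Uber papers: the one-variable error function, the starting point, the learning rate, and the step count are all assumptions chosen for illustration. Real neural networks apply the same rule to millions of weights at once.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly nudge a single solution in the direction that lowers its error."""
    x = start
    for _ in range(steps):
        # Step against the slope of the error; a steeper slope means a bigger step.
        x -= learning_rate * gradient(x)
    return x


# Toy problem (an assumption for illustration): error(x) = (x - 3)^2,
# whose slope is 2 * (x - 3) and whose lowest point is at x = 3.
def error(x):
    return (x - 3.0) ** 2


def error_slope(x):
    return 2.0 * (x - 3.0)


best_x = gradient_descent(error_slope, start=0.0)
print(f"solution: {best_x:.6f}, remaining error: {error(best_x):.2e}")
```

Note that the procedure refines one candidate from start to finish, which is the property the neuroevolution approach sets out to challenge.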
More than one approach at once
The most novel Uber paper takes a completely different approach from gradient descent, one that tries many solutions at once. A large collection of randomly initialized neural networks is tested (on, say, an Atari game), and the best are copied, with slight random mutations, to replace the previous generation. The new networks play the game, the best are copied and mutated, and so on for several generations. The advantage of this method over gradient descent is that it tries a variety of strategies instead of putting all its effort into perfecting a single solution. When compared with two of the most widely used methods for training neural networks, this exploratory approach outscored them on five of 13 Atari games. It also managed to teach a virtual humanoid robot to walk, developing a neural network a hundred times larger than any previously developed through neuroevolution to control a robot. […]
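The test, copy, and mutate loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Uber team's implementation. Each "network" here is just a short list of weights, and a toy scoring function stands in for an Atari game. The population size, number of parents, mutation strength, and the choice to carry the single best network forward unchanged are all hypothetical.

```python
import random


def evolve(fitness, n_weights, pop_size=50, n_parents=10,
           mutation_scale=0.1, generations=100, seed=0):
    """Evolve lists of weights by repeated selection and random mutation."""
    rng = random.Random(seed)
    # Start from a population of randomly initialized "networks".
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Test every network and keep the top scorers as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:n_parents]
        # Assumption: the best network survives unchanged, so the top
        # score can never drop from one generation to the next.
        next_generation = [list(parents[0])]
        # Fill the rest with slightly mutated copies of random parents.
        while len(next_generation) < pop_size:
            parent = rng.choice(parents)
            child = [w + rng.gauss(0.0, mutation_scale) for w in parent]
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)


# Toy stand-in for a game score (an assumption for illustration):
# the closer the weights are to a hidden target, the higher the score.
TARGET = [1.0, -2.0, 0.5]


def score(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best = evolve(score, n_weights=len(TARGET))
print("best weights:", [round(w, 2) for w in best])
```

No slope of the error is ever computed here. The loop only needs a score for each candidate, which is why the same recipe can be pointed at a game, a maze, or a simulated robot.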