On August 5th, an AI system called OpenAI Five is set to play against a team of five professional e-sports players in Dota 2, a game that requires fast-twitch reflexes, an encyclopedic knowledge of the game's strategies, and, most of all, teamwork.
In the video game, two teams of five players start at opposite ends of a square arena and fight their way past each other, using melee attacks and spells, to destroy the opponent's base. It's one of the most lucrative e-sports right now: this year's biggest tournament has a prize pool of more than $23 million. For the researchers' software to beat the pros would be like a robot learning to dunk on Michael Jordan.
Games are an easy way for those of us without PhDs to understand how far AI research has come: when put in complex situations, can an AI beat humans? We understand what it meant for IBM's Deep Blue to beat Garry Kasparov in chess, and for DeepMind's AlphaGo to beat Lee Sedol in Go: decades of human practice and skill were defeated by mechanical computation. Beyond those publicized matches, researchers have worked for decades to build agents that are superhuman at playing Atari games, checkers, and even Super Smash Bros.
Not all video-game research is useful outside the lab, but away from the competition, OpenAI is showing that its approach can be broadly applicable. One example: the same algorithm that is set to play Dota 2 tomorrow can also be taught to move a mechanical hand.
The technique OpenAI uses, and one of the most popular methods for teaching bots to play games, is called reinforcement learning. (OpenAI is the research lab founded chiefly by Elon Musk and Sam Altman.) You give a bot an objective, like collecting coins, and reward it when it completes that objective. At first, the bot's movements are completely random, until it stumbles on a way to complete the task. The moves it used to get there are weighted as better, so the bot is more likely to repeat them on its next attempt. After hundreds, thousands, or millions of attempts, strategies emerge.
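To make that trial-and-error loop concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not OpenAI's code: the five-square "coin" world, the reward of 1 for reaching the coin, and every parameter value are hypothetical, and the learning rule is tabular Q-learning, a simple textbook reinforcement learning algorithm rather than the far larger neural-network method behind OpenAI Five. The bot starts with no preferences, so its early moves are random; moves that lead toward the reward earn higher scores and get chosen more often.

```python
import random

# Hypothetical toy world: squares 0..4 on a line, with the coin on square 4.
N_POSITIONS = 5
COIN = N_POSITIONS - 1
ACTIONS = (-1, +1)  # step left, step right


def step(pos, action):
    """Move one square (walls at both ends); reward 1 only when the coin is reached."""
    new_pos = min(max(pos + action, 0), COIN)
    reward = 1.0 if new_pos == COIN else 0.0
    return new_pos, reward, new_pos == COIN


def train(episodes=500, max_steps=50, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    # q[pos][action] scores each move; all start at 0, so early choices are random.
    q = {pos: {a: 0.0 for a in ACTIONS} for pos in range(N_POSITIONS)}
    for _ in range(episodes):
        pos = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random move
            else:
                best = max(q[pos].values())
                # exploit: pick a best-scoring move, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[pos][a] == best])
            new_pos, reward, done = step(pos, action)
            # Nudge the score toward the reward plus the discounted value of where the move led.
            target = reward if done else reward + gamma * max(q[new_pos].values())
            q[pos][action] += alpha * (target - q[pos][action])
            pos = new_pos
            if done:
                break
    return q


def greedy_policy(q):
    """The move the trained bot now prefers on each square before the coin."""
    return {pos: max(ACTIONS, key=lambda a: q[pos][a]) for pos in range(COIN)}


if __name__ == "__main__":
    q = train()
    print(greedy_policy(q))
```

After training, the printed policy should step right (+1) from every square, since those moves reach the coin fastest. The occasional random move (`epsilon`) keeps the bot exploring instead of locking into its first lucky guess.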
OpenAI’s Dota 2-playing bot, for instance, plays millions of games against itself over the course of two weeks. Throughout each game, the bots’ reward is shifted from getting points for themselves to increasing the overall team’s score. The research team calls this “team spirit,” as Quartz previously reported.
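The article doesn't spell out how that reward shift works, so the Python sketch below is only an illustration built on assumptions: each bot's reward is a weighted blend of its own points and the team's average, controlled by a hypothetical `team_spirit` weight, and a hypothetical linear schedule moves that weight from 0 (purely selfish) to 1 (fully shared). The function names and point values are made up for the example.

```python
from statistics import mean


def blended_rewards(individual, team_spirit):
    """Blend each bot's own reward with the team's average reward.

    team_spirit = 0.0 -> each bot keeps only its own points (selfish);
    team_spirit = 1.0 -> every bot receives the team average (fully shared).
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be between 0 and 1")
    team_avg = mean(individual)
    return [(1 - team_spirit) * r + team_spirit * team_avg for r in individual]


def team_spirit_schedule(progress):
    """Hypothetical linear ramp: 0.0 at the start, 1.0 once progress reaches 1."""
    return min(max(progress, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical points earned by five bots in one stretch of play.
    rewards = [4.0, 0.0, 0.0, 1.0, 0.0]
    for progress in (0.0, 0.5, 1.0):
        spirit = team_spirit_schedule(progress)
        print(spirit, blended_rewards(rewards, spirit))
```

Halfway through the ramp, the bot that scored 4 points keeps 2.5 while teammates who scored nothing get 0.5 each. The team's total stays 5 at every setting, so in this sketch the blend changes who gets credit, not how much reward is handed out.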
Games are such a good place for AI to learn because they're an analogue of the real world, but with an objective, New York University professor Julian Togelius told Quartz. "The real world doesn't have interesting tasks," Togelius said with a laugh. "Games are perfect: they have rewards right there, whether you win or not, and what score you get." And games can be played an infinite number of times. They're just software, so thousands of bots can play at once, multiplying the speed at which they find a solution or strategy. […]