Artificial intelligence (AI) isn’t just great at applying slow-motion effects to videos and recommending products from pictures of home decor. It’s also capable of besting skilled human players at one of the world’s most popular online strategy games: Valve’s Dota 2.
In a blog post today, OpenAI, a non-profit, San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, Peter Thiel, and other tech luminaries, revealed that the latest version of its Dota 2-playing AI — dubbed OpenAI Five — managed to beat five teams of amateur players in June, including one made up of Valve employees.
The previous generation of OpenAI’s system was constrained to 1-vs.-1 matches, which are less complex.
“Dota is this really [complicated] task where you have to deal with these long time horizons in a very continuous state,” OpenAI cofounder and CTO Greg Brockman told VentureBeat in a phone interview.
“Rather than a couple hundred moves in a board game, you’re talking 80,000 individual frames. Whenever you take an action, many of the actions are sort of incremental. You have to somehow figure out how to plan over this long time horizon even though your controls are at a very low level.”
OpenAI’s machine learning algorithms went up against five groups: an OpenAI employee team, a team of audience members who watched the OpenAI employee match, a Valve employee team, an amateur team, and a semi-pro team.
It handily beat the first three teams in several rounds, and it won two of the first three games against the fourth and fifth squads.
Above: OpenAI Five’s view from the Dota 2 battlefield.
Admittedly, OpenAI Five had a leg up in a few areas. It could respond instantly to changes in each player’s health, position, and item inventory.
On average, its neural networks performed around 150-170 actions per minute (up to a theoretical maximum of 450) with a superhuman reaction time of 80 milliseconds. And it played with restrictions on a number of special abilities, items, and characters.
But none of those advantages helped it to accomplish its most impressive feat: developing strategies mirroring those of professional players.
In more than one instance during play, it sacrificed its “safe lane” — the path on the map of least resistance toward the enemy base — in favor of controlling the opposing team’s safe lane. And by aggressively attacking barricades and flanking heroes, it leveled up its own heroes and moved toward the enemy base faster than many of its human opponents.
OpenAI Five also learned new techniques in the course of play, like avoiding projectiles and giving heroes a generous amount of early experience points.
It even deployed techniques like “creep blocking,” in which a hero physically blocks the path of a hostile creep, a basic unit in the game, to slow their progress.
“Gaining long-term rewards such as strategic map control often requires sacrificing short-term rewards … since grouping up to attack … takes time,” the OpenAI team wrote.
“This observation reinforces our belief that the system is truly optimizing over a long horizon.”
On July 28, the OpenAI team plans to stream a match between OpenAI Five and a top Dota 2 team. In late August at The International, Valve’s annual esports tournament, it aims to beat a team of professional players.
Training OpenAI Five
OpenAI Five consists of five single-layer, 1,024-unit long short-term memory (LSTM) networks — a kind of recurrent neural network that can “remember” values over an arbitrary length of time — each assigned to a single hero.
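To make the LSTM idea concrete, here is a minimal sketch of a single LSTM step in NumPy — the kind of recurrent unit each hero's network is built from. This is an illustrative textbook LSTM with a tiny hidden size, not OpenAI's actual implementation (which uses 1,024 units per hero); all sizes and initializations here are invented.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (illustrative, not OpenAI's code).

    x: input vector; h_prev/c_prev: previous hidden and cell state;
    W: weights of shape (4*hidden, input + hidden); b: bias (4*hidden,).
    """
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, g, o = np.split(z, 4)
    i = 1 / (1 + np.exp(-i))   # input gate
    f = 1 / (1 + np.exp(-f))   # forget gate
    g = np.tanh(g)             # candidate cell state
    o = 1 / (1 + np.exp(-o))   # output gate
    c = f * c_prev + i * g     # cell state carries long-term "memory"
    h = o * np.tanh(c)         # hidden state / output
    return h, c

# Tiny example: hidden size 4 instead of OpenAI Five's 1,024 units.
rng = np.random.default_rng(0)
hidden, inp = 4, 3
W = rng.standard_normal((4 * hidden, inp + hidden)) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(5):  # unroll over a short sequence of game observations
    h, c = lstm_step(rng.standard_normal(inp), h, c, W, b)
```

The cell state `c` is what lets the network "remember" values over an arbitrary length of time, which is why LSTMs suit a game played out over tens of thousands of frames.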
The networks are trained using a deep reinforcement learning model that incentivizes their self-improvement with rewards. In OpenAI Five’s case, those rewards are kills, deaths, assists, last hits, net worth, and other stats that track progress in Dota 2.
Interestingly, the five LSTM networks don’t communicate with each other. Instead, a “team spirit” hyperparameter, ranging in value from 0 to 1, determines how much or how little each agent-controlled hero prioritizes individual rewards over the team’s reward.
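The team-spirit blend described above can be sketched in a few lines. This is a plausible reading of the mechanism, with invented names: `tau` is the hyperparameter, and each hero's effective reward interpolates between its own reward and the team average.

```python
def blended_rewards(r, tau):
    """Blend individual rewards with the team average (illustrative sketch).

    r: list of per-hero rewards; tau: "team spirit" in [0, 1].
    tau = 0 -> each hero optimizes only its own reward;
    tau = 1 -> every hero sees the same team-average reward.
    """
    team_mean = sum(r) / len(r)
    return [(1 - tau) * ri + tau * team_mean for ri in r]

# One hero gets a kill (reward 1.0); the rest get nothing.
assert blended_rewards([1.0, 0.0, 0.0, 0.0, 0.0], 0.0) == [1.0, 0.0, 0.0, 0.0, 0.0]
assert blended_rewards([1.0, 0.0, 0.0, 0.0, 0.0], 1.0) == [0.2, 0.2, 0.2, 0.2, 0.2]
```

With this scheme, coordination emerges from the shared reward signal rather than from any explicit messaging channel between the five networks.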
To prep for the matches, the system plays 180 years’ worth of games every day — 80 percent of its games against itself and 20 percent against its past selves — on a distributed system of 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores (compared to the old Dota bot’s 60,000 cores).
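That 80/20 opponent mix is easy to picture as a sampling rule. The sketch below is a hypothetical illustration of the idea — playing the current agent most of the time, with occasional games against archived past versions to keep strategies robust — not OpenAI's scheduler.

```python
import random

def pick_opponent(current, past_versions, rng):
    """Choose a training opponent: 80% the current agent (pure self-play),
    20% a randomly sampled past version. Illustrative sketch only."""
    if rng.random() < 0.8 or not past_versions:
        return current
    return rng.choice(past_versions)

rng = random.Random(0)
past = ["v1", "v2", "v3"]  # hypothetical archived checkpoints
picks = [pick_opponent("current", past, rng) for _ in range(10_000)]
share_self = picks.count("current") / len(picks)  # should sit near 0.8
```

Mixing in past selves guards against the agent over-fitting to its own latest strategy and then being exploited by styles it has "forgotten" how to beat.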
There’s a lot of data to crunch. During a match, each player character can choose among roughly 170,000 possible actions, and combined, all heroes on the board complete an average of 10,000 moves each frame. Altogether, OpenAI Five takes into account 20,000 numbers representing all the information human Dota players are allowed to access.
OpenAI’s training framework, Rapid, consists of two parts: a set of rollout workers that run a copy of Dota 2 and an LSTM network, and optimizer nodes that perform synchronous gradient descent — an essential step in machine learning — across a fleet of GPUs.
As the rollout workers gain experience, they inform the optimizer nodes, and another set of workers compare the trained LSTM networks, or agents, to reference agents.
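The key property of synchronous gradient descent — the optimizer step Rapid performs across its GPU fleet — is that averaging gradients computed by separate workers on separate shards of experience is equivalent to one gradient over all the experience. The sketch below demonstrates that equivalence on a toy linear model; the model, shapes, and worker count are invented for illustration.

```python
import numpy as np

def grad_mse(w, X, y):
    # Gradient of mean squared error for a linear model y ~ X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)

# Four "rollout workers" each hold an equal shard of the experience batch.
shards = np.split(np.arange(64), 4)
worker_grads = [grad_mse(w, X[s], y[s]) for s in shards]

# The "optimizer node" averages the per-worker gradients synchronously.
avg_grad = np.mean(worker_grads, axis=0)
full_grad = grad_mse(w, X, y)  # identical to a single full-batch gradient
```

Because the averaged update is mathematically the same as a full-batch step, adding workers scales up the effective batch size without changing what is being optimized.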
In the first few games, the AI-controlled heroes “walk aimlessly around the map,” the OpenAI team wrote. After a few hours, though, they master basics like lane defense and farming, and in days learn advanced strategies like rotating heroes around the map and stealing runes — special boosters that spawn on the game map — from opponents.
“People used to think that this kind of thing was impossible using today’s deep learning,” Brockman said.
“But it turns out that these networks [are] able to play at the professional level in terms of some of the strategies they discover … and really do some long-term planning,” he continued. “The shocking thing to me is that it’s using algorithms that are already here, that we already have, that people said were flawed in very specific ways.”
A ‘milestone’ for AI
OpenAI Five isn’t the first AI system to beat human opponents at complex games. AlphaZero, the deep neural network developed by Alphabet subsidiary DeepMind, achieved a superhuman level of play in chess, shogi, and Go.
Carnegie Mellon’s poker-playing Libratus AI came out thousands of fictional dollars ahead in a month-long series of games with professional card players. And a machine learning method developed by Maluuba, which Microsoft acquired in 2017, was used to create a system that achieved a top score of 999,990 in Ms. Pac-Man — higher than any human player has scored.
But for Brockman, OpenAI Five’s achievement is about more than Dota. It’s a significant step toward AI that can perform much more sophisticated tasks than today’s systems.
“Games have really been the benchmark [in AI research],” Brockman said. “These complex strategy games are the milestone that we … have all been working towards because they start to capture aspects of the real world.”
To that end, OpenAI has its fingers in a number of AI pies. Last year, it developed software that produces high-quality datasets for neural networks by randomizing the colors, lighting conditions, textures, and camera settings in simulated scenes.
(Researchers used it to teach a mechanized arm to remove a can of Spam from a table of groceries.) More recently, in February, it released Hindsight Experience Replay (HER), an open-source algorithm that effectively helps robots to learn from failure.
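The dataset-generation technique described above is generally known as domain randomization: sampling scene parameters so that a model trained in simulation sees wide visual variety. The sketch below is a hypothetical illustration of that idea — every parameter name and range here is invented, not taken from OpenAI's software.

```python
import random

def randomize_scene(rng):
    """Sample one simulated scene's parameters (all ranges are invented)."""
    return {
        "light_intensity": rng.uniform(0.2, 1.5),                 # lighting
        "texture_id": rng.randrange(1000),                        # surface texture
        "object_color": tuple(rng.uniform(0.0, 1.0) for _ in range(3)),  # RGB
        "camera_fov_deg": rng.uniform(40.0, 90.0),                # camera settings
    }

rng = random.Random(42)
scenes = [randomize_scene(rng) for _ in range(3)]  # one dict per training scene
```

The intuition is that a model trained on enough randomized simulations treats the real world as just one more variation, which is how a simulated policy can transfer to a physical arm.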
"You want to end up with systems that impact the real world that can help people, whether it’s elderly care robots or other things that are really going to benefit people,” Brockman said. "AI has the potential to be the most positive thing [humans] have ever created.”