How DeepMind’s MuZero AI Learns as It Plays


Delving inside DeepMind’s MuZero algorithm: How MuZero achieves ‘superhuman performance’ in Go, Shogi, Chess and Atari games—without knowing the rules

Author: Abhinav Raj, a political correspondent for Immigration Advice Services.

The search for a model that can effectively explain its environment and use that information to plan its course of action has eluded researchers for years, but Google’s sister company DeepMind seems to have made a breakthrough.

London-based DeepMind Technologies has introduced MuZero, an algorithm which combines a tree-based search with a learned model, achieving ‘superhuman performance’ in visually complex environments such as Atari games, where most models, as the company claims, have ‘historically struggled’.

The developers of AlphaGo and AlphaZero (algorithms that mastered the game of Go, with AlphaGo defeating world champions) first presented MuZero in a 2019 preliminary paper, proposing a model that could plan winning strategies without being told the underlying dynamics, or rules, of the game.

(Image: DeepMind Technologies)

Rather than using the rules to enumerate every possible scenario, MuZero models only the aspects of the environment that matter for decision-making. DeepMind analogizes: “After all, knowing an umbrella will keep you dry is more useful to know than modelling the pattern of raindrops in the air.”

MuZero models three elements of the game environment that are critical to planning: the value, which evaluates how good the current position is; the policy, which determines the best course of action; and the reward, which assesses the last action taken. All three are learned by a deep neural network (DNN).
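
To make the three quantities concrete, here is a minimal Python sketch. It is not DeepMind’s implementation: the names Prediction and TinyModel are hypothetical, and one random linear layer per output stands in for the deep neural network, so the numbers mean nothing until such a model is trained.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    value: float         # how good is the current position?
    policy: List[float]  # probability assigned to each candidate action
    reward: float        # how good was the last action?


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(weights: List[float], state: List[float]) -> float:
    return sum(w * s for w, s in zip(weights, state))


class TinyModel:
    """Hypothetical stand-in for a DNN: one untrained linear layer per output."""

    def __init__(self, state_size: int, num_actions: int, seed: int = 0):
        rng = random.Random(seed)

        def weights() -> List[float]:
            return [rng.uniform(-1.0, 1.0) for _ in range(state_size)]

        self.value_w = weights()
        self.reward_w = weights()
        self.policy_w = [weights() for _ in range(num_actions)]

    def predict(self, state: List[float]) -> Prediction:
        return Prediction(
            value=math.tanh(_dot(self.value_w, state)),  # squashed into [-1, 1]
            policy=softmax([_dot(w, state) for w in self.policy_w]),
            reward=_dot(self.reward_w, state),
        )


model = TinyModel(state_size=4, num_actions=3)
prediction = model.predict([0.1, -0.2, 0.3, 0.4])
```

The point of the sketch is the shape of the output: a single number for the value, a probability for each possible action as the policy, and a single number for the reward.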

An illustration of the Monte Carlo tree search, a heuristic search algorithm that picks the most promising move out of a set of candidate moves. (GIF: DeepMind Technologies)

By using its learned action-selection policy, value function and reward to gauge which outcomes are best, the MuZero algorithm achieved ‘state-of-the-art’ performance across 57 different Atari games, whose rules and dynamics are complex and unknown to it.
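
The idea of planning without the rules can be sketched in a few lines of Python. The example below is an assumption-laden illustration: it uses a plain depth-limited lookahead, which is much simpler than the Monte Carlo tree search MuZero relies on, and the functions toy_dynamics and toy_value are hand-written placeholders for what would be learned neural networks. The planner itself never consults the game’s rules; it only asks the model what it imagines will happen.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
# Hypothetical learned-model interface (names are assumptions, not DeepMind's API):
#   dynamics(state, action) -> (imagined next state, predicted reward)
#   value(state)            -> predicted value of a position
Dynamics = Callable[[State, int], Tuple[State, float]]
Value = Callable[[State], float]


def lookahead(state: State, depth: int, actions: List[int],
              dynamics: Dynamics, value: Value, discount: float = 0.99) -> float:
    """Best discounted return the model imagines within `depth` steps."""
    if depth == 0:
        return value(state)  # stop imagining and trust the value estimate
    best = float("-inf")
    for action in actions:
        next_state, reward = dynamics(state, action)
        future = lookahead(next_state, depth - 1, actions, dynamics, value, discount)
        best = max(best, reward + discount * future)
    return best


def plan(state: State, depth: int, actions: List[int],
         dynamics: Dynamics, value: Value, discount: float = 0.99) -> int:
    """Choose the action whose imagined future looks best (depth must be >= 1)."""
    def score(action: int) -> float:
        next_state, reward = dynamics(state, action)
        future = lookahead(next_state, depth - 1, actions, dynamics, value, discount)
        return reward + discount * future
    return max(actions, key=score)


# Toy placeholders for the learned functions: a 1-D walk where position 3 pays off.
def toy_dynamics(state: State, action: int) -> Tuple[State, float]:
    position = state[0] + (1 if action == 1 else -1)  # action 1 = right, 0 = left
    return (position,), (1.0 if position == 3 else 0.0)


def toy_value(state: State) -> float:
    return -0.1 * abs(3 - state[0])  # closer to 3 is judged better


best_action = plan((0,), depth=3, actions=[0, 1],
                   dynamics=toy_dynamics, value=toy_value)
```

Here the planner chooses action 1 (move right), because the only imagined three-step path that collects the reward starts in that direction. A full tree search would additionally use the policy to decide which branches are worth exploring instead of trying all of them.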

An algorithm that can model its environment, extract the relevant information and use it to plan towards its goals pushes the frontiers of reinforcement learning.

MuZero is a testimony to the fact that AI is no longer playing dice; it’s assessing, deducing, calculating, and observing.

AI is getting better at things it does every day. Are we?

About the author:
Abhinav Raj is a writer and correspondent for a website dedicated to shedding light on immigration and asylum-related injustices.