The Google company DeepMind achieved a spectacular success in this subdiscipline in 2015. (DeepMind is a startup founded in London in 2010, which Google purchased in January 2014.) The DeepMind researchers had created a neural network that independently learned to play 49 classic computer games from the 1980s that ran on an Atari 2600 console. To do this, the team used the so-called deep learning approach, which employs a neural network with a particularly large number of neurons and layers. As input, the network was given the color pixels of the respective video game and the displayed score. As output, the network produced joystick movements. The algorithm was programmed so that joystick movements that raised the score were rewarded. At the beginning, the network moved the joystick at random, and often wrongly. Little by little, however, it learned to optimize its movements so that the score increased. After many thousands of games the network was good enough to play as well as human players. It had taught itself the rules.
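To make this reward-driven learning loop concrete, here is a minimal, self-contained sketch in Python. It is not DeepMind's system: in place of a deep network reading Atari pixels it uses simple tabular Q-learning on an invented toy game, and all names (ToyGame, train, and the parameters) are illustrative assumptions. What it shares with the approach described above is the principle: actions are chosen largely at random at first, and those that raise the score are reinforced over many episodes.

```python
import random

class ToyGame:
    """Toy stand-in for a video game: the score rises only when the
    agent steers its position onto a hidden target (illustrative only)."""
    def __init__(self, size=10):
        self.size = size
        self.reset()

    def reset(self):
        self.position = 0
        self.target = self.size - 1
        return self.position

    def step(self, action):                     # action: 0 = left, 1 = right
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.size - 1, self.position + move))
        reward = 1.0 if self.position == self.target else 0.0
        return self.position, reward, self.position == self.target


def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    env = ToyGame()
    q = [[0.0, 0.0] for _ in range(env.size)]   # value of each action per state
    for _ in range(episodes):
        state = env.reset()
        for _ in range(200):                    # cap episode length
            # Explore at random early on (ties broken randomly), otherwise
            # pick the action with the higher learned value.
            if random.random() < epsilon or q[state][0] == q[state][1]:
                action = random.randint(0, 1)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Reinforce the chosen action in proportion to the reward it yields.
            q[state][action] += alpha * (
                reward + gamma * max(q[next_state]) - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q_table = train()
    policy = ["right" if right >= left else "left" for left, right in q_table]
    print("Learned policy per position:", policy)
```

DeepMind's contribution was to replace such a small value table with a deep neural network, so that the same idea scales to raw pixel input.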

At the invitation of the prestigious British science journal Nature, Bernhard Schölkopf wrote about this in an article entitled ›Learning to see and act‹. What especially impressed him was that the DeepMind researchers had the network play against itself after the initial learning phase. »They saved the experience the system had already gained and used this in a game against itself to train further. This is similar to procedures that take place in the brain's hippocampus during sleep.« The brain researchers May-Britt and Edvard Moser, who were awarded the Körber Prize in 2014 shortly before receiving the Nobel Prize, had indeed shown in experiments with rats that ran through a maze during the day that, during sleep at night, the rodents replayed the daytime runs in their hippocampus. Such dreamed recapitulations consolidate what has been learned in the synapses. In AI research, this process is designated ›reinforcement learning‹.
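The »saved experience« that Schölkopf highlights can be illustrated with a short sketch of a replay memory, known in the deep learning literature as an experience replay buffer. The class below is a hypothetical minimal version, not DeepMind's implementation: past transitions are stored and later sampled in random batches so the learner can train on them again, loosely analogous to the nightly hippocampal replay described above.

```python
import random
from collections import deque

class ReplayBuffer:
    """Store past experience and hand it back in random batches so the
    learner can train on it again (minimal illustrative sketch)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # old entries drop out automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Revisit a random batch of stored experiences instead of only the
        # most recent one, which stabilizes learning.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


if __name__ == "__main__":
    buffer = ReplayBuffer()
    # During play, every transition is recorded ...
    for step in range(100):
        buffer.add(state=step, action=step % 2, reward=1.0,
                   next_state=step + 1, done=False)
    # ... and between moves the learner trains on replayed batches.
    batch = buffer.sample()
    print(f"Replaying {len(batch)} stored experiences for a training step")
```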