How should we evaluate such algorithms? Since they teach the agent which task to do, we need an environment with many possible tasks. But Atari and MuJoCo don’t offer that. For example, in Pong and Breakout, you either hit the ball back or you die. There are no other options. (3/7)