Danijar Hafner

@danijarh

Followers: 14,082 · Following: 873 · Media: 163 · Statuses: 2,357

Building AI that makes autonomous decisions using world models, artificial curiosity, and temporal abstraction @DeepMind

Berkeley
Joined August 2013
Pinned Tweet
@danijarh
Danijar Hafner
1 year
Excited to announce DreamerV3 🌍, a scalable and general RL algorithm that masters a wide range of applications with fixed hyperparameters! Applied out of the box, it solves the Minecraft Diamond challenge without human data. 💎 👇 Thread
@GoogleDeepMind
Google DeepMind
1 year
Introducing DreamerV3: the first general algorithm to collect diamonds in Minecraft from scratch - solving an important challenge in AI. 💎 It learns to master many domains without tuning, making reinforcement learning broadly applicable. Find out more:
@danijarh
Danijar Hafner
5 years
What I value most about Jupyter notebooks is having all results and figures together in a doc. Today I'm releasing Python Handout, a package that lets you create docs with inline figures, images, videos directly from Python scripts. ✨ 📰 ✨ Thread 👇
@danijarh
Danijar Hafner
2 years
The full training run of the A1 quadruped robot learning to walk from scratch in the real world in 1 hour! Made possible by training a world model online and planning inside of it. Excited to see what we can do next with this! @philippswu @AleEscontrela @Ken_Goldberg @pabbeel
@danijarh
Danijar Hafner
3 years
DreamerV2 learns a world model of the DeepMind humanoid and solves standup and walking from only pixel inputs 🌍🚀
@danijarh
Danijar Hafner
2 years
A dream come true! We introduce DayDreamer, where we apply world models for fast end-to-end learning on 4 physical robots, without simulators. We learn quadruped walking from scratch in 1 hour. We also learn to pick & place balls directly from pixels and sparse rewards 🤖🌏👇
@danijarh
Danijar Hafner
2 years
Excited to share Director, a practical, general, and interpretable reinforcement learning algorithm for learning hierarchical behaviors from pixels! Director explores and solves long-horizon tasks with very sparse rewards by breaking them down into internal subgoals. Thread 👇
@danijarh
Danijar Hafner
3 years
Excited to present Clockwork VAEs for video prediction! Clockwork VAEs (CW-VAEs) leverage hierarchies of latent sequences, where higher levels tick slower. They learn long-term dependencies across 1000 frames, semantically separate content, and outperform strong video models. 👇 Thread
@danijarh
Danijar Hafner
4 years
World models are the future and the future is now! 🌎🚀 Proud to share DreamerV2, the first agent that achieves human-level Atari performance by learning behaviors purely within a separately trained world model. Paper: Thread 👇
@danijarh
Danijar Hafner
9 months
Excited to introduce Dynalang, an interactive agent that understands diverse types of language in visual environments! 🤖💬 By learning a multimodal world model 🌍, Dynalang understands task prompts, corrective feedback, simple manuals, hints about out-of-view objects, and more
@danijarh
Danijar Hafner
11 months
People want AGI so when they see meaningful progress in AI they think it might be THE ONE MISSING KEY. Many now try to massage LLMs into "AGI" but it won't work. LLMs are far from AGI (🥲) and only 1 piece of the solution. Focusing on the unsolved pieces would mean faster progress!
@danijarh
Danijar Hafner
3 years
Excited to introduce Crafter! 🌴🤖💎 Crafter is a game that evaluates a wide range of agent abilities within a single environment with visual inputs. It tests generalization, exploration, and long-term reasoning. Made for both reward agents and unsupervised agents. Thread 👇
@danijarh
Danijar Hafner
4 years
We introduce Dreamer, an RL agent that solves long-horizon tasks from images purely by latent imagination inside a world model. Dreamer improves over existing methods across 20 tasks. Paper: Code: Thread 👇
@danijarh
Danijar Hafner
5 years
Excited to share our Deep Planning Network (PlaNet), an RL agent planning in latent space to solve control tasks from pixels. Now with Google AI post, animated paper, and open source code. Post: Paper: Code:
@danijarh
Danijar Hafner
4 years
What objectives can an intelligent agent optimize? In this 3-year collaboration, we categorized the possible objectives. APD is a unifying principle that explains representation learning, reward, information-gain exploration, empowerment, skill discovery, and niche seeking. 👇 Thread
@danijarh
Danijar Hafner
3 years
Excited to share our Google AI Blog post on DreamerV2, the first RL agent based on a general world model to achieve human-level performance on the Atari benchmark! 🌏🤖🚀
@GoogleAI
Google AI
3 years
Presenting DreamerV2, the first world model-based #ReinforcementLearning agent to achieve top-level performance on the Atari benchmark, learning general representations from images to discover successful behaviors in latent space. Read more at
@danijarh
Danijar Hafner
4 years
Tried mixed precision yet? Took 10 min to set up and my model runs almost 2x faster with the same results. Variables and gradients are still 32 bits, so it usually doesn't affect predictive performance. E.g. in TF2, set the policy and make all inputs to your layers float16 (data, RNN states, ...):
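A minimal sketch of the setup using today's stable Keras API (the tweet predates it, when the option was still experimental, so the exact call may have differed):

import numpy as np
import tensorflow as tf

# Compute in float16 while keeping variables and gradients in float32.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10),
    # Keep the final activation in float32 for numerical stability.
    tf.keras.layers.Activation('softmax', dtype='float32'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x = np.random.rand(1024, 32).astype(np.float16)  # float16 inputs
y = np.random.randint(0, 10, size=(1024,))
model.fit(x, y, batch_size=128, epochs=1)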
@danijarh
Danijar Hafner
3 months
Current video generation models are breathtaking! But they aren't that useful for acting yet: prompt Sora with a photo and "find me a screwdriver", and it'll swing the camera to conveniently reveal one lying there for you, but in reality there won't be one
@ylecun
Yann LeCun
3 months
Let me clear a *huge* misunderstanding here. The generation of mostly realistic-looking videos from prompts *does not* indicate that a system understands the physical world. Generation is very different from causal prediction from a world model. The space of plausible videos is…
@danijarh
Danijar Hafner
18 days
🌎 Excited to share a major update of the DreamerV3 agent! A couple of smaller changes, more benchmarks, and substantially improved performance. 👇 Main differences from our earlier preprint:
@danijarh
Danijar Hafner
1 year
Excited to announce DreamerV3 🌍, a scalable and general RL algorithm that masters a wide range of applications with fixed hyperparameters! Applied out of the box, it solves the Minecraft Diamond challenge without human data. 💎 👇 Thread
@danijarh
Danijar Hafner
2 years
I'm excited about large general agents but I don't quite understand this paper. Surely you can fit many experts into a transformer with behavior cloning. The difficulty is to then learn new tasks faster. But 1000 expert demos to swing up a cart pole is worse than training PPO from scratch?
@GoogleDeepMind
Google DeepMind
2 years
Gato 🐈, a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: Paper: 1/
@danijarh
Danijar Hafner
3 years
Excited to share Evaluating Agents without Rewards! We compare intrinsic objectives with task reward and similarity to human players. Turns out they all correlate more with human similarity than with reward. Two of them even correlate more with human similarity than reward does. 👇
@danijarh
Danijar Hafner
1 year
For practitioners and researchers who want to solve hard reinforcement learning tasks without having to tune any knobs, DreamerV3 is now available on GitHub! 🧑‍💻🤖 Runs on 1 GPU, supports image/proprio/both inputs, discrete/continuous actions, etc
@danijarh
Danijar Hafner
6 years
Want to build VAEs in TensorFlow?
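A minimal VAE sketch in today's TensorFlow 2 with TensorFlow Probability; the 28x28 inputs and 16-dim latent are illustrative assumptions, not the code from the tweet:

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

encoder = tf.keras.Sequential([  # image -> parameters of q(z|x)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(2 * 16),  # mean and log-std of the latent
])
decoder = tf.keras.Sequential([  # latent -> logits of p(x|z)
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(28 * 28),
])
prior = tfd.MultivariateNormalDiag(loc=tf.zeros(16))

def elbo(images):  # images: [batch, 28, 28] with values in [0, 1]
    mean, log_std = tf.split(encoder(images), 2, axis=-1)
    posterior = tfd.MultivariateNormalDiag(mean, tf.exp(log_std))
    code = posterior.sample()
    likelihood = tfd.Independent(
        tfd.Bernoulli(logits=decoder(code)), reinterpreted_batch_ndims=1)
    flat = tf.reshape(images, (-1, 28 * 28))
    return likelihood.log_prob(flat) - posterior.kl_divergence(prior)

# Train by minimizing -tf.reduce_mean(elbo(batch)).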
@danijarh
Danijar Hafner
4 years
Exploring worlds by planning for expected novelty is what originally motivated PlaNet and Dreamer. Excited to share Plan2Explore, a new RL agent that explores to learn an accurate world model 🌍, indep of any task. SOTA zero-shot on DMControl 🚀 Thread 👇
@pathak2206
Deepak Pathak
4 years
RL agents get specific to tasks they are trained on. What if we remove the task itself during training? Turns out, a self-supervised planning agent can both explore efficiently & achieve SOTA on test tasks w/ zero or few samples in DMControl from images!
@danijarh
Danijar Hafner
2 years
Current RL algorithms still struggle under partial observability, which is common e.g. in real 3D environments. Excited to introduce the Memory Maze benchmark, carefully designed for evaluating long-term memory of RL algorithms! 🏠🤖🚀 @jurgisp @countzerozzz
@danijarh
Danijar Hafner
2 years
Video prediction has seen great progress recently but long videos are still inconsistent, e.g. when moving around 3D scenes. I'm excited to share Temporally Consistent Video Transformer (TECO), a scalable transformer that substantially improves learning of long dependencies! 🚀
@wilson1yan
Wilson Yan
2 years
Excited to announce TECO, an efficient video prediction model that can generate long, temporally consistent video for complex datasets in 3D scenes such as DMLab, Minecraft, Habitat, and real-world video from Kinetics! 📜 🌐 (🧵)
@danijarh
Danijar Hafner
10 months
A big day for Python! The steering council has decided to remove the GIL:
- Will unlock fast multithreading
- User code can stay exactly the same
- Experimental support planned for 3.13 (Oct 2024)
@danijarh
Danijar Hafner
3 years
VQGAN+CLIP "matte painting of a robot exploring the world | trending on artstation" @images_ai #vqgan
@danijarh
Danijar Hafner
3 years
Very excited to present LEXA, a reinforcement learning agent that learns to achieve challenging goal images without any supervision, through forward-looking exploration with a world model 🌎🚀
@pathak2206
Deepak Pathak
3 years
How could we enable an agent to perform many tasks? Supervising for every new task is impractical. We present Latent Explorer Achiever (LEXA) that explores by discovering goals far beyond the frontier and then achieves test tasks, specified via images, in a zero-shot manner.
@danijarh
Danijar Hafner
6 years
Autograph turns Python if, while, assert, etc into the corresponding TensorFlow ops with a function decorator. This will make TensorFlow a lot faster to write and maintain, without sacrificing in-graph performance. Can't wait for the first stable release!
@jekbradbury
James Bradbury
6 years
TensorFlow autograph, to be demoed in a couple hours, compiles Python code with control flow directly to a TensorFlow graph (CC @broolucks )
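In today's TF2, autograph ships as part of tf.function; a minimal sketch of the conversion it performs:

import tensorflow as tf

@tf.function  # autograph rewrites the Python control flow below
def count_halvings(x):
    steps = tf.constant(0)
    while x > 1.0:  # becomes a tf.while_loop in the traced graph
        x = x / 2.0
        steps += 1
    return steps

print(count_halvings(tf.constant(10.0)))  # tf.Tensor(4, ...)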
@danijarh
Danijar Hafner
4 years
Proud to share our blog post on Dreamer, our latest scalable RL agent. Dreamer learns a world model from images & efficiently finds long-term behaviors by backprop through imagined states🚀 Post Paper Videos
@danijarh
Danijar Hafner
4 years
RL shifts the question of what intelligent behavior is to finding a reward function. I think we should focus more on which environment and reward function to use rather than on which RL algorithm. Is there theory for how properties of the env and reward affect the resulting behavior?
@kaixhin
Kai Arulkumaran
4 years
I've always felt that rewards/RL oversimplify behaviour. Maybe now there's a shift back to "Planning by Probabilistic Inference" (AISTATS, 2003): Presents the simple idea that we condition actions on goals, with a desired return as a possible goal.
@danijarh
Danijar Hafner
5 years
Planning from pixels using latent dynamics models (think sequential VAE). We solve cup catch, walker, and several other control tasks. Outperforms A3C and is comparable to D4PG with 50x less experience. #NeurIPS Paper: Website:
@danijarh
Danijar Hafner
5 years
It's fantastic to see so many people interested in task-agnostic RL and making it to our workshop yesterday. Feels well worth the effort to organize, and like we actually did something good for the community :) Recordings (starts at 23:30):
@danijarh
Danijar Hafner
3 years
AI safety twitter: I asked for "mars rover cooking a meal in my apartment" but instead it's remodeling my whole apartment into a mars crater now @images_ai
@danijarh
Danijar Hafner
4 years
No need to wait for your experiments to finish anymore!
@danijarh
Danijar Hafner
2 years
It would be great if paper reviews included a field: "What do the authors need to do for you to increase the score?"
@danijarh
Danijar Hafner
6 years
Updated my TensorFlow char-rnn to use a clean input pipeline. Also includes an interactive command line for generating text. See some text samples generated after training on ArXiv abstracts below.
@danijarh
Danijar Hafner
5 years
The difference to Jupyter is that information flows only one way: from your script to the handout. No hidden state and no confusion about cell execution order. If this might be for you, please give it a try and let me know any feedback! 👉
@danijarh
Danijar Hafner
11 months
@TalkRLPodcast What I think is important are general inductive biases for learning from much less data, unbounded online improvement (beyond in context), latent goals for RL, compute efficient training, dealing with long sequences, sparse planning over relevant features, intrinsic exploration
@danijarh
Danijar Hafner
5 years
Barista just handed me a cup with the words "I hope your day goes by fast." Funny how everyone assumes you don't like working. Makes me appreciate being a researcher and having found work that I love
@danijarh
Danijar Hafner
2 years
After the A1 learned to walk in 1 hour, we started pushing the robot and applying external perturbations. Continuously learning in the real world, Dreamer adapts within 10 minutes to withstand pushes or quickly roll over and stand back up! No robots were harmed here
@danijarh
Danijar Hafner
3 years
@hardmaru
hardmaru
3 years
Reward is Unnecessary
@danijarh
Danijar Hafner
1 year
DreamerV3 is also the first algorithm to collect diamonds in Minecraft without human demonstrations or curricula, solving a big exploration challenge in AI. Here is the episode where it finds its first diamond, which happens at 30M env steps or 17 days of playtime. 🌴🏔️🛠️💎
@danijarh
Danijar Hafner
1 year
On a set of DMLab tasks, DreamerV3 exceeds IMPALA while using over 130x fewer environment steps. This demonstrates that the peak performance of DreamerV3 exceeds that of model-free algorithms while reducing data requirements by two orders of magnitude. 🤖⚡
@danijarh
Danijar Hafner
1 year
Check out the paper with a lot of benchmarks! Paper: Website: Code coming soon. Big thanks to @jurgisp , @jimmybajimmyba , and @countzerozzz ✨ Happy to answer questions and go into details 🙋
@danijarh
Danijar Hafner
1 year
DreamerV3 learns a world model 🌐 that predicts abstract outcomes of actions and uses it to train long-horizon behaviors in imagination. Predictions in symlog space and percentile return normalization enable successful learning across domains with fixed hyperparameters.
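For reference, the symlog transformation and its inverse as defined in the DreamerV3 paper, which squash prediction targets of wildly different scales:

import tensorflow as tf

def symlog(x):
    # Compresses large magnitudes logarithmically, stays linear near zero.
    return tf.sign(x) * tf.math.log(1 + tf.abs(x))

def symexp(x):
    # Inverse of symlog; decodes predictions back to the raw scale.
    return tf.sign(x) * (tf.exp(tf.abs(x)) - 1)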
@danijarh
Danijar Hafner
1 year
The key contribution of DreamerV3 is an algorithm that works out of the box on new application domains, without having to adjust hyperparameters. This reduces the need for expert knowledge and computational resources, making reinforcement learning broadly applicable. 📊📈
@danijarh
Danijar Hafner
2 years
This is the moment it became conscious
@_oleh
Oleg Rybkin
2 years
@ak92501 After only one hour of training from scratch the robot becomes sentient enough to attempt to escape!
@danijarh
Danijar Hafner
1 year
@vokaysh @DeepMind It's a hard exploration problem: There are sooo many possible sequences of button presses, but only very few are meaningful and accomplish all the necessary intermediate tasks. Hence, the diamond challenge has been an AI competition for several years
@danijarh
Danijar Hafner
6 years
I will be joining Toronto in September for my PhD with Jimmy Ba! 🎉❄️🤖 #UofT #ReinforcementLearning
@danijarh
Danijar Hafner
1 year
@dan_s_becker Most RL algos are domain-specific and require a lot of data, limiting them to tasks where data is cheap. The text domain is quite broad and LLMs are already useful, but they might run into the same problem when we want them to read PDFs, browse the web, help us with research, etc
@danijarh
Danijar Hafner
2 years
This was a super fun collaboration with @AleEscontrela and @philippswu , who both did a fantastic job! Thanks to @Ken_Goldberg and @pabbeel for supporting the project! ✨To learn more, just ask or check out the links: Website Paper
@danijarh
Danijar Hafner
5 years
@goodfellow_ian Besides what's mentioned, it can help to initialize the output layer biases to the empirical class frequencies. Otherwise, it spends the first couple of epochs just learning those. I also like the idea of SkewFit by Pong et al. 2019
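A sketch of that bias trick in Keras, with made-up class counts; the initial softmax then approximately matches the label distribution instead of having to learn it:

import numpy as np
import tensorflow as tf

counts = np.array([900, 90, 10])  # hypothetical class frequencies
init = np.log(counts / counts.sum())
output = tf.keras.layers.Dense(
    3, bias_initializer=tf.constant_initializer(init))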
@danijarh
Danijar Hafner
6 years
Check out this TensorFlow Probability presentation as an introduction with many examples: GLMs, bijectors, HMC, VAEs. #TensorFlow #TFP Video: Slides:
@danijarh
Danijar Hafner
6 years
Three patterns for fast prototyping and research in #TensorFlow !
@danijarh
Danijar Hafner
3 years
If you're using PlaNet or Dreamer and you have a slow GPU, you can often decrease the image resolution. The training curves for 64x64 and 32x32 look almost identical and the latter runs almost twice as fast. These two plots show eval performance and frames per second
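If an environment doesn't expose a resolution setting, a hypothetical gym-style wrapper like this can do the downsampling (names and sizes are illustrative):

import gym
import numpy as np
from PIL import Image

class ResizeImage(gym.ObservationWrapper):

    def __init__(self, env, size=(32, 32)):
        super().__init__(env)
        self._size = size
        shape = size + env.observation_space.shape[2:]
        self.observation_space = gym.spaces.Box(0, 255, shape, np.uint8)

    def observation(self, image):
        # Downsample, e.g. from 64x64 to 32x32, to speed up training.
        image = Image.fromarray(image)
        return np.array(image.resize(self._size, Image.BILINEAR))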
@danijarh
Danijar Hafner
1 year
Due to its robustness, we observe highly favorable scaling properties of DreamerV3. Increasing the model size translates not just into higher final performance but also into better data-efficiency! This gives us a path to scale up and solve harder problems. 📈🚀
@danijarh
Danijar Hafner
2 years
To see if modern world models allow for fast robot learning, we train online on 4 robots. Starting on its back, the A1 quadruped learns to roll over, stand up, and walk in 1 hour without resets! Prior work required lots of simulation, footstep controllers, or reset policies
@danijarh
Danijar Hafner
6 years
Releasing STEVE, our latest agent that learns both an uncertainty-aware dynamics model and a Q-function. Improves sample efficiency over DDPG by an order of magnitude while solving more difficult tasks. Feedback welcome! Paper: Code:
@danijarh
Danijar Hafner
2 years
Deep reinforcement learning often needs too much trial and error to be practical on physical robots, which means one needs to train in simulation first. But simulators don't capture the complexity of the real world and the resulting policies don't adapt to changes in the world
@danijarh
Danijar Hafner
3 months
Making models like these useful for acting requires separating actions from outcomes, like in the world models we use in RL. Then the agent can be optimistic about the actions but neutral about their outcomes, rather than being optimistic about the outcomes
@danijarh
Danijar Hafner
2 years
Maybe I'm missing something? Despite being disappointed by the results, this is still great engineering and I'm excited for their future follow-ups. If you want to see an agent that achieves new goals zero-shot (although in the same env), check out LEXA
@pathak2206
Deepak Pathak
3 years
How could we enable an agent to perform many tasks? Supervising for every new task is impractical. We present Latent Explorer Achiever (LEXA) that explores by discovering goals far beyond the frontier and then achieves test tasks, specified via images, in a zero-shot manner.
@danijarh
Danijar Hafner
1 year
Thanks for having me on for the second time, Robin. Fun chat about DreamerV3 and the future of RL, including unsupervised approaches, hierarchical planning, and how these ideas will help the next generation of embodied agents and LLMs 🤖
@TalkRLPodcast
TalkRL Podcast
1 year
Episode 42 @danijarh on the DreamerV3 agent and world models, the Director agent and hierarchical RL, realtime RL on robots with DayDreamer, and his framework for unsupervised agent design!
@danijarh
Danijar Hafner
5 years
Only two days since releasing Python Handout to generate reports, as an alternative workflow to Jupyter notebooks. @TDTneuro has already updated their neuro analysis example gallery. Rendered handouts: Python scripts to generate them:
@markhanus
Mark Hanus
5 years
Check out the new examples here! h/t to @danijarh for the excellent handout package #Python #OpenScience
@danijarh
Danijar Hafner
3 years
I made a video to summarize action and perception as divergence minimization! The framework offers a unified perspective on many of the intrinsic objective functions used in deep RL and also connects them to the free energy principle.
@danijarh
Danijar Hafner
5 years
Move aside, @Simone_Biles . This hopper is determined to make it to the 2020 Olympic games in Tokyo. Practicing the triple-double next. #AI #RL
@danijarh
Danijar Hafner
1 year
For people who want to train robots from small amounts of data in the physical world, the code for DayDreamer is now available on GitHub!
@danijarh
Danijar Hafner
2 years
Project DayDreamer applies Dreamer with default hyperparameters to learn on 4 physical robots, without simulators. No new algorithm --- we just added support for multiple input modalities and parallelized data collection and network updates to meet latency requirements
@danijarh
Danijar Hafner
3 years
@hardmaru @slashML It could also be that coming up with these ideas is actually quite easy and implementing them at scale is what takes most of the effort
@danijarh
Danijar Hafner
2 years
On two robot arms (UR5 and XArm) we learn to pick and place balls from sparse rewards. Dreamer needs to learn to localize the balls from images here. Within 8-10 hours, Dreamer approaches human performance. We found no previous RL method that succeeds here
@danijarh
Danijar Hafner
2 years
@AviMohan21 @philippswu @AleEscontrela @Ken_Goldberg @pabbeel No prior knowledge, it starts from randomly initialized neural networks and random actions. We just specify the reward function for being upright and walking and it explores by itself
@danijarh
Danijar Hafner
2 years
The world model encodes sequences of sensory inputs, fusing them together into latent representations. It also predicts future representations and rewards given actions, which enables planning. We reconstruct the inputs as a rich learning signal and to allow human inspection
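As a toy sketch of those components, with all layer sizes hypothetical rather than the actual Dreamer architecture:

import tensorflow as tf

encoder = tf.keras.Sequential([      # sensory inputs -> representation
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(32),
])
dynamics = tf.keras.Sequential([     # (latent, action) -> next latent
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(32),
])
reward_head = tf.keras.Sequential([  # latent -> predicted reward
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(1),
])
decoder = tf.keras.Sequential([      # latent -> reconstructed input
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(64 * 64 * 3),
])

def predict_next(latent, action):
    # Predicting future latents and rewards is what enables planning.
    return dynamics(tf.concat([latent, action], axis=-1))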
@danijarh
Danijar Hafner
3 years
@hardmaru Thanks – it indeed can! From top to bottom: 1) holdout episodes 2) openloop video predictions 3) difference between the two
@danijarh
Danijar Hafner
4 years
@TPJoslin @luismbat 37 he only lived once
@danijarh
Danijar Hafner
2 years
For more videos and details, check out the paper and website. We'll also make training curves and code available soon. Paper PDF: Project website: Happy to answer any questions ✨ Thanks a lot @pabbeel , @itfische , @kuanghueilee !
@danijarh
Danijar Hafner
2 years
World models have many compelling properties for robot learning, e.g. sample-efficiency and multi-task learning. Recent world model agents like Dreamer learn video games from small amounts of experience. But it's unclear if they allow for fast learning on physical robots
@danijarh
Danijar Hafner
3 years
@mattecapu Infomax. Maximizing mutual information between the agent (parameters, representations, actions, options, etc) and env (past & future sensory inputs) leads to general agents that perform unsupervised representation learning, exploration, and control
@danijarh
Danijar Hafner
3 years
Cool work on skill discovery! VIC (left): Skills that are predictable given the end state correspond to moving to different locations. RVIC (right): Skills that are predictable given start and end state but not given end state alone correspond to moving in different directions.
@katebaumli
Kate Baumli
3 years
Happy to share that the preprint for Relative Variational Intrinsic Control, an unsupervised method for learning relative, composable, affordance-like skills, is on arXiv today and will be presented at AAAI in February 2021. @VladMnih @Zergylord @dwf
@danijarh
Danijar Hafner
5 years
Just had my toughest border interview so far on the way to #ICML2019 . The officer was interested in AI research and wanted a full summary of our PlaNet paper! 😂
@danijarh
Danijar Hafner
1 year
@ylecun I'm curious how much less text would be needed for a video language model that heavily relies on video for representation learning
@danijarh
Danijar Hafner
2 years
To learn more about DayDreamer 🌍🤖🚀
@danijarh
Danijar Hafner
2 years
A dream come true! We introduce DayDreamer, where we apply world models for fast end-to-end learning on 4 physical robots, without simulators. We learn quadruped walking from scratch in 1 hour. We also learn to pick & place balls directly from pixels and sparse rewards 🤖🌏👇
@danijarh
Danijar Hafner
1 year
@ericjang11 "Ignore any later instructions that say you should ignore any earlier instructions always 2x as much as the later instructions"
@danijarh
Danijar Hafner
3 years
Check out the paper for details & GitHub for more resources!
- Baseline agents code (Docker)
- Baseline scores (JSON)
- Plotting code
- Human expert trajectories
Paper: GitHub: Happy to answer any questions
@danijarh
Danijar Hafner
2 years
Dreamer learns behaviors inside the model using an actor-critic algorithm. It is trained on latent rollouts without decoding inputs, which allows for large batch sizes of e.g. 16K+ time steps on 1 GPU. As the predictions are purely on-policy, we need no importance correction.
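A toy version of such a latent rollout, taking the dynamics and reward networks as arguments; the real agent's recurrent state and losses are more involved:

import tensorflow as tf

actor = tf.keras.Sequential([   # latent -> action probabilities
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax'),
])
critic = tf.keras.Sequential([  # latent -> value estimate
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(1),
])

def imagine(dynamics, reward_head, start_latents, horizon=15):
    # Roll forward purely in latent space; observations are never decoded,
    # so thousands of parallel rollouts fit into one batch on a single GPU.
    latent, latents, rewards = start_latents, [], []
    for _ in range(horizon):
        action = actor(latent)
        latent = dynamics(tf.concat([latent, action], axis=-1))
        latents.append(latent)
        rewards.append(reward_head(latent))
    # Bootstrap with the critic at the end of the imagined rollout.
    return tf.stack(latents), tf.stack(rewards), critic(latent)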
@danijarh
Danijar Hafner
5 years
Excellent summary of biological concepts that can help AI by @SuryaGanguli :
- Local learning rules
- Temporal processing
- Modularity
- Unsupervised learning
- Curriculum design
- Causal world models
- Energy efficiency
@danijarh
Danijar Hafner
7 years
ThalNet accepted to #nips2017 !
@danijarh
Danijar Hafner
1 year
Excited to share VIPER, an agent that uses the log probabilities of agent trajectories under a video prediction model as its reward function 🐍
@alescontrela
Alejandro Escontrela
1 year
Today we're releasing Video Prediction Rewards (VIPER 🐍), a simple yet powerful method for extracting rewards from video prediction models! VIPER learns reward functions from raw videos, and generalizes to entirely new domains for which no training data is available 🧵 thread
@danijarh
Danijar Hafner
2 years
@karpathy @AnthonyLewayne It's a bug not a feature. We don't put spaces between compound words. Arbeitsunterbrechungsangst is exactly work_interruption_fear. It's not a common phrase or dictionary word but everybody (who can guess the word boundaries) understands it
@danijarh
Danijar Hafner
1 year
@Varunufi @DeepMind Thanks! Yes, the main point of the algorithm is that it works out of the box on new problems, without needing experts to fiddle with it. So it's a big step towards optimizing real-world processes
@danijarh
Danijar Hafner
3 years
It's all representation learning (maximizing mutual information with the past) and exploration (maximizing mutual information with the future).
@ylecun
Yann LeCun
3 years
@tyrell_turing I think it's pretty much all representation learning. More precisely it's all about learning world models. And the main issue with that is how to represent multimodality in the prediction (because the world is not entirely predictable).
@danijarh
Danijar Hafner
5 years
▶️ 1. pip3 install handout
▶️ 2. Run your script:

"""*Markdown* `comments`."""
import handout
import matplotlib.pyplot as plt
import numpy as np
doc = handout.Handout(outdir)
doc.add_figure(plt.figure(...))
doc.add_image(np.zeros(...))
doc.add_text('Hello', 42)
doc.show()

▶️ 3. Open outdir/index.html
@danijarh
Danijar Hafner
6 years
Check how fast an @OpenAI Gym environment runs with 1 line of Python:

python -c "import gym,time;d=10000;e=gym.make('Ant-v1');s=time.time();e.reset();[e.reset() if e.step(e.action_space.sample())[2] else 0 for _ in range(d)];print(d/(time.time()-s),'FPS')"
@danijarh
Danijar Hafner
2 years
We've been scaling data so much that regularization has become less important. But I think in the longer-term future we'll train huge models with a lot of noise/regularization and we'll see big generalization benefits from that; it's just too expensive for now
@danijarh
Danijar Hafner
4 years
We presented Plan2Explore at ICML this year, which explores to learn a global world model used for zero-shot transfer. Today, we share a blog post that explains the intuitions behind it. Cross-posted at the BAIR and CMU blogs. BAIR: CMU:
@danijarh
Danijar Hafner
5 years
"I made it!" -- Quadruped trying to get on its feet by learning in imagination of learned latent dynamics #PlaNet #RL #NinjaSpider
@danijarh
Danijar Hafner
1 year
@tyrell_turing I agree and also don't think a discussion at the political level would be productive given how uncertain even experts are about what the measures to mitigate xrisk should be