anandmaj
@Almondgodd
Followers
2K
Following
22K
Media
30
Statuses
256
path of childhood's end | gap @penn | prev ai @tesla_optimus @dynarobotics
sf
Joined February 2019
I spent the past month reimplementing DeepMind’s Genie 3 world model from scratch.

Ended up making TinyWorlds, a 3M parameter world model capable of generating playable game environments.

Demo below + everything I learned in thread (full repo at the end) 👇🏼
97
269
2K
When we prepped for swe jobs, we never leetcoded alone. Every session was a mock interview. That’s how we learned to think aloud and get offers. Grinding leetcode doesn’t teach you how to talk through code. That’s why we built Leo, your 24/7 mock coding interviewer that talks
3
2
5
what's a good nootropic for someone just getting into nootropics? (nicotine gum, racetams, L-theanine, something else worth trying?)
5
0
11
1/ The future of general-purpose robotics will be decided by one major question: which flavor of data scales reasoning? Every major lab represents a different bet. Over the past 3 months, @adam_patni, @vriishin, and I read the core research papers, spoke with staff at the major
61
192
769
Scaling Era finally came and it’s a work of art with so many gems
0
0
14
9/ Finally, here’s tinyworlds: https://t.co/Zdf1j6y2FR

It’s a minimal codebase to help people understand world modeling. Try it out yourself and make a PR; there are many easy + impactful additions to make (such as Mixture of Experts, Muon, and scaling). Thank you to @runpod_io
github.com
A minimal implementation of DeepMind's Genie world model - AlmondGod/tinyworlds
4
9
172
8/ training the world generator

Lastly, I trained the dynamics model that predicts the next frame.
> In training, it predicts masked tokens
> In inference, we add masked frames and it autoregressively decodes them.

When I first trained the dynamics model, loss plateaued early
1
0
51
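Below is a minimal sketch of the masked-token training step described in 8/: a random subset of a clip's discrete video tokens is replaced with a [MASK] id, and the model is trained with cross-entropy only on the hidden positions; at inference a fully masked next frame would be appended and filled in. `ToyDynamics`, the vocabulary size, and the 0.5 mask ratio are illustrative assumptions, not the actual TinyWorlds dynamics model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID, DIM = 1024, 1024, 256  # token vocabulary; index 1024 reserved for [MASK]

class ToyDynamics(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, DIM)  # +1 embedding slot for [MASK]
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        return self.head(self.body(self.embed(tokens)))

def masked_training_step(model, tokens, mask_ratio=0.5):
    """tokens: (B, L) discrete video tokens for a clip."""
    mask = torch.rand(tokens.shape, device=tokens.device) < mask_ratio
    inputs = tokens.masked_fill(mask, MASK_ID)
    logits = model(inputs)
    # loss is computed only on the positions we hid
    return F.cross_entropy(logits[mask], tokens[mask])

model = ToyDynamics()
loss = masked_training_step(model, torch.randint(0, VOCAB, (2, 64)))
loss.backward()
```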
7/ training the action tokenizer

The action tokenizer is the model that creates action labels and allows us to train on unlabeled video.
> From raw video, it predicts the action that happened between two frames.
> This lets the dynamics model learn to listen to actions without actually
2
0
44
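A minimal sketch of the latent-action idea from 7/: infer a small discrete "action" from a pair of consecutive frames, so unlabeled video gets action labels for free. The tiny linear encoder/decoder, the 8-entry codebook, and the straight-through nearest-neighbor lookup are illustrative assumptions, not TinyWorlds' exact action tokenizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentActionModel(nn.Module):
    def __init__(self, frame_dim=512, num_actions=8, dim=128):
        super().__init__()
        self.encoder = nn.Linear(2 * frame_dim, dim)          # sees frame t and frame t+1
        self.codebook = nn.Embedding(num_actions, dim)        # the discrete action vocabulary
        self.decoder = nn.Linear(frame_dim + dim, frame_dim)  # rebuilds frame t+1 from frame t + action

    def forward(self, frame_t, frame_t1):
        z = self.encoder(torch.cat([frame_t, frame_t1], dim=-1))
        # snap to the nearest codebook entry; straight-through so gradients reach the encoder
        idx = torch.cdist(z, self.codebook.weight).argmin(dim=-1)
        action = z + (self.codebook(idx) - z).detach()
        recon = self.decoder(torch.cat([frame_t, action], dim=-1))
        return F.mse_loss(recon, frame_t1), idx               # reconstruction loss + inferred action id

loss, action_id = LatentActionModel()(torch.randn(4, 512), torch.randn(4, 512))
```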
6/ training the video tokenizer

The first module, the video tokenizer, compresses videos into tokens using:
> Convolutions to transform images into vectors representing each section of the image
> ST transformer to let each vector share information
> FS Quantization to turn the
1
1
48
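A minimal sketch of the tokenizer pipeline from 6/: a convolution patchifies each frame into vectors, a transformer lets those vectors share information, and a small projection produces the channels the quantizer will snap to a grid (the quantizer itself is sketched after the next post). The patch size, dimensions, and plain `nn.TransformerEncoder` (standing in for a space-time transformer) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class VideoTokenizerEncoder(nn.Module):
    def __init__(self, dim=128, patch=8, code_channels=5):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # image -> one vector per patch
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)        # patches share information
        self.to_code = nn.Linear(dim, code_channels)                         # channels to be quantized

    def forward(self, frames):                                # frames: (B, T, 3, H, W)
        B, T, C, H, W = frames.shape
        x = self.patchify(frames.reshape(B * T, C, H, W))     # (B*T, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)                      # (B*T, patches, dim)
        x = self.transformer(x)
        return self.to_code(x)                                # continuous codes, quantized next

codes = VideoTokenizerEncoder()(torch.randn(1, 4, 3, 64, 64))   # (4, 64, 5)
```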
5/ quantizing video into tokens

For the video and action tokenizers, we need a quantization method to produce tokens. Tokenizers represent videos as compressed data (like zip files). They learn by finding a set of small building blocks that makes reconstructing the video easy.
1
0
54
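A minimal sketch of finite scalar quantization, one way to get those "small building blocks": squash each latent channel into a bounded range, round it to a few fixed levels, and pass gradients straight through the rounding. The level counts here are illustrative, not TinyWorlds' exact configuration.

```python
import torch

def fsq(z, levels=(8, 8, 8, 5, 5)):
    """z: (..., len(levels)) continuous latents -> latents snapped onto a fixed grid."""
    L = torch.tensor(levels, dtype=z.dtype, device=z.device)
    z = torch.tanh(z) * (L - 1) / 2          # bound each channel to [-(L-1)/2, (L-1)/2]
    z_q = torch.round(z)                     # snap to the nearest allowed level
    return z + (z_q - z).detach()            # straight-through estimator keeps gradients flowing

z = torch.randn(4, 16, 5, requires_grad=True)   # e.g. 16 spatial tokens with 5 channels each
tokens = fsq(z)
tokens.sum().backward()                          # rounding is non-differentiable, but gradients still reach z
```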
4/ designing the architecture

Next, I adapted Genie's high-level architecture to TinyWorlds. I considered using either:
> Diffusion: where we start with noise and slowly remove it until we have a completed video sequence.
> Autoregression: where we predict small chunks of video
1
0
57
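To make the two options in 4/ concrete, here are toy loss functions for each bet. The shapes, the simple noise-prediction MSE, and the next-token cross-entropy are illustrative assumptions, not either method's full training recipe (TinyWorlds' actual objective is the masked-token one described in 8/).

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, frames):
    """Diffusion bet: corrupt frames toward noise, train the model to predict that noise."""
    noise = torch.randn_like(frames)
    t = torch.rand(frames.shape[0], 1, 1, 1)            # random corruption level per sample
    noisy = (1 - t) * frames + t * noise
    return F.mse_loss(model(noisy, t), noise)

def autoregressive_loss(model, tokens):
    """Autoregression bet: predict each discrete video token from the tokens before it."""
    logits = model(tokens[:, :-1])                       # (B, L-1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.shape[-1]), tokens[:, 1:].reshape(-1))
```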
3/ building the space-time transformer

Normal transformers in LLMs understand language, which is 1D. TinyWorlds requires a model that understands video, which is 3D (height, width, time). This model also has to train quickly and learn using both actions and video. Space-time
3
0
79
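A minimal sketch of one factorized space-time attention block, assuming the usual recipe: attend over space within each frame, then causally over time at each spatial position, then an MLP. The dimensions, head count, and layer layout are generic assumptions, not TinyWorlds' exact block.

```python
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                                # x: (batch, time, space, dim)
        B, T, S, D = x.shape
        # spatial attention: patches within the same frame talk to each other
        xs = x.reshape(B * T, S, D)
        h = self.n1(xs)
        xs = xs + self.spatial_attn(h, h, h)[0]
        x = xs.reshape(B, T, S, D)
        # temporal attention: each patch position attends causally over earlier frames
        xt = x.permute(0, 2, 1, 3).reshape(B * S, T, D)
        h = self.n2(xt)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        xt = xt + self.temporal_attn(h, h, h, attn_mask=causal)[0]
        x = xt.reshape(B, S, T, D).permute(0, 2, 1, 3)
        return x + self.mlp(self.n3(x))

y = SpaceTimeBlock()(torch.randn(2, 8, 16, 256))   # 2 clips, 8 frames, 16 patches per frame
```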
2/ building the dataset of worlds

Before training TinyWorlds, I decided what video game worlds my model should generate by building the dataset. The set of worlds the model sees in training determines what worlds it generates. I created TinyWorlds' dataset by processing
1
0
65
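A minimal sketch of turning raw gameplay footage into fixed-length training clips with OpenCV. The input file name, 64x64 resolution, and 16-frame clip length are illustrative assumptions, not the exact TinyWorlds preprocessing.

```python
import cv2
import numpy as np

def video_to_clips(path, size=64, clip_len=16, stride=16):
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (size, size))            # downscale so a tiny model can learn it
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    frames = np.asarray(frames, dtype=np.uint8)            # (T, H, W, 3)
    # slice the long video into the fixed-length clips the model trains on
    return [frames[i:i + clip_len] for i in range(0, len(frames) - clip_len + 1, stride)]

clips = video_to_clips("gameplay.mp4")   # hypothetical input file
```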
1/ understanding world models

World models are neural networks that simulate physical worlds by generating videos. DeepMind’s Genie 3 proved that, just like LLMs, scaled-up world models exhibit emergent behavior:
> Controllability: Pressing the right arrow makes the camera pan
1
5
90
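A minimal sketch of what "playable" means mechanically: an action-conditioned loop where the model predicts the next frame from the history plus the pressed action. `WorldModel` and `predict_next_frame` are hypothetical placeholders, not the TinyWorlds API.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Stand-in that maps (frame history, action) -> next frame."""
    def __init__(self, frame_dim=64 * 64 * 3, num_actions=8):
        super().__init__()
        self.net = nn.Linear(frame_dim + num_actions, frame_dim)

    def predict_next_frame(self, frames, action_onehot):
        # frames: (B, T, D) history; this toy version only looks at the latest frame
        return self.net(torch.cat([frames[:, -1], action_onehot], dim=-1))

model = WorldModel()
frames = torch.zeros(1, 1, 64 * 64 * 3)            # start from a blank frame
for step in range(16):                             # "play" for 16 steps
    action = torch.zeros(1, 8)
    action[0, step % 8] = 1.0                      # e.g. one index could mean "right arrow"
    nxt = model.predict_next_frame(frames, action)
    frames = torch.cat([frames, nxt.unsqueeze(1)], dim=1)
```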
The word clanker and its consequences have been a disaster for the robot race
1
0
16
Thank you so much to my incredible mentors @julianibarz @ashishkr9311 @kamalgupta09 @Yi__Li and many more.
1
0
19
Over the past 3 months I’ve been interning at @tesla_optimus to build AGI for the real world. Robotics is the hardest frontier of AI, but it gives us a clear path to eliminating scarcity. After working on Optimus, I’ve never been more confident that universal abundance is within
32
8
368
I bet people in the future will do caveman to agi any% speed runs for fun
0
0
19