
Joel Jang (@jang_yoel)
2K Followers · 2K Following · 50 Media · 369 Statuses
Senior Research Scientist @nvidiaai GEAR Lab, world modeling lead. On leave from PhD at @uwcse
Seattle, US · Joined March 2021
Introducing DreamGen! We got humanoid robots to perform totally new verbs in new environments through video world models. We believe video world models will solve the data problem in robotics. Bringing the paradigm of scaling human hours to GPU hours. Quick 🧵
The rise of humanoid platforms presents new opportunities and unique challenges. Join @yukez at #CoRL2025 as he shares the latest research on robot foundation models and presents new updates with the #NVIDIAIsaac GR00T platform. Learn more: https://t.co/LrzONs1Gzc
Full episode dropping soon! Geeking out with @jang_yoel on DreamGen - Unlocking Generalization in Robot Learning through Video World Models https://t.co/4GkmxHMqSW Co-hosted by @chris_j_paxton @micoolcho
World modeling for robotics is incredibly hard because (1) control of humanoid robots & 5-finger hands is wayyy harder than arrow-key control (⬆️⬇️⬅️➡️) in games (Genie 3); and (2) object interaction is much more diverse than FSD, which needs to *avoid* coming into contact. Our GR00T Dreams work was
What if robots could dream inside a video generative model? Introducing DreamGen, a new engine that scales up robot learning not with fleets of human operators, but with digital dreams in pixels. DreamGen produces massive volumes of neural trajectories - photorealistic robot
A humanoid robot policy trained solely on synthetic data generated by a world model. Research Scientist Joel Jang presents NVIDIA's DreamGen pipeline:
- Post-train the world model Cosmos-Predict2 with a small set of real teleoperation demos.
- Prompt the world model to
I've been a bit quiet on X recently. The past year has been a transformational experience. Grok-4 and Kimi K2 are awesome, but the world of robotics is a wondrous wild west. It feels like NLP in 2018 when GPT-1 was published, along with BERT and a thousand other flowers that
Check out Cosmos-Predict2, a new SOTA video world model trained specifically for Physical AI (powering GR00T Dreams & DreamGen)!
We build Cosmos-Predict2 as a world foundation model for Physical AI builders: fully open and adaptable. Post-train it for specialized tasks or different output types. Available in multiple sizes, resolutions, and frame rates. Watch the repo walkthrough
GR00T Dreams code is live! NVIDIA GEAR Lab's open-source solution for robotics data via video world models. Fine-tune on any robot, generate 'dreams', extract actions with IDM, and train visuomotor policies with LeRobot datasets (GR00T N1.5, SmolVLA). https://t.co/7Fndn7zDJB
github.com
Nvidia GEAR Lab's initiative to solve the robotics data problem using world models - NVIDIA/GR00T-Dreams
How do we improve VLA generalization? Last week we upgraded #NVIDIA GR00T N1.5 with minor VLM tweaks, FLARE, and richer data mixtures (DreamGen, etc.) ✨. N1.5 yields better language following: post-trained on the unseen Unitree G1 with 1K trajectories, it follows commands on
Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics, outperforming popular open-source video foundation models. It's openly
Assuming that we need ~2 trillion tokens to get to a robot GPT, how can we get there? I went through a few scenarios looking at how we can combine simulation data, human video data, and the size of existing robot fleets. Some assumptions:
- We probably need some real
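The scenario arithmetic in the tweet above can be sketched as a small back-of-envelope calculation. Every number below (the per-source token yields and the 10/30/60 data mixture) is a hypothetical placeholder for illustration, not a figure from the thread; only the ~2-trillion-token target comes from the tweet.

```python
# Back-of-envelope: hours of data needed per source to reach a token target.
# All rates and mixture shares are illustrative assumptions.

TARGET_TOKENS = 2e12  # ~2 trillion tokens for a "robot GPT" (from the tweet)

# Assumed token yield per hour of data, by source (hypothetical rates).
TOKENS_PER_HOUR = {
    "real_teleop": 1e6,   # real robot teleoperation, expensive to collect
    "human_video": 5e5,   # egocentric human video, cheaper but less direct
    "simulation": 2e6,    # simulation / world-model rollouts, cheap to scale
}

def hours_needed(source: str, share: float) -> float:
    """Hours of `source` data needed to cover `share` of the token target."""
    return TARGET_TOKENS * share / TOKENS_PER_HOUR[source]

# One hypothetical mixture: 10% real, 30% human video, 60% synthetic.
mix = {"real_teleop": 0.10, "human_video": 0.30, "simulation": 0.60}
for source, share in mix.items():
    print(f"{source}: {hours_needed(source, share):,.0f} hours")
```

Even under these made-up rates, the real-teleop slice alone works out to hundreds of thousands of operator hours, which is the gap the thread argues synthetic "GPU hours" can fill.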
🔥 ReAgent-V released! A unified video framework with reflection and reward-driven optimization. ✨ Real-time self-correction. ✨ Triple-view reflection. ✨ Auto-selects high-reward samples for training.
Giving a talk about GR00T N1, GR00T N1.5, and GR00T Dreams at NVIDIA GTC Paris, 06.11, 2PM - 2:45PM CEST. If you are at VivaTech in Paris, please stop by the "An Introduction to Humanoid Robotics" session!
Are you curious about #humanoidrobotics? Join our experts at #GTCParis for a deep dive into the #NVIDIAIsaac GR00T platform and its four pillars:
- Robot foundation models for cognition and control
- Simulation frameworks built on @nvidiaomniverse and #NVIDIACosmos
- Data
Representation also matters for VLA models! Introducing FLARE: Robot Learning with Implicit World Modeling. With a future latent alignment objective, FLARE significantly improves policy performance on multitask imitation learning & unlocks learning from egocentric human videos.
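A minimal sketch of what a future-latent-alignment-style objective could look like, assuming FLARE aligns latents predicted by the policy with embeddings of future observations. The cosine-based loss, the shapes, and the variable names here are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def cosine_alignment_loss(pred_latents, future_latents, eps=1e-8):
    """1 - mean cosine similarity between predicted and future latents.

    Zero when the policy's predicted latents point exactly at the
    future-observation embeddings; larger when they disagree.
    """
    pred = pred_latents / (np.linalg.norm(pred_latents, axis=-1, keepdims=True) + eps)
    target = future_latents / (np.linalg.norm(future_latents, axis=-1, keepdims=True) + eps)
    return 1.0 - float(np.mean(np.sum(pred * target, axis=-1)))

batch, dim = 4, 32
rng = np.random.default_rng(0)
z_pred = rng.normal(size=(batch, dim))    # latents predicted by the policy
z_future = rng.normal(size=(batch, dim))  # embeddings of future frames

print(cosine_alignment_loss(z_pred, z_future))  # roughly 1 for uncorrelated latents
print(cosine_alignment_loss(z_pred, z_pred))    # near 0 when perfectly aligned
```

The key design point the tweet hints at: because the target is a *latent* of a future frame rather than an action label, the same objective can be computed on action-free egocentric human video.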
NVIDIA also announced DreamGen, a new engine that scales robot learning with digital dreams. It produces large volumes of photorealistic robot videos (using video models) paired with motor action labels and unlocks generalization to new environments. https://t.co/rWTboFmM7z
NVIDIA has published a paper on DREAMGEN: a powerful 4-step pipeline for generating synthetic data for humanoids that enables task and environment generalization.
- Step 1: Fine-tune a video generation model using a small number of human teleoperation videos
- Step 2: Prompt
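Pulling together the pipeline described across these tweets (fine-tune a video world model on teleop demos, prompt it to dream new rollouts, label actions with an inverse dynamics model, then train a policy), the data flow can be sketched as below. Every class and function here is a stand-in to show the four stages, not the real implementation in the GR00T-Dreams repo.

```python
# Hypothetical sketch of the four-step DREAMGEN-style pipeline.
# All components are toy stubs that only demonstrate the data flow.

class WorldModel:
    def __init__(self, demos):
        self.demos = demos                    # Step 1: "fine-tuned" on demos

    def generate(self, prompt):
        return f"dream:{prompt}"              # Step 2: dreamed video rollout

def idm_actions(dream):
    # Step 3: an inverse dynamics model recovers pseudo-action labels
    # from the dreamed video (stubbed as placeholder strings).
    return [f"action<{dream}:{t}>" for t in range(3)]

def train_policy(trajectories):
    # Step 4: train a visuomotor policy on the synthetic trajectories.
    return {"policy_trained_on": len(trajectories)}

def dreamgen_pipeline(teleop_demos, task_prompts):
    world_model = WorldModel(teleop_demos)                    # Step 1
    dreams = [world_model.generate(p) for p in task_prompts]  # Step 2
    trajectories = [(d, idm_actions(d)) for d in dreams]      # Step 3
    return train_policy(trajectories)                         # Step 4

print(dreamgen_pipeline(["demo1"], ["pour water", "fold shirt"]))
# → {'policy_trained_on': 2}
```

The point of the structure: only Step 1 consumes scarce human teleoperation data; Steps 2-4 scale with compute, which is the "human hours to GPU hours" trade the thread describes.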
It's not a matter of if, it's a matter of when: video models and world models are going to be a central tool for building robot foundation models.