
Jim Fan
@DrJimFan
Followers
329K
Following
9K
Media
836
Statuses
4K
NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
Views my own. Contact →
Joined December 2012
I've been a bit quiet on X recently. The past year has been a transformational experience. Grok-4 and Kimi K2 are awesome, but the world of robotics is a wondrous wild west. It feels like NLP in 2018 when GPT-1 was published, along with BERT and a thousand other flowers that
182
328
4K
Go check out @yukez’s talk at CoRL! Project GR00T is cooking 🍳
The rise of humanoid platforms presents new opportunities and unique challenges. 🤖 Join @yukez at #CoRL2025 as he shares the latest research on robot foundation models and presents new updates with the #NVIDIAIsaac GR00T platform. Learn more 👉 https://t.co/LrzONs1Gzc
2
87
189
GTC is on again in DC! I will be hand-picking one golden ticket winner for a complimentary pass, special seating for Jensen's keynote, NVIDIA swag, and other perks! Reply with your coolest open-source project on GR00T N1/N1.5/Dream models!
5
7
63
There was something deeply satisfying about ImageNet. It had a well-curated training set. A clearly defined testing protocol. A competition that rallied the best researchers. And a leaderboard that spawned ResNets and ViTs, and ultimately changed the field for good. Then NLP
(1/N) How close are we to enabling robots to solve the long-horizon, complex tasks that matter in everyday life? 🚨 We are thrilled to invite you to join the 1st BEHAVIOR Challenge @NeurIPS 2025, submission deadline: 11/15. 🏆 Prizes: 🥇 $1,000 🥈 $500 🥉 $300
33
329
1K
Vibe Minecraft: a multi-player, self-consistent, real-time world model that allows building anything and conjuring any object. The function of tools and even the game mechanics themselves can be programmed by natural language, such as "chrono-pickaxe: revert any block to a previous
109
123
1K
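For what "programming a tool by natural language" might look like at the interface level, here is a toy sketch; the NLTool class, the apply_tool helper, and all behavior are hypothetical illustrations, not anything from an actual Vibe Minecraft system.

```python
# Hypothetical sketch: a tool is just data plus a natural-language behavior spec that a
# generative world model would condition on. All names here are made up for illustration.
from dataclasses import dataclass

@dataclass
class NLTool:
    name: str
    behavior_spec: str  # plain-English description the world model conditions on

chrono_pickaxe = NLTool(
    name="chrono-pickaxe",
    behavior_spec="revert any block it touches to a previous state in the world's history",
)

def apply_tool(world_state: dict, tool: NLTool, target_block: str) -> dict:
    # In a real system this prompt would condition a world model's next-frame generation;
    # here we only show the interface shape.
    prompt = f"Player uses {tool.name} on {target_block}: {tool.behavior_spec}"
    world_state.setdefault("events", []).append(prompt)
    return world_state

state = apply_tool({"tick": 0}, chrono_pickaxe, "stone block at (12, 64, -3)")
print(state["events"][0])
```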
Would love to see the FSD Scaling Law, as it’s the only physical data flywheel at planetary scale. What’s the “emergent ability threshold” for model/data size?
Tesla is training a new FSD model with ~10X params and a big improvement to video compression loss. Probably ready for public release end of next month if testing goes well.
21
74
536
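As a back-of-the-envelope illustration of what fitting an "FSD scaling law" could look like, here is a sketch that fits a saturating power law to made-up (model size, loss) points; the data and the functional form are assumptions for illustration, not Tesla numbers.

```python
# Illustrative only: fit a saturating power law loss(N) = a * N**(-b) + c to synthetic
# (model size, loss) points. Every number here is invented; no real FSD data is involved.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

params_b = np.array([0.1, 0.3, 1.0, 3.0, 10.0])   # hypothetical model sizes, in billions of params
loss = np.array([2.10, 1.78, 1.52, 1.33, 1.21])   # hypothetical eval losses

(a, b, c), _ = curve_fit(power_law, params_b, loss, p0=[1.0, 0.3, 1.0])
print(f"fit: loss ≈ {a:.2f} * N^(-{b:.2f}) + {c:.2f}")
# Extrapolate one order of magnitude up, the kind of ~10x jump mentioned in the quoted post.
print("predicted loss at 100B params:", round(power_law(100.0, a, b, c), 3))
```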
This may be a testament to the “Reasoning Core Hypothesis” - reasoning itself only needs a minimal level of linguistic competency, instead of giant knowledge bases in 100Bs of MoE parameters. It also plays well with Andrej’s LLM OS - a processor that’s as lightweight and fast as
🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507 — smarter, sharper, and 256K-ready! 🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following. 🔹 Thinking: Advanced reasoning in logic, math, science & code — built for
30
78
447
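For a sense of how lightweight such a "reasoning core" is to run, here is a minimal sketch using the standard transformers chat workflow; the repo id is assumed from the quoted announcement and the prompt is illustrative, not an official NVIDIA or Qwen snippet.

```python
# Minimal sketch: load the 4B instruct model and ask it one question.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B-Instruct-2507"  # assumed repo id taken from the quoted post
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "In two sentences, why might a small model suffice as a reasoning core?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```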
World modeling for robotics is incredibly hard because (1) control of humanoid robots & 5-finger hands is wayyy harder than ⬆️⬅️⬇️➡️ in games (Genie 3); and (2) object interaction is much more diverse than FSD, which needs to *avoid* coming into contact. Our GR00T Dreams work was
What if robots could dream inside a video generative model? Introducing DreamGen, a new engine that scales up robot learning not with fleets of human operators, but with digital dreams in pixels. DreamGen produces massive volumes of neural trajectories - photorealistic robot
36
171
1K
Evaluation is the hardest problem for physical AI systems: do you crash-test cars every time you debug a new FSD build? A traditional game engine (sim 1.0) is an alternative, but it's not possible to hard-code all edge cases. A neural net-based sim 2.0 is purely programmed by data,
@DrJimFan Tesla has had this for a few years. Used for creating unusual training examples (e.g. near head-on collisions), where even 8 million vehicles in the field need supplemental data, especially as our cars get safer and dangerous situations become very rare.
15
80
631
This is game engine 2.0. Some day, all the complexity of UE5 will be absorbed by a data-driven blob of attention weights. Those weights take as input game controller commands and directly animate a spacetime chunk of pixels. Agrim and I were close friends and coauthors back at
Introducing Genie 3, our state-of-the-art world model that generates interactive worlds from text, enabling real-time interaction at 24 fps with minutes-long consistency at 720p. 🧵👇
80
216
2K
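To make the "controller commands in, pixels out" data flow concrete, here is a toy sketch of an action-conditioned frame predictor; the architecture, sizes, and names are illustrative assumptions, not Genie 3 or any real world model.

```python
# Toy sketch of the "game engine 2.0" idea described above: a blob of weights that takes
# a controller command plus the current frame and emits the next chunk of pixels.
import torch
import torch.nn as nn

class ToyWorldModel(nn.Module):
    def __init__(self, num_actions=8, frame_size=64, hidden=256):
        super().__init__()
        self.frame_size = frame_size
        self.encode_frame = nn.Sequential(nn.Flatten(), nn.Linear(3 * frame_size * frame_size, hidden))
        self.encode_action = nn.Embedding(num_actions, hidden)
        self.decode_frame = nn.Linear(hidden, 3 * frame_size * frame_size)

    def forward(self, prev_frame, action):
        # Fuse the previous frame with the controller command, then predict the next frame.
        h = self.encode_frame(prev_frame) + self.encode_action(action)
        out = self.decode_frame(torch.relu(h))
        return out.view(-1, 3, self.frame_size, self.frame_size)

model = ToyWorldModel()
frame = torch.rand(1, 3, 64, 64)   # current pixels
action = torch.tensor([2])         # e.g. "move forward"
next_frame = model(frame, action)  # predicted next frame
print(next_frame.shape)            # torch.Size([1, 3, 64, 64])
```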
No em dash should be baked into pretraining, post-training, alignment, system prompt, and every nook and cranny in an LLM’s lifecycle. It needs to be hardwired into the kernel, identity, and very being of the model.
169
149
1K
Shengjia is one of the brightest, humblest, and most passionate scientists I know. We went through our PhDs together for 5 years, sitting across the hall in Stanford’s Gates building. Good old times. I didn’t expect this, but I’m not at all surprised either. Very bullish on MSL!
We're excited to have @shengjia_zhao at the helm as Chief Scientist of Meta Superintelligence Labs. Big things are coming! 🚀 See Mark's post: https://t.co/SL7h4sGfwx
32
223
2K
I'm observing a mini Moravec's paradox within robotics: gymnastics that are difficult for humans are much easier for robots than "unsexy" tasks like cooking, cleaning, and assembling. It leads to cognitive dissonance for people outside the field: "so, robots can parkour &
143
632
3K
My bar for AGI is far simpler: an AI cooking a nice dinner at anyone’s house for any cuisine. The Physical Turing Test is very likely harder than the Nobel Prize. Moravec’s paradox will continue to haunt us, looming larger and darker, for the decade to come.
137
229
2K
Attending CVPR in Nashville! Email or DM me. I’ll float around the venue like Brownian motion. Recruiting!
14
79
201
DreamGen was featured in Jensen's Computex Keynote as the new GR00T Dreams workflow: https://t.co/SyATAV1hGu
1
6
50
Also check out the deep-dive thread from @jang_yoel for more technical details! https://t.co/4VsMrVkmmR
Introducing DreamGen! We got humanoid robots to perform totally new *verbs* in new environments through video world models. We believe video world models will solve the data problem in robotics, shifting the paradigm from scaling human hours to GPU hours. Quick 🧵
3
9
56
Paper: DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories. https://t.co/IbaFHRZbF0 Project website with lots of videos: https://t.co/tByNeNMzzn This is a large team effort led by NVIDIA GEAR Lab and involves researchers from 8 other institutes!
arxiv.org
We introduce DreamGen, a simple yet highly effective 4-stage pipeline for training robot policies that generalize across behaviors and environments through neural trajectories - synthetic robot...
1
9
65
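A hedged outline of how a 4-stage neural-trajectories pipeline could be organized in code, paraphrasing the abstract and thread above; the stage names and internals are placeholders, not the actual DreamGen implementation.

```python
# Sketch only: the four stages are paraphrased from the abstract/thread above
# ("video world model" -> "dreams in pixels" -> "neural trajectories" -> policy training).
# Function names and bodies are placeholders, not the DreamGen codebase.

def stage1_finetune_world_model(base_video_model, teleop_clips):
    """Adapt a video generative model to the target robot embodiment."""
    return f"{base_video_model} finetuned on {len(teleop_clips)} clips"

def stage2_generate_dreams(world_model, prompts):
    """Roll out photorealistic videos ('dreams') of new tasks and environments."""
    return [f"{world_model} dream for: {p}" for p in prompts]

def stage3_label_neural_trajectories(dreams):
    """Turn each dreamed video into a trainable trajectory.
    (How the action labels are obtained is an assumption here; see the paper.)"""
    return [{"video": d, "actions": "pseudo-labeled"} for d in dreams]

def stage4_train_policy(trajectories):
    """Train a robot policy on the synthetic trajectories: GPU hours, not human hours."""
    return f"policy trained on {len(trajectories)} neural trajectories"

dreams = stage2_generate_dreams(
    stage1_finetune_world_model("video-world-model", ["clip_0", "clip_1"]),
    prompts=["pour water into the mug", "fold the towel in a new kitchen"],
)
print(stage4_train_policy(stage3_label_neural_trajectories(dreams)))
```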
What if robots could dream inside a video generative model? Introducing DreamGen, a new engine that scales up robot learning not with fleets of human operators, but with digital dreams in pixels. DreamGen produces massive volumes of neural trajectories - photorealistic robot
58
142
911