Siqiao Huang
@KnightNemo_
Followers 570 · Following 240 · Media 30 · Statuses 142
Junior undergrad, Yao class @Tsinghua_Uni. Current intern @mldcmu. Interested in ML & Robotics. World Models / VLAs / Humanoid Foundation Models.
Joined August 2024
Papers are kind of like movies: the first one is usually the best, and the sequels tend to get more complicated but not really more exciting. But that totally doesn’t apply to the DepthAnything series. @bingyikang's team somehow keeps making things simpler and more scalable each time.
After a year of teamwork, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3…
When Dreamer 4 came out, my two takeaways were: 1. Diffusion Forcing / Streaming Video Gen techniques will be the mainstream algorithm choice in WMs. 2. The gap between Video Generation Models and World Models is becoming increasingly small. If we have a good enough Video…
arxiv.org
World models, which predict future transitions from past observation and action sequences, have shown great promise for improving data efficiency in sequential decision-making. However, existing...
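The tweet above cuts off, but since it name-drops Diffusion Forcing, here is a minimal sketch of the core trick as I understand it: sample an independent noise level per frame during training, which is what allows autoregressive, streaming-style video rollout at inference. Everything here (SequenceDenoiser, the toy linear schedule) is illustrative, not code from Dreamer 4 or the Diffusion Forcing paper.

```python
import torch
import torch.nn as nn

class SequenceDenoiser(nn.Module):
    """Hypothetical denoiser: any sequence model mapping noisy frames plus
    per-frame noise levels to predicted clean frames (illustrative only)."""
    def __init__(self, frame_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.GRU(frame_dim + 1, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, frame_dim)

    def forward(self, noisy_frames, noise_levels):
        # noisy_frames: (B, T, D); noise_levels: (B, T), each in [0, 1]
        x = torch.cat([noisy_frames, noise_levels.unsqueeze(-1)], dim=-1)
        h, _ = self.net(x)
        return self.head(h)

def diffusion_forcing_loss(model, frames):
    # The key trick: an INDEPENDENT noise level per frame, instead of one
    # shared level for the whole clip as in standard video diffusion.
    B, T, _ = frames.shape
    t = torch.rand(B, T)                             # per-frame noise level
    alpha = (1.0 - t).unsqueeze(-1)                  # toy linear schedule
    noisy = alpha * frames + (1.0 - alpha) * torch.randn_like(frames)
    pred = model(noisy, t)                           # predict clean frames
    return ((pred - frames) ** 2).mean()             # x0-prediction MSE

model = SequenceDenoiser(frame_dim=64)
loss = diffusion_forcing_loss(model, torch.randn(8, 16, 64))  # B=8, T=16
```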
Excited that this podcast episode with TalkRL is out! 🎙️ We talk about the story behind Dreamer 4, the details of scalable world models, and the future of robotics (and beyond) 🤖🌏🚀 Thanks for the fun conversation, @TalkRLPodcast
“The Limits of My World mean the Limits of My Language” —— Siqiao Huang, Nov. 2025.
“The philosopher Wittgenstein once wrote that “the limits of my language mean the limits of my world.” I’m not a philosopher. But I know at least for AI, there is more than just words. Spatial intelligence represents the frontier beyond language—the capability that links…
There are some projects that are cool, some that are significant. But every once in a while, something like this comes along, and I just lean back in my chair and think, “Damn.” Congrats @li_yitang on this amazing project!!!
Meet BFM-Zero: A Promptable Humanoid Behavioral Foundation Model w/ Unsupervised RL👉 https://t.co/3VdyRWgOqb 🧩ONE latent space for ALL tasks ⚡Zero-shot goal reaching, tracking, and reward optimization (any reward at test time), from ONE policy 🤖Natural recovery & transition…
Today is my last day at @GoogleDeepMind. After almost exactly 10 years at Google including 12 internships and the last 2 1/2 years full time, it really feels like a chapter coming to an end. I'm grateful for all the experiences and friends I've made at Google and DeepMind. I…
Thanks @EmbodiedAIRead @yilun_chen_ for featuring our repo!!!
Awesome World Models GitHub: https://t.co/IBANoRMmIA Newly released one-stop GitHub repo on everything about World Modeling, spanning definition, theory, general approaches, use cases and evaluations in Embodied AI (as well as in other domains like NLP, Agents, etc.). Organized…
@JinWeiyang18434 Btw, I really liked the picture that Nano-Banana🍌 @GeminiApp generated🤣. It integrates the elements seamlessly; generative models nowadays are just super wild. From left to right: - Genie 3 blogpost picture @jparkerholder @shlomifruchter - @ylecun's renowned brain picture - …
This repo covers key papers and research on World Models across multiple domains, including Embodied AI, Autonomous Driving, NLP, and more. If you find it useful, please give it a star ⭐! PRs are always welcome. 🔗: https://t.co/DDG1wU9WRB Shoutout to @JinWeiyang18434, and we…
github.com
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling. - knightnemo/Awesome-World-Mo...
Introducing🌍 Awesome-World-Models, a one-stop GitHub repo of everything there is to know about world models! A new, curated resource list for everyone interested in "World Models," aiming to be a go-to guide for researchers and developers in the field. 🧵(1/n)
Thanks for sharing! 😎
https://t.co/bV7KMwCTvx Credit to @KnightNemo_: the WM reading list is worth checking out!
Tagging a few people who may be interested: @huskydogewoof @_amirbar @Haoyu_Xiong_ @K_Sta8is @CSProfKGD @TongheZhang01 @li_yitang @leoliuym @xiao_ted @DrJimFan @ChongZitaZhang @chuning_zhu @liuziwei7 @ElijahGalahad @xxunhuang
Personally, I'm more interested in latent WMs. But since nobody is mentioning it, here are a few reasons why pixel space also makes sense: 1. One's purpose determines one's standpoint. For sequential decision making, pixel space makes no sense; but for game simulation, pixel is everything. 2. …
On world models / egocentric visual dynamics models, on building robotic simulation, and on building robotic genAI models: being visually realistic doesn't mean being physically accurate or semantically correct.
Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask: what if a world model only…
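The quoted thread is truncated and its exact method isn't shown here, so the following is only a toy sketch of the "world model as future VQA" framing: condition on a question about the future and train with a classification loss on the answer, instead of a pixel reconstruction loss. All names and shapes (FutureVQAWorldModel, the GRU encoder) are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FutureVQAWorldModel(nn.Module):
    """Toy 'VQA about the future' world model: answer a task-relevant
    question about what will happen, rather than render future pixels."""
    def __init__(self, obs_dim, act_dim, q_dim, n_answers, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.answer_head = nn.Sequential(
            nn.Linear(hidden + q_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_answers),    # e.g. "will the mug tip over?"
        )

    def forward(self, obs, acts, question):
        # obs: (B, T, obs_dim); acts: (B, T, act_dim); question: (B, q_dim)
        _, h = self.encoder(torch.cat([obs, acts], dim=-1))
        return self.answer_head(torch.cat([h[-1], question], dim=-1))

# Training is plain cross-entropy on future answers; no pixel loss at all.
model = FutureVQAWorldModel(obs_dim=32, act_dim=4, q_dim=16, n_answers=2)
logits = model(torch.randn(8, 10, 32), torch.randn(8, 10, 4), torch.randn(8, 16))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
```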
World Models in game simulations are cool. But the real challenge is using them to advance robotics. This is twofold: 1. As a source of data for policy training. 2. As a verifier for test-time scaling (TTS) and policy evaluation. Glad to see both aspects coming into play in this awesome work.
Rollouts in the real world are slow and expensive. What if we could roll out trajectories entirely inside a world model (WM)? Introducing 🚀Ctrl-World🚀, a generative manipulation WM that can interact with an advanced VLA policy in imagination. 🧵1/6
Not sure if now is the best time to do world model research, but it surely is a good time for making world model memes🤣
🤖 Robots rarely see the world's true state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)
Excited to announce our NeurIPS ’25 tutorial: Foundations of Imitation Learning: From Language Modeling to Continuous Control, with Adam Block & Max Simchowitz (@max_simchowitz).
To wrap up — world models are evolving fast, but they’re not the next LLMs. The real gold lies in video generation, generalist policies, and the integration of sensorimotor control and abstraction. The full blog😎: 👉 https://t.co/yXRQ08iapW Would love to hear your takes — hype, hope, or…
🗺️How About JEPA-Style World Models? LeCun’s JEPA may not be the final form of world models, but its latent-space learning idea is gold. Most modern video diffusion models already operate in latent space — using near-lossless VAEs as encoders. Future world models could co-train…
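Since that tweet is cut off, here's a minimal, assumed sketch of the JEPA-style idea it describes: predict the latent of a future frame from the latent of the current one, with an EMA target encoder and a stop-gradient to avoid latent collapse. The dimensions and EMA rate are placeholders, not LeCun's exact recipe.

```python
import copy
import torch
import torch.nn as nn

# JEPA-flavoured sketch: regress the *latent* of a future frame from the
# latent of the current frame; the target branch never backpropagates.
encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
target_encoder = copy.deepcopy(encoder)        # slow EMA copy of the encoder
predictor = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))

def jepa_loss(x_t, x_next, ema=0.996):
    z = encoder(x_t)
    with torch.no_grad():                      # stop-gradient on the target
        z_next = target_encoder(x_next)
    loss = ((predictor(z) - z_next) ** 2).mean()
    # EMA update keeps targets slowly moving, which helps avoid collapse
    for p, tp in zip(encoder.parameters(), target_encoder.parameters()):
        tp.data.mul_(ema).add_(p.data, alpha=1 - ema)
    return loss

x_t, x_next = torch.randn(8, 512), torch.randn(8, 512)
print(jepa_loss(x_t, x_next))
```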
🍫Physics vs Data: The Bitter Lesson. Simulator = prior-driven. World model = data-driven. Given enough data, data-driven wins — always. But adding priors still boosts performance in narrow domains. It’s the classic tradeoff: generalization vs performance. So “physics-informed…