Wenlong Huang
@wenlong_huang
Followers
4K
Following
4K
Media
33
Statuses
593
PhD Student @StanfordSVL @StanfordAILab. Previously @Berkeley_AI @GoogleDeepMind @NVIDIARobotics. Robotics, Foundation Models.
Stanford, CA
Joined May 2019
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
18
106
516
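For readers curious what the ReKep recipe in the tweet above might look like in practice, here is a minimal, hedged sketch. The function names (propose_keypoints, write_constraints, solve_stage), the toy keypoints, and the cost functions are all illustrative assumptions, not the paper's actual interface; only the overall flow mirrors the tweet: a large vision model labels keypoints, a VLM writes keypoint-based constraints per task stage, and an optimizer solves each stage without task-specific training or environment models.

```python
# Hedged sketch of a ReKep-style pipeline. All names and numbers below are
# illustrative placeholders, not the actual ReKep API.
import numpy as np
from scipy.optimize import minimize

def propose_keypoints(image):
    """Stand-in for a large vision model that labels task-relevant keypoints."""
    # Dummy 3D keypoints for illustration (e.g. mug handle, target coaster).
    return np.array([[0.4, 0.0, 0.1],
                     [0.6, 0.2, 0.1]])

def write_constraints(instruction, num_keypoints):
    """Stand-in for a VLM that emits keypoint-based cost functions, one per stage."""
    def stage1_cost(keypoints, ee_pos):
        # Sub-goal: bring the end effector to the first keypoint (grasp).
        return np.linalg.norm(ee_pos - keypoints[0])

    def stage2_cost(keypoints, ee_pos):
        # Sub-goal: move above the second keypoint (place).
        target = keypoints[1] + np.array([0.0, 0.0, 0.05])
        return np.linalg.norm(ee_pos - target)

    return [stage1_cost, stage2_cost]

def solve_stage(cost, keypoints, ee_init):
    """Optimize an end-effector position against the stage cost.
    A real system would optimize full SE(3) poses with kinematic and collision
    constraints, re-solving as keypoints are re-tracked (reactivity)."""
    res = minimize(lambda x: cost(keypoints, x), ee_init, method="L-BFGS-B")
    return res.x

if __name__ == "__main__":
    image = None  # stand-in for a camera observation
    keypoints = propose_keypoints(image)
    stages = write_constraints("put the mug on the coaster", len(keypoints))
    ee = np.zeros(3)
    for i, cost in enumerate(stages, 1):
        ee = solve_stage(cost, keypoints, ee)
        print(f"stage {i}: target end-effector position {ee}")
```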
🎉 How do we measure the rapid progress of robotic learning and embodied AI research? The 1st BEHAVIOR Challenge results are out! And we're excited to see such strong performance on 50 challenging household tasks. Congrats to the winning teams! 🥇Robot Learning Collective 🥈Comet
24
29
262
Come see us at NeurIPS Foundation Models Meet Embodied Agents: BEHAVIOR & EAI Challenge! Sun Dec 07, 11:00 - 13:45 PST, 2025, Mezzanine Room 15AB https://t.co/QlmvvE6rvJ
• 11:00 - 11:10 AM: Introduction and welcome (Fei-Fei Li)
• 11:10 - 11:15 AM: BEHAVIOR Challenge Intro
foundation-models-meet-embodied-agents.github.io
1
7
26
@yingke_wang18 is also applying to PhD programs this cycle! She has done amazing work during her time at @StanfordSVL. She will be a fantastic addition to any lab in robot learning!
1
0
6
Very exciting to see how a world model, learned from only play data, enables generalization to new goals (in this case, beautiful artworks that robots have never seen!). Big congrats to @yingke_wang18 @RuohanZhang76 on this very cool work combining manipulation and art!
1/N 🎨🤖Given only a static image of an oil painting by an expert artist, can a robot infer the corresponding control actions, such as trajectory, orientation, and applied force, to accurately reproduce the painting? 🖌️Introducing IMPASTO: a robotic oil-painting system that
2
5
47
Generative models (diffusion/flow) are taking over robotics 🤖. But do we really need to model the full action distribution to control a robot? We suspected the success of Generative Control Policies (GCPs) might be "Much Ado About Noising." We rigorously tested the myths. 🧵👇
12
78
483
(1/5) Open Publication and the Prisoner's Dilemma. The prisoner's dilemma game can offer us much insight into the publication strategy of leading industrial AI research labs. Suppose there are two labs A and B. If A and B both publish they are both better off than if they both
11
54
342
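The payoff structure this thread gestures at can be made concrete with a toy example. The numbers below are illustrative assumptions, not taken from the thread; they just encode the standard dilemma the tweet sets up: withholding dominates for each lab individually, even though mutual publishing leaves both labs better off than mutual withholding.

```python
# Illustrative-only payoff matrix for the publish/withhold dilemma described above.
# The numbers are made up for exposition. Payoffs are (lab A, lab B); higher is better.
PAYOFFS = {
    ("publish",  "publish"):  (3, 3),  # both share: field moves fast, both benefit
    ("publish",  "withhold"): (1, 4),  # A gives away ideas, B free-rides
    ("withhold", "publish"):  (4, 1),  # A free-rides on B's openness
    ("withhold", "withhold"): (2, 2),  # both hoard: everyone progresses more slowly
}

def best_response(options, their_action, player):
    """Return the action maximizing this player's payoff given the other's action."""
    def payoff(action):
        key = (action, their_action) if player == 0 else (their_action, action)
        return PAYOFFS[key][player]
    return max(options, key=payoff)

if __name__ == "__main__":
    options = ["publish", "withhold"]
    # Under these toy numbers, withholding is a dominant strategy for A,
    # even though (publish, publish) beats (withhold, withhold) for both labs.
    for other in options:
        print(f"If B plays {other!r}, A's best response is {best_response(options, other, 0)!r}")
```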
This is a really smart setup for evaluating forward and inverse world modeling with VLMs 💡. Congrats on the paper! I also really appreciate the deep dive into Cosmos-Reason1. Lots of insightful details to learn from 📖
Most VLM benchmarks watch the world; few ask how actions *change* it from a robot's eye. Embodied cognition tells us that intelligence isn't just watching – it's enacted through interaction. 👉We introduce ENACT: A benchmark that tests if VLMs can track the evolution of a
0
4
8
I will join UChicago CS @UChicagoCS as an Assistant Professor in late 2026, and I’m recruiting PhD students in this cycle (2025 - 2026). My research focuses on AI & Robotics - including dexterous manipulation, humanoids, tactile sensing, learning from human videos, robot
25
98
633
Come and work with robots and the smartest students @StanfordSVL! We are hiring a software developer, focusing on simulation for robotics & robotic learning. You'll be working directly with @drfeifei and @jiajunwu_cs and our amazing students and researchers. Huge thanks to our
linkedin.com
Note: This is a 1 year fixed term position. Visa sponsorship is not available for this position…
3
15
112
Having mostly been a consumer of VLMs in robotics, I have a long-term goal of systematically evaluating these models: not only on their ability to see, which is what they are trained to do, but also on their ability to connect *seeing* to *doing*. As (part of) the robot brain, their reach
Most VLM benchmarks watch the world; few ask how actions *change* it from a robot's eye. Embodied cognition tells us that intelligence isn't just watching – it's enacted through interaction. 👉We introduce ENACT: A benchmark that tests if VLMs can track the evolution of a
0
7
60
Our most recent work benchmarks modern VLMs and their efficacy on long-horizon household activities in robotic learning, using the BEHAVIOR benchmark environment.👇
Most VLM benchmarks watch the world; few ask how actions *change* it from a robot's eye. Embodied cognition tells us that intelligence isn't just watching – it's enacted through interaction. 👉We introduce ENACT: A benchmark that tests if VLMs can track the evolution of a
32
65
456
We are looking for PhDs and Postdocs! So proud of my students for achieving so many amazing things during their "very first year". I have been asked many times how I like being faculty, especially with funding cuts. My answer is always "it is the perfect job for me"! Still
33
152
1K
Most VLM benchmarks watch the world; few ask how actions *change* it from a robot's eye. Embodied cognition tells us that intelligence isn't just watching – it's enacted through interaction. 👉We introduce ENACT: A benchmark that tests if VLMs can track the evolution of a
7
56
236
Today we're announcing SAM 3D, a foundation model for visually grounded 3D reconstruction. Super excited to share what my team has been working on! Try it here: https://t.co/aKlYajRGta Blog: https://t.co/ljcfqjRCP5 Paper: https://t.co/6huglEiNqV Code:
github.com
SAM 3D Objects: facebookresearch/sam-3d-objects on GitHub.
Today we’re excited to unveil a new generation of Segment Anything Models: 1️⃣ SAM 3 enables detecting, segmenting and tracking of objects across images and videos, now with short text phrases and exemplar prompts. 🔗 Learn more about SAM 3: https://t.co/tIwymSSD89 2️⃣ SAM 3D
11
37
335
Today, we present a step-change in robotic AI @sundayrobotics. Introducing ACT-1: A frontier robot foundation model trained on zero robot data. - Ultra long-horizon tasks - Zero-shot generalization - Advanced dexterity 🧵->
426
662
5K
Spatial intelligence has long been one of the biggest bottlenecks for VLMs. Two years ago, in Sept 2023, when I had just started my postdoc, I still remember vividly how excited we were about GPT-4V and how our "What GPT-4V still can't do" slides were completely dominated by geometric
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on
14
126
678
We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach
openai.com
We trained models to think in simpler, more traceable steps—so we can better understand how they work.
227
711
6K
“Creativity is intelligence having fun.” Unleash your creativity and imagination with Marble - our 3D world generation model, now available to everyone!
Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9
102
218
2K
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on
181
647
3K