Zihan Wang ✈️ NeurIPS
@wzihanw
23K Followers · 4K Following · 99 Media · 900 Statuses
PhD student @NorthwesternU & student researcher @Microsoft. Ex @yutori_ai @deepseek_ai @uiuc_nlp @RUC. I work on Reasoning Agent / RL / efficiency.
Joined March 2022
Why does your RL training always collapse? In our new paper, RAGEN, we explore what breaks when you train LLM *Agents* with multi-turn reinforcement learning, and possibly how to fix it. 📄 https://t.co/z0U0612HWT 🌐 https://t.co/4DUfaees48 1/🧵👇
8 replies · 90 reposts · 438 likes
I'll be at #NeurIPS2025 next week presenting ViGoRL! If you're interested in visual reasoning, RL, or agents, let's schedule a chat.
How can we get VLMs to move their eyes and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵
2 replies · 5 reposts · 30 likes
VAGEN poster 𝐭𝐨𝐦𝐨𝐫𝐫𝐨𝐰 at #NeurIPS! 🎮🧠 - 🕚 11am–2pm Wed - 📍 Exhibit Hall C,D,E #5502 We had much fun exploring: • How 𝐰𝐨𝐫𝐥𝐝 𝐦𝐨𝐝𝐞𝐥𝐢𝐧𝐠 helps VLM RL agents learn better policies • 𝐌𝐮𝐥𝐭𝐢-𝐭𝐮𝐫𝐧 𝐏𝐏𝐎 credit assignment via 𝐭𝐰𝐨-𝐥𝐞𝐯𝐞𝐥
🚀Excited to share our NeurIPS 2025 paper VAGEN, a scalable RL framework that trains VLM agents to reason as world models. VLM agents often act without tracking the world: they lose state, fail to anticipate effects, and RL wobbles under sparse, late rewards. Our solution is
0 replies · 9 reposts · 58 likes
I’m thrilled to announce that I’m launching a new startup dedicated to patient-centric AI for drug discovery, and we’re hiring Founding AI Engineers who are passionate about advancing healthcare through cutting-edge AI. Apply here by Jan 10:
2 replies · 32 reposts · 344 likes
I gradually realized just how lucky I was to have an advisor with such a first reaction. Let us protect our community together.
5 replies · 15 reposts · 351 likes
Congrats!
After three years at CRV, I am stepping onto Striker Venture Partners' founding team, leading the firm's AI investments. Thanks to @BusinessInsider for covering the move.
1 reply · 0 reposts · 8 likes
Most VLM benchmarks watch the world; few ask how actions *change* it from a robot's-eye view. Embodied cognition tells us that intelligence isn't just watching: it's enacted through interaction. 👉We introduce ENACT: a benchmark that tests if VLMs can track the evolution of a
7 replies · 56 reposts · 235 likes
Couldn’t agree more — being part of a lab where “aliveness” is protected is rare and precious. Grateful to grow in an environment where curiosity, weirdness, and ambitious ideas are actually encouraged. Excited for year 2 at MLL🎉🥳
We are looking for PhDs and Postdocs! So proud of my students on achieving so many amazing things during their "very first year". I have been asked many times how I like being faculty, especially with funding cuts. My answer is always "it is the perfect job for me"! Still
1 reply · 5 reposts · 91 likes
Introducing Yutori Navigator. 31 years ago, the modern web era began with Netscape Navigator. Today, we’re introducing Yutori Navigator — a web agent that autonomously navigates websites on its own cloud browser to complete tasks for you. Navigator achieves Pareto domination
28 replies · 47 reposts · 245 likes
While discussing spatial intelligence of "VLMs", I wanted to share an interesting finding from our ICML25 paper: we actually open the black box of why VLMs fail at even the simplest spatial question "where is A relative to B" - 90% of tokens are visual, yet they get only ~10% of the
🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"! We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view. 🌍🔍 Paper:
10 replies · 87 reposts · 580 likes
Spatial intelligence has long been one of the biggest bottlenecks for VLMs. Two years ago in Sept 2023, when I had just started my postdoc, I still remember vividly how excited we were about GPT-4V and how our “What GPT-4V still can’t do” slides were completely dominated by geometric
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on
14 replies · 126 reposts · 677 likes
🚀Introducing Lumine, a generalist AI agent trained within Genshin Impact that can perceive, reason, and act in real time, completing hours-long missions and following diverse instructions within complex 3D open-world environments.🎮 Website: https://t.co/UxSwNKGZml 1/6
32 replies · 153 reposts · 908 likes
Intelligence will inevitably evolve from language to the physical world, unlocking spatial intelligence for multi-modal perception, reasoning, generation, and action—essential for true AGI. I'm working on building this at @XiaomiMiMo, spearheading a creative and talented team!
23 replies · 18 reposts · 250 likes
🔥Our #NeurIPS challenge on Foundation Models meet Embodied Agents released the final eval for “Embodied Agent Interface". 🚀Come test your LLMs for Embodied Agent tasks! ⚒️We've newly annotated ~5000 data points for: - Goal Interpretation - Subgoal Decomposition - Action
2 replies · 9 reposts · 25 likes
Introducing Nested Learning: A new ML paradigm for continual learning that views models as nested optimization problems to enhance long context processing. Our proof-of-concept model, Hope, shows improved performance in language modeling. Learn more: https://t.co/fpdDlYaleL
142 replies · 816 reposts · 5K likes
I will join Northwestern University Computer Science as an Assistant Professor in Fall 2026! I am actively recruiting PhD students and seeking collaborations in robotics, human-robot interaction, brain-computer interfaces, cognitive science, societal impact of AI & automation,
75 replies · 210 reposts · 2K likes
🔥The deadline (Nov 3, 2025 AoE) for 𝐍𝐞𝐮𝐫𝐈𝐏𝐒 𝟐𝟎𝟐𝟓 𝐖𝐨𝐫𝐤𝐬𝐡𝐨𝐩 𝐨𝐧 𝐒𝐨𝐜𝐢𝐚𝐥𝐥𝐲 𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐥𝐞 𝐚𝐧𝐝 𝐓𝐫𝐮𝐬𝐭𝐰𝐨𝐫𝐭𝐡𝐲 𝐅𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐌𝐨𝐝𝐞𝐥𝐬 (𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐥𝐞𝐅𝐌) is approaching!🔥 📍 Hybrid (Hilton Mexico City Reforma +
0 replies · 15 reposts · 40 likes