Shiqi Chen
@shiqi_chen17
Followers 506 · Following 228 · Media 18 · Statuses 75
PhD student @CityUHongKong. NLPer. Visiting PhD @OxCSML @NorthwesternU and @HKUST. Former @SeaAIL.
Hong Kong · Joined March 2023
🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"! We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view. 🌍🔍 Paper:
5 · 40 · 297
Manling is an incredibly encouraging advisor. She always lights me up when my projects feel stuck and makes me truly believe “I can do it.” She’s the one who turns roadblocks into possibilities. Apply to her group!
We are looking for PhDs and Postdocs! So proud of my students for achieving so many amazing things during their "very first year". I have been asked many times how I like being faculty, especially with funding cuts. My answer is always "it is the perfect job for me"! Still
1 · 1 · 38
We’ve been working on this project for nine months. It started from a very simple belief: if we call it attention, it should roughly tell us where the model is “looking.” As we dug deeper into spatial reasoning, the attention patterns kept revealing subtle and surprising
While discussing spatial intelligence of "VLMs", I wanted to share an interesting finding from our ICML25 paper: we actually open the black box of why VLMs fail at even the simplest spatial question "where is A to B" - 90% of tokens are visual, yet they get only ~10% of the
0 · 4 · 15
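A minimal sketch of the kind of measurement behind the statistic above (the share of attention mass that text-token queries place on visual tokens); the function, shapes, and toy inputs here are assumptions, not the paper's code:

```python
import torch

def visual_attention_share(attn_weights: torch.Tensor,
                           visual_token_mask: torch.Tensor) -> float:
    """Average attention mass that text-token queries place on visual-token keys.

    attn_weights: (num_heads, seq_len, seq_len) post-softmax attention from one
                  layer; rows index queries, columns index keys.
    visual_token_mask: (seq_len,) bool, True where the token is an image patch.
    """
    # Sum, per query, the attention assigned to visual keys.
    mass_on_visual = attn_weights[:, :, visual_token_mask].sum(dim=-1)
    # Average over heads and over the text-token queries only.
    return mass_on_visual[:, ~visual_token_mask].mean().item()

# Toy example: 40 tokens, 90% of them visual, random attention.
torch.manual_seed(0)
attn = torch.softmax(torch.randn(8, 40, 40), dim=-1)  # 8 heads
mask = torch.zeros(40, dtype=torch.bool)
mask[:36] = True                                      # 36/40 tokens are visual
print(f"attention share on visual tokens: {visual_attention_share(attn, mask):.2f}")
```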
Spatial intelligence has long been one of the biggest bottlenecks for VLMs. Two years ago, in Sept 2023, when I had just started my postdoc, I still remember vividly how excited we were about GPT-4V and how our “What GPT-4V still can’t do” slides were completely dominated by geometric
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on
14 · 126 · 670
🚀We are excited to introduce the Tool Decathlon (Toolathlon), a benchmark for language agents on diverse, complex, and realistic tool use. ⭐️32 applications and 600+ tools based on real-world software environments ⭐️Execution-based, reliable evaluation ⭐️Realistic, covering
6 · 27 · 164
Thanks @wzihanw! Couldn’t have shipped this without your practical walkthroughs of the RAGEN internals. Development was much smoother because of you!
Learning a world model is critical for LLMs to succeed in an environment they have never seen before. Check out this great work led by the amazing @shiqi_chen17!
1 · 0 · 6
Great collaboration with my amazing advisors @ManlingLi_ @junxian_he @yeewhye Siyang Gao, and my supportive collaborators @tongyao_zhu @Aydenwza @jinghan23 @TengX6 @James_KKW! Code: https://t.co/yxnXxPuxdL Paper: https://t.co/O7okHhIZKB (8/8)
github.com
Github repository for "Internalizing World Models via Self-Play Finetuning for Agentic RL" - shiqichen17/SPA
1 · 3 · 16
Takeaway: No stronger models. No ground truth actions. No tools. A single LLM agent can enter an OOD environment, internalize its dynamics, and learn to win—just by interacting. (7/8)
1 · 1 · 12
World Model Generalization: Easy→Hard transfer boosts learning on tougher variants (e.g., learning a FrozenLake 4×4 world model also helps RL training on FrozenLake 6×6). (6/8)
1 · 1 · 5
What contributes to successful world modelling? Through systematic per-module ablations, we isolate the factors that enable self-play world modelling to succeed: (1) transition modelling should be learned—removing it wipes out PPO gains; (2) state estimation should be
1 · 2 · 6
On OOD puzzles (Sokoban, FrozenLake, Sudoku), SPA reverses the usual trend where Pass@k stagnates during RL. Example with Qwen2.5-1.5B-Instruct: Sokoban 25.6% → 59.8%, FrozenLake 22.1% → 70.9%. (4/8)
1 · 2 · 8
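The Pass@k numbers above are conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch, assuming n sampled rollouts per task of which c succeed (whether the thread's numbers use exactly this estimator is an assumption):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: probability that at least one of k samples drawn
    without replacement from n attempts (c of them correct) succeeds."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g., 8 rollouts on a task, 3 of them solved:
print(f"Pass@4 = {pass_at_k(n=8, c=3, k=4):.3f}")
```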
SPA learns a world-modelling module via self-play SFT to initialize the policy model. We flip the usual order: first 1) learn how the world works, which then supports 2) learning from rewards. We let the agent randomly explore the environment, then output its
2 · 3 · 8
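A minimal sketch of the two-stage recipe this thread describes (random self-play exploration turned into world-model SFT pairs, followed by RL), using gymnasium's FrozenLake as a stand-in environment. The prompt/target templates and function names are illustrative assumptions, not the paper's exact format:

```python
import gymnasium as gym

env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

def collect_self_play_transitions(num_episodes: int = 50):
    """Stage 1 data: let the agent explore randomly and record transitions."""
    transitions = []
    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()  # random exploration
            next_state, reward, term, trunc, _ = env.step(action)
            transitions.append((state, action, next_state))
            state, done = next_state, term or trunc
    return transitions

def to_sft_example(state, action, next_state):
    """Turn one transition into a world-model SFT pair (transition modelling)."""
    prompt = f"State: {state}. Action: {action}. What state comes next?"
    target = f"Next state: {next_state}."
    return prompt, target

sft_data = [to_sft_example(*t) for t in collect_self_play_transitions()]
print(sft_data[0])
# Stage 2 (not shown): fine-tune the LLM on sft_data to internalize the
# world model, then run PPO on the task reward from that initialization.
```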
Want to get an LLM agent to succeed in an OOD environment? We tackle the hardest case with SPA (Self-Play Agent). No extra data, tools, or stronger models. Pure self-play. We first internalize a world model via Self-Play, then we learn how to win by RL. Like a child playing
4 · 26 · 227
Excited about the first spatial intelligence workshop! We have some interesting findings on building internal beliefs of space: - Why is spatial reasoning hard for VLMs? Let's open up VLMs: AdaptVis https://t.co/b2Hr5yqnMV - Thinking from limited views: MindCube
📣 Announcing MUSI: 1st Multimodal Spatial Intelligence Workshop @ICCVConference! 🎙️All-star keynotes: @sainingxie, @ManlingLi_, @RanjayKrishna, @yuewang314, and @QianqianWang5 - plus a panel on the future of the field! 🗓 Oct 20, 1pm-5:30pm HST 🔗 https://t.co/wZaWKRIcYI
4 · 25 · 216
World Model Reasoning for VLM Agents (NeurIPS 2025, Score 5544) We release VAGEN to teach VLMs to build internal world models via visual state reasoning: - StateEstimation: what is the current state? - TransitionModeling: what is next? MDP → POMDP shift to handle the partial
3 · 67 · 302
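To make the StateEstimation/TransitionModeling split above concrete, here is a hypothetical reasoning-turn template; the tags and example values are assumptions, not VAGEN's actual schema:

```python
# Hypothetical reasoning-turn template for the two visual-state-reasoning
# steps named above; tags are illustrative, not VAGEN's actual format.
TURN_TEMPLATE = (
    "<observation>{obs}</observation>\n"
    "<state_estimation>{current_state}</state_estimation>\n"        # what is the current state?
    "<transition_modeling>{predicted_next}</transition_modeling>\n" # what is next?
    "<action>{action}</action>"
)

print(TURN_TEMPLATE.format(
    obs="[image of a Sokoban board]",
    current_state="player at (2,1), box at (2,2), target at (2,4)",
    action="push right",
    predicted_next="player at (2,2), box at (2,3)",
))
```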
Honored to be named as MIT TR 35 Under 35 @techreview. Couldn’t have done this without the best PhD advisor @hengjinlp, my academic home @siebelschool, and the most supportive postdoc mentors @jiajunwu_cs @drfeifei @StanfordAILab @StanfordHAI, my forever mentor Shih-Fu Chang,
32 · 11 · 237
🚀 Excited to share our latest work: WebExplorer! We design a simple but efficient framework for synthesizing challenging QA pairs. Through SFT and RL, we develop a strong long-horizon WebExplorer-8B model, with better performance on BrowseComp than previous 72B models!
1 · 5 · 19
Mor is an amazing advisor: clear vision, incisive feedback, and strong support. Strongly recommended! Apply to her group!
The Azrieli International Postdoc Fellowship is now open! Email me if you're interested in joining my group next year in vibrant Tel Aviv for AI interpretability research 💫 https://t.co/RJCcKx1EKr
1 · 0 · 4
All week during rebuttals, I have started each day with the same reminder: stay humble, stay kind, don't let this turn me mean. When I was doing my PhD, reviewers never felt this mean. There is a bright-eyed student sitting on the other side, and such reviews will destroy
15 · 39 · 550
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: https://t.co/CE73PDhSP4 (1/n)
2 · 5 · 12