Shiqi Chen Profile
Shiqi Chen

@shiqi_chen17

Followers 506 · Following 228 · Media 18 · Statuses 75

PhD student @CityUHongKong. NLPer. Visiting PhD @OxCSML @NorthwesternU and @HKUST. Former @SeaAIL.

Hong Kong
Joined March 2023
@shiqi_chen17
Shiqi Chen
7 months
🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"! We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view. 🌍🔍 Paper:
5
40
297
@shiqi_chen17
Shiqi Chen
17 hours
Manling is an incredibly encouraging advisor. She always lights me up when my projects feel stuck and makes me truly believe “I can do it.” She’s the one who turns roadblocks into possibilities. Apply to her group!
@ManlingLi_
Manling Li
23 hours
We are looking for PhDs and Postdocs! So proud of my students for achieving so many amazing things during their "very first year". I have been asked many times how I like being faculty, especially with funding cuts. My answer is always "it is the perfect job for me"! Still
1
1
38
@shiqi_chen17
Shiqi Chen
6 days
We’ve been working on this project for nine months. It started from a very simple belief: if we call it attention, it should roughly tell us where the model is “looking.” As we dug deeper into spatial reasoning, the attention patterns kept revealing subtle and surprising
@ManlingLi_
Manling Li
6 days
While discussing spatial intelligence of "VLMs", wanted to share an interesting finding from our ICML25 paper: we actually open the black box of why VLMs fail at even the simplest spatial question "where is A to B" - 90% of tokens are visual, yet they get only ~10% of the
0
4
15
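As a reader's aid, the imbalance described here (most tokens are visual, yet they receive a small slice of attention) can be measured directly from a model's attention tensors. A minimal sketch, assuming you already have the stacked attention weights and a mask of image-patch positions; the function and toy shapes are illustrative, not the paper's code:

```python
import torch

def visual_attention_share(attn: torch.Tensor, is_visual: torch.Tensor) -> float:
    """Fraction of attention mass that text-token queries place on visual tokens.

    attn:      (layers, heads, seq, seq) attention weights; each row sums to 1.
    is_visual: (seq,) boolean mask marking image-patch positions.
    """
    text_rows = attn[..., ~is_visual, :]                # queries issued from text positions
    mass_on_visual = text_rows[..., is_visual].sum(-1)  # mass each query puts on image tokens
    return mass_on_visual.mean().item()

# Toy shapes mirroring the tweet: 90% of a 100-token sequence is visual.
layers, heads, seq = 4, 8, 100
attn = torch.softmax(torch.randn(layers, heads, seq, seq), dim=-1)
is_visual = torch.arange(seq) < 90
print(f"share of attention on visual tokens: {visual_attention_share(attn, is_visual):.2f}")
# Random weights land near 0.90 here; the finding above is that real VLMs sit near 0.10.
```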
@ManlingLi_
Manling Li
8 days
Spatial intelligence has long been one of the biggest bottlenecks for VLMs. Two years ago in Sept 2023, when I had just started my postdoc, I still remember vividly how excited we were about GPT-4V and how our “What GPT-4V still can’t do” slides were completely dominated by geometric
@drfeifei
Fei-Fei Li
15 days
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on
14
126
670
@junxian_he
Junxian He
26 days
🚀We are excited to introduce the Tool Decathlon (Toolathlon), a benchmark for language agents on diverse, complex, and realistic tool use.
⭐️32 applications and 600+ tools based on real-world software environments
⭐️Execution-based, reliable evaluation
⭐️Realistic, covering
6
27
164
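"Execution-based" here means the grader checks the final state of the software environment rather than string-matching the agent's transcript. A tiny self-contained sketch of that evaluation style; the SQLite task and checker are my illustration, not Toolathlon's harness:

```python
import sqlite3

def run_agent_then_check(agent, setup_sql, check_query, expected):
    """Execution-based eval: judge the resulting environment state, not the agent's words."""
    env = sqlite3.connect(":memory:")
    env.executescript(setup_sql)   # build a small stand-in software environment
    agent(env)                     # the agent mutates the env through tool calls
    return env.execute(check_query).fetchone() == expected  # pass iff the state is right

# Toy task: "archive every ticket filed before 2024".
setup = ("CREATE TABLE tickets(id INT, year INT, archived INT DEFAULT 0);"
         "INSERT INTO tickets VALUES (1, 2023, 0), (2, 2025, 0);")

def toy_agent(env):
    env.execute("UPDATE tickets SET archived = 1 WHERE year < 2024")

print(run_agent_then_check(
    toy_agent, setup,
    "SELECT COUNT(*) FROM tickets WHERE year < 2024 AND archived = 0", (0,)))
```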
@shiqi_chen17
Shiqi Chen
1 month
Thanks @wzihanw! Couldn’t have shipped this without your practical walkthroughs of the RAGEN internals. Development was much smoother because of you!
@wzihanw
Zihan Wang ✈️ NeurIPS
1 month
Learning a world model is critical for LLMs to succeed in an environment they have never seen before. Check out this great work led by the amazing @shiqi_chen17!
1
0
6
@shiqi_chen17
Shiqi Chen
1 month
Takeaway: No stronger models. No ground truth actions. No tools. A single LLM agent can enter an OOD environment, internalize its dynamics, and learn to win—just by interacting. (7/8)
1
1
12
@shiqi_chen17
Shiqi Chen
1 month
World Model Generalization: Easy→Hard transfer boosts learning on tougher variants (e.g., learning a FrozenLake 4×4 world model also helps with RL training on FrozenLake 6×6). (6/8)
1
1
5
@shiqi_chen17
Shiqi Chen
1 month
What contributes to successful world modelling? Through systematic per-module ablations, we isolate the factors that enable self-play world modelling to succeed: (1) transition modelling should be learned (removing it wipes out PPO gains); (2) state estimation should be
1
2
6
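For concreteness, the two modules named here map to two self-supervised prediction targets built from the agent's own rollouts. A sketch of what such SFT examples could look like; the schema and strings are my illustration, not the paper's format:

```python
# Two world-modelling targets, both derived from self-play traces.
# Illustrative schema only; the paper's exact prompts may differ.
transition_example = {   # transition modelling: (state, action) -> next state
    "prompt": "State: player at (1,1), hole at (2,2). Action: move right. Next state?",
    "target": "player at (1,2), hole at (2,2)",
}
state_example = {        # state estimation: observation -> current state
    "prompt": "Observation: you slid on the ice and stopped. Where are you now?",
    "target": "player at (1,3)",
}
# Per-module ablation: drop one target from the SFT mix, rerun PPO, and compare;
# the thread reports that removing transition modelling wipes out the PPO gains.
for ex in (transition_example, state_example):
    print(ex["prompt"], "=>", ex["target"])
```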
@shiqi_chen17
Shiqi Chen
1 month
On OOD puzzles (Sokoban, FrozenLake, Sudoku), SPA reverses the usual trend where Pass@k stagnates during RL. Example with Qwen2.5-1.5B-Instruct: Sokoban 25.6% → 59.8%, FrozenLake 22.1% → 70.9%. (4/8)
1
2
8
@shiqi_chen17
Shiqi Chen
1 month
SPA learns a world-modelling module via self-play SFT to initialize the policy model. We flip the usual order: first 1) learn how the world works, which then supports 2) learning from rewards. We first let the agent randomly explore the environment, then output its
2
3
8
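The order described (explore randomly, distill the transitions into SFT data, only then run RL) is easy to see in a toy environment. A runnable sketch of stage 1 on a FrozenLake-style grid; the grid, pair format, and helper names are mine, not the released code:

```python
import random

# Toy 4x4 FrozenLake-style grid: '.' = ice, 'H' = hole, 'G' = goal.
GRID = ["S...",
        ".H..",
        "...H",
        "H..G"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, action):
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), 3)
    c = min(max(pos[1] + dc, 0), 3)
    return (r, c), GRID[r][c] in "HG"   # new position, and whether the episode ends

# Stage 1: random exploration; the agent's own transitions become SFT pairs that
# teach "state, action -> next state" before any reward signal is used.
sft_pairs = []
for _ in range(200):
    pos, done = (0, 0), False
    while not done:
        action = random.choice(list(MOVES))
        nxt, done = step(pos, action)
        sft_pairs.append((f"state={pos} action={action} ->", f"next_state={nxt}"))
        pos = nxt

print(len(sft_pairs), "self-play transitions, e.g.", sft_pairs[0])
# Stage 2 (not shown): fine-tune the LLM on sft_pairs, then run PPO in the same
# environment starting from that world-model-initialized policy.
```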
@shiqi_chen17
Shiqi Chen
1 month
Why do LLM agents often fail in text games? Prior work tracks Pass@1 as the success rate, concluding there is a ceiling without explaining why. We found that when Pass@1 saturates, Pass@k actually declines. Pass@k is really a measure of an agent's ability to explore an environment,
1
2
8
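For reference, Pass@k is the chance that at least one of k sampled attempts succeeds; with n samples and c successes, the standard unbiased estimator (from the HumanEval paper) is 1 - C(n-c, k)/C(n, k). A quick sketch of why a flat Pass@1 can hide a lot:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate from n samples with c successes: 1 - C(n-c,k)/C(n,k)."""
    if n - c < k:
        return 1.0          # fewer than k failures: any k draws must include a success
    return 1.0 - comb(n - c, k) / comb(n, k)

# Same Pass@1, very different exploration headroom:
print(pass_at_k(n=32, c=8, k=1))   # 0.25  -> what Pass@1 tracking reports
print(pass_at_k(n=32, c=8, k=8))   # ~0.93 -> the gap that only Pass@k reveals
```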
@shiqi_chen17
Shiqi Chen
1 month
Want to get an LLM agent to succeed in an OOD environment? We tackle the hardest case with SPA (Self-Play Agent). No extra data, tools, or stronger models. Pure self-play. We first internalize a world model via Self-Play, then we learn how to win by RL. Like a child playing
4
26
227
@ManlingLi_
Manling Li
1 month
Excited about the first spatial intelligence workshop! We have some interesting findings on building internal beliefs of space:
- Why is spatial reasoning hard for VLMs? Let us open up VLMs: AdaptVis https://t.co/b2Hr5yqnMV
- Thinking from limited views: MindCube
@songyoupeng
Songyou Peng
2 months
📣 Announcing MUSI: 1st Multimodal Spatial Intelligence Workshop @ICCVConference!
🎙️All-star keynotes: @sainingxie, @ManlingLi_, @RanjayKrishna, @yuewang314, and @QianqianWang5 - plus a panel on the future of the field!
🗓 Oct 20, 1pm-5:30pm HST
🔗 https://t.co/wZaWKRIcYI
4
25
216
@ManlingLi_
Manling Li
1 month
World Model Reasoning for VLM Agents (NeurIPS 2025, Score 5544)
We release VAGEN to teach VLMs to build internal world models via visual state reasoning:
- StateEstimation: what is the current state?
- TransitionModeling: what is next?
MDP → POMDP shift to handle the partial
3
67
302
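Since the environment is only partially observed, forcing the model to verbalize a state estimate and a predicted transition before acting is one way to realize that MDP → POMDP shift. A sketch of parsing such a structured response; the tag names and example strings are my illustration, not VAGEN's released format:

```python
import re

# The agent fills a fixed schema before acting: estimate the state, predict the
# transition, then commit to an action. Tags here are illustrative only.
response = """<state>box at (2,3); player at (2,2); target at (2,4)</state>
<transition>after push-right, the box at (2,4) sits on the target</transition>
<action>push-right</action>"""

def parse(tag: str, text: str) -> str:
    m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.S)
    return m.group(1).strip() if m else ""

state_estimate = parse("state", response)    # StateEstimation: what is the current state?
transition = parse("transition", response)   # TransitionModeling: what is next?
action = parse("action", response)           # only this gets executed in the environment
print(action, "|", state_estimate)
```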
@ManlingLi_
Manling Li
2 months
Honored to be named as MIT TR 35 Under 35 @techreview. Couldn’t have done this without the best PhD advisor @hengjinlp, my academic home @siebelschool, and the most supportive postdoc mentors @jiajunwu_cs @drfeifei @StanfordAILab @StanfordHAI, my forever mentor Shih-Fu Chang,
@techreview
MIT Technology Review
3 months
Today in The Download, our daily newsletter: Introducing our 35 Innovators Under 35 list for 2025
32
11
237
@junteng88716710
Junteng Liu
3 months
🚀 Excited to share our latest work: WebExplorer! We design a simple but efficient framework for synthesizing challenging QA pairs. Through SFT and RL, we develop a strong long-horizon WebExplorer-8B model, with better performance on BrowseComp than previous 72B models!
@_akhaliq
AK
3 months
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
1
5
19
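The "explore and evolve" idea, as I read it: browse to draft a grounded QA pair, then iteratively rewrite the question to strip direct clues so answering demands long-horizon search. A heavily hedged sketch; the loop, prompts, and toy_llm stub are my guesses at the shape, not the paper's pipeline:

```python
def synthesize_qa(llm, pages, rounds=3):
    """Explore-and-evolve sketch: draft a grounded QA pair, then harden the question."""
    qa = llm(f"Write a question answerable only by combining these pages, with its answer: {pages}")
    question, answer = qa["question"], qa["answer"]
    for _ in range(rounds):   # evolve: each pass obscures direct clues
        question = llm(f"Rewrite so names and dates are only indirectly described, "
                       f"keeping answer '{answer}': {question}")
    return question, answer

def toy_llm(prompt):
    # Stand-in for a real model call, returning canned output so the sketch runs.
    if prompt.startswith("Write"):
        return {"question": "Which company did the 2019 keynote speaker found?",
                "answer": "ExampleCorp"}
    return "Which company did the person who opened that year's flagship event found?"

print(synthesize_qa(toy_llm, ["page A ...", "page B ..."]))
```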
@shiqi_chen17
Shiqi Chen
3 months
Mor is an amazing advisor: clear vision, incisive feedback, and strong support. Strongly recommended! Apply to her group!
@megamor2
Mor Geva
3 months
The Azrieli International Postdoc Fellowship is now open! Email me if you're interested in joining my group next year in vibrant Tel Aviv for AI interpretability research 💫 https://t.co/RJCcKx1EKr
1
0
4
@ManlingLi_
Manling Li
4 months
All week during rebuttals, I have started each day with the same reminder: stay humble, stay kind, don't let this turn me mean. When I was doing my PhD, reviewers never felt this mean. There is a bright-eyed student sitting on the other side, and such reviews will destroy
15
39
550
@ethanchern
Ethan Chern
5 months
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:
@ethanchern
Ethan Chern
2 years
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: https://t.co/CE73PDhSP4 (1/n)
2
5
12
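The framework, as I recall the paper, decomposes into claim extraction → query generation → evidence collection → verification. A minimal sketch of that flow with stub callables standing in for LLM prompts and search tools; the stubs are mine, not FacTool's code:

```python
def check_factuality(text, extract, make_queries, search, verify):
    """FacTool-style pipeline sketch: the four stages are injected as callables."""
    report = []
    for claim in extract(text):                 # 1. split output into atomic claims
        evidence = []
        for query in make_queries(claim):       # 2. turn each claim into search queries
            evidence.extend(search(query))      # 3. collect evidence via the tool
        report.append({"claim": claim,
                       "supported": verify(claim, evidence)})  # 4. judge agreement
    return report

# Toy stubs so the sketch runs end to end.
report = check_factuality(
    "The Eiffel Tower is in Berlin.",
    extract=lambda t: [t],
    make_queries=lambda c: [c],
    search=lambda q: ["The Eiffel Tower is in Paris, France."],
    verify=lambda c, ev: any("Berlin" in e for e in ev),
)
print(report)  # [{'claim': ..., 'supported': False}]
```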