Shiqi Chen
@shiqi_chen17
Followers 506 · Following 228 · Media 18 · Statuses 75
PhD student @CityUHongKong. NLPer. Visiting PhD @OxCSML @NorthwesternU and @HKUST. Former @SeaAIL.
Hong Kong · Joined March 2023
🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"! We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view. 🌍🔍 Paper:
5 · 40 · 297
Manling is an incredibly encouraging advisor. She always lights me up when my projects feel stuck and makes me truly believe “I can do it.” She’s the one who turns roadblocks into possibilities. Apply to her group!
We are looking for PhDs and Postdocs! So proud of my students for achieving so many amazing things during their "very first year". I have been asked many times how I like being faculty, especially with funding cuts. My answer is always "it is the perfect job for me"! Still
1 · 1 · 38
We’ve been working on this project for nine months. It started from a very simple belief: if we call it attention, it should roughly tell us where the model is “looking.” As we dug deeper into spatial reasoning, the attention patterns kept revealing subtle and surprising
While discussing spatial intelligence of "VLMs", I wanted to share an interesting finding from our ICML25 paper: we actually open the black box of why VLMs fail at even the simplest spatial question "where is A to B" - 90% of tokens are visual, yet they get only ~10% of the
0 · 4 · 15
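A minimal sketch of the kind of measurement behind the statistic above (the share of attention mass that text-token queries place on visual tokens); the function, shapes, and toy inputs here are assumptions, not the paper's code:

```python
import torch

def visual_attention_share(attn_weights: torch.Tensor,
                           visual_token_mask: torch.Tensor) -> float:
    """Average attention mass that text-token queries place on visual-token keys.

    attn_weights: (num_heads, seq_len, seq_len) post-softmax attention from one
                  layer; rows index queries, columns index keys.
    visual_token_mask: (seq_len,) bool, True where the token is an image patch.
    """
    # Sum, per query, the attention assigned to visual keys.
    mass_on_visual = attn_weights[:, :, visual_token_mask].sum(dim=-1)
    # Average over heads and over the text-token queries only.
    return mass_on_visual[:, ~visual_token_mask].mean().item()

# Toy example: 40 tokens, 90% of them visual, random attention.
torch.manual_seed(0)
attn = torch.softmax(torch.randn(8, 40, 40), dim=-1)  # 8 heads
mask = torch.zeros(40, dtype=torch.bool)
mask[:36] = True                                      # 36/40 tokens are visual
print(f"attention share on visual tokens: {visual_attention_share(attn, mask):.2f}")
```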
Spatial intelligence has long been one of the biggest bottlenecks for VLMs. Two years ago, in Sept 2023, when I had just started my postdoc, I still remember vividly how excited we were about GPT-4V and how our “What GPT-4V still can’t do” slides were completely dominated by geometric
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on
14 · 126 · 670
🚀We are excited to introduce the Tool Decathlon (Toolathlon), a benchmark for language agents on diverse, complex, and realistic tool use. ⭐️32 applications and 600+ tools based on real-world software environments ⭐️Execution-based, reliable evaluation ⭐️Realistic, covering
6 · 27 · 164
Thanks @wzihanw! Couldn’t have shipped this without your practical walkthroughs of the RAGEN internals. Development was much smoother because of you!
Learning a world model is critical for LLMs to succeed in an environment they have never seen before. Check out this great work led by the amazing @shiqi_chen17!
1 · 0 · 6
Great collaboration with my amazing advisors @ManlingLi_ @junxian_he @yeewhye Siyang Gao, and my supportive collaborators @tongyao_zhu @Aydenwza @jinghan23 @TengX6 @James_KKW! Code: https://t.co/yxnXxPuxdL Paper: https://t.co/O7okHhIZKB (8/8)
github.com
Github repository for "Internalizing World Models via Self-Play Finetuning for Agentic RL" - shiqichen17/SPA
1 · 3 · 16
Takeaway: No stronger models. No ground truth actions. No tools. A single LLM agent can enter an OOD environment, internalize its dynamics, and learn to win—just by interacting. (7/8)
1 · 1 · 12
World Model Generalization: Easy→Hard transfer boosts learning on tougher variants (e.g., learning a FrozenLake 4×4 world model also helps RL training on FrozenLake 6×6). (6/8)
1 · 1 · 5
What contributes to successful world modelling? Through systematic per-module ablations, we isolate the factors that enable self-play world modelling to succeed: (1) transition modelling should be learned—removing it wipes out PPO gains; (2) state estimation should be
1 · 2 · 6
On OOD puzzles (Sokoban, FrozenLake, Sudoku), SPA reverses the usual trend where Pass@k stagnates during RL. Example with Qwen2.5-1.5B-Instruct: Sokoban 25.6% → 59.8%, FrozenLake 22.1% → 70.9%. (4/8)
1 · 2 · 8
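The Pass@k numbers above are conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch, assuming n sampled rollouts per task of which c succeed (whether the thread's numbers use exactly this estimator is an assumption):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: probability that at least one of k samples drawn
    without replacement from n attempts (c of them correct) succeeds."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g., 8 rollouts on a task, 3 of them solved:
print(f"Pass@4 = {pass_at_k(n=8, c=3, k=4):.3f}")
```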
SPA learns a world-modelling module via self-play SFT to initialize the policy model. We flip the usual order: first 1) learn how the world works, which then supports 2) learning from rewards. We let the agent randomly explore the environment, then output its
2 · 3 · 8
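A minimal sketch of the two-stage recipe this thread describes (random self-play exploration turned into world-model SFT pairs, followed by RL), using gymnasium's FrozenLake as a stand-in environment. The prompt/target templates and function names are illustrative assumptions, not the paper's exact format:

```python
import gymnasium as gym

env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

def collect_self_play_transitions(num_episodes: int = 50):
    """Stage 1 data: let the agent explore randomly and record transitions."""
    transitions = []
    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()  # random exploration
            next_state, reward, term, trunc, _ = env.step(action)
            transitions.append((state, action, next_state))
            state, done = next_state, term or trunc
    return transitions

def to_sft_example(state, action, next_state):
    """Turn one transition into a world-model SFT pair (transition modelling)."""
    prompt = f"State: {state}. Action: {action}. What state comes next?"
    target = f"Next state: {next_state}."
    return prompt, target

sft_data = [to_sft_example(*t) for t in collect_self_play_transitions()]
print(sft_data[0])
# Stage 2 (not shown): fine-tune the LLM on sft_data to internalize the
# world model, then run PPO on the task reward from that initialization.
```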
Want to get an LLM agent to succeed in an OOD environment? We tackle the hardest case with SPA (Self-Play Agent). No extra data, tools, or stronger models. Pure self-play. We first internalize a world model via Self-Play, then we learn how to win by RL. Like a child playing
4 · 26 · 227
Excited about the first spatial intelligence workshop! We have some interesting findings on building internal beliefs of space: - Why is spatial reasoning hard for VLMs? Let's open up VLMs: AdaptVis https://t.co/b2Hr5yqnMV - Thinking from limited views: MindCube
📣 Announcing MUSI: 1st Multimodal Spatial Intelligence Workshop @ICCVConference! 🎙️All-star keynotes: @sainingxie, @ManlingLi_, @RanjayKrishna, @yuewang314, and @QianqianWang5 - plus a panel on the future of the field! 🗓 Oct 20, 1pm-5:30pm HST 🔗 https://t.co/wZaWKRIcYI
4 · 25 · 216
World Model Reasoning for VLM Agents (NeurIPS 2025, Score 5544) We release VAGEN to teach VLMs to build internal world models via visual state reasoning: - StateEstimation: what is the current state? - TransitionModeling: what is next? MDP → POMDP shift to handle the partial
3 · 67 · 302
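To make the StateEstimation/TransitionModeling split above concrete, here is a hypothetical reasoning-turn template; the tags and example values are assumptions, not VAGEN's actual schema:

```python
# Hypothetical reasoning-turn template for the two visual-state-reasoning
# steps named above; tags are illustrative, not VAGEN's actual format.
TURN_TEMPLATE = (
    "<observation>{obs}</observation>\n"
    "<state_estimation>{current_state}</state_estimation>\n"        # what is the current state?
    "<transition_modeling>{predicted_next}</transition_modeling>\n" # what is next?
    "<action>{action}</action>"
)

print(TURN_TEMPLATE.format(
    obs="[image of a Sokoban board]",
    current_state="player at (2,1), box at (2,2), target at (2,4)",
    action="push right",
    predicted_next="player at (2,2), box at (2,3)",
))
```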
Honored to be named as MIT TR 35 Under 35 @techreview. Couldn’t have done this without the best PhD advisor @hengjinlp, my academic home @siebelschool, and the most supportive postdoc mentors @jiajunwu_cs @drfeifei @StanfordAILab @StanfordHAI, my forever mentor Shih-Fu Chang,
32 · 11 · 237
🚀 Excited to share our latest work: WebExplorer! We design a simple but efficient framework for synthesizing challenging QA pairs. Through SFT and RL, we develop a strong long-horizon WebExplorer-8B model, with better performance on BrowseComp than previous 72B models!
1 · 5 · 19
Mor is an amazing advisor: clear vision, incisive feedback, and strong support. Strongly recommended! Apply to her group!
The Azrieli International Postdoc Fellowship is now open! Email me if you're interested in joining my group next year in vibrant Tel Aviv for AI interpretability research 💫 https://t.co/RJCcKx1EKr
1 · 0 · 4
All week during rebuttals, I have started each day with the same reminder: stay humble, stay kind, don't let this turn me mean. When I was doing my PhD, reviewers never felt this mean. There is a bright-eyed student sitting on the other side, and such reviews will destroy
15 · 39 · 550
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: https://t.co/CE73PDhSP4 (1/n)
2 · 5 · 12