Shirley Wu

@ShirleyYXWu

2K Followers · 574 Following · 24 Media · 164 Statuses

CS PhD candidate @Stanford working w/ @jure & @james_y_zou on LLM agents and alignment | Prev USTC, Intern @MSFTResearch, @NUSingapore

Palo Alto
Joined August 2021
@ShirleyYXWu
Shirley Wu
20 days
Even the smartest LLMs can fail at basic multiturn communication. Ask for grocery help → it answers without asking where you live 🤦‍♀️. Ask it to write an article → it assumes your preferences 🤷🏻‍♀️. ⭐️ CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators.
@ShirleyYXWu
Shirley Wu
3 days
RT @cindy_x_wu: 📢 Our ICCV 2025 Workshop on Curated Data for Efficient Learning is accepting submissions! To be published in the proceedin…
@ShirleyYXWu
Shirley Wu
5 days
RT @_anniechen_: How should an RL agent leverage expert data to improve sample efficiency? Imitation losses can overly constrain an RL pol…
@ShirleyYXWu
Shirley Wu
16 days
RT @AndrewYNg: One of the most effective things the U.S. or any other nation can do to ensure its competitiveness in AI is to welcome high-…
@ShirleyYXWu
Shirley Wu
20 days
RT @james_y_zou: Excited to introduce #CollabLLM -- a method to train LLMs to collaborate better w/ humans! Selected as #icml2025 oral (top…
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf Huge thanks to my coauthors & advisors: Michel Galley, @jure, @james_y_zou, @JianfengGao0217 & team! It was an amazing summer interning @MSFTResearch, followed by substantial work conducted @StanfordAILab. Thanks for checking out CollabLLM - hope to see you in Vancouver! 🇨🇦
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [9/9] We've all battled AI that assumes, stays passive, and isn't willing to actively understand us 😤. What if your LLM collaborated like a real teammate? CollabLLM makes this a reality. 🔗 📦 pip install collabllm. Stop struggling. Start collaborating ✨
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [8/9] Generalization power: CollabLLM models trained on coding tasks automatically learn to ask clarifying questions on ambiguous QA tasks. The collaborative behaviors transfer w/o additional training 🚀. This isn't just task-specific improvement - these are fundamental collaborative skills!
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [7/9] Ever chat with an LLM that gets more annoying over time? 😤 We tracked user ratings every 3 turns:
📉 Base model: user ratings decline (users get frustrated)
📈 CollabLLM: user ratings hold steady or increase!
CollabLLM gets better at collaborating the longer you chat ✨
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [6/9] But does it work with real users? Large-scale study with 201 participants:
✅ 17.6% higher user satisfaction
✅ 10.4% faster task completion
✅ 91.4% rated document quality above "good"
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [5/9] We evaluate models on our multiturn interaction benchmarks:
📝 Document editing (based on Medium)
💻 Code assistance (BigCodeBench)
🧮 Math problem solving (MATH)
Results:
18.5% better task performance
13.3% more efficient conversations
46.3% higher interactivity
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [4/9] How does CollabLLM work?
🤖<->👥 The model and a user simulator roll out possible future conversations
📊 Compute Multiturn-aware Rewards for those future conversations (e.g., task success, efficiency, engagement)
🔄 The model learns to maximize long-term impact
→ A forward-looking LLM that…
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [3/9] Our breakthrough: CollabLLM maximizes Multiturn-aware Rewards! Instead of rewarding only the single-turn response, we reward responses based on their long-term impact on the future conversation ✨. Result: Reward(asking a question) > Reward(comprehensive response) if…
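Since [4/9] and [3/9] only gesture at the mechanics, here is a minimal sketch of the multiturn-aware reward loop under stated assumptions: every name in it (StubAgent, task_score, engagement_score, the reward weights, the rollout counts) is hypothetical and is not the collabllm API, just the structure the thread describes.

```python
# Hypothetical sketch only: score a candidate response by rolling out
# simulated *future* conversations and averaging a reward that mixes
# task success, efficiency, and engagement (per [4/9] and [3/9]).
import random
import statistics

class StubAgent:
    """Stand-in for both the policy model and the user simulator (assumption)."""
    def __init__(self, name):
        self.name = name

    def reply(self, conversation):
        return f"{self.name} turn {len(conversation)}"

def task_score(conversation):
    # Stand-in for a task-success judge (doc quality, tests passing, etc.).
    return random.random()

def engagement_score(conversation):
    # Stand-in for a simulated-user satisfaction rating.
    return random.random()

def multiturn_aware_reward(history, response, model, user_sim,
                           num_rollouts=4, horizon=3):
    """Average return of `response` over simulated future conversations."""
    returns = []
    for _ in range(num_rollouts):
        convo = history + [("assistant", response)]
        for _ in range(horizon):
            convo.append(("user", user_sim.reply(convo)))    # simulated user turn
            convo.append(("assistant", model.reply(convo)))  # model's next turn
        returns.append(task_score(convo)
                       - 0.05 * len(convo)               # efficiency: shorter is better
                       + 0.2 * engagement_score(convo))  # keep the user engaged
    return statistics.mean(returns)

# A clarifying question can now outscore an immediately "comprehensive" answer
# whenever its simulated futures end better -- the
# Reward(asking a question) > Reward(comprehensive response) condition in [3/9].
history = [("user", "Help me plan groceries for the week.")]
model, user_sim = StubAgent("model"), StubAgent("user")
r_question = multiturn_aware_reward(history, "Sure - where do you live and shop?", model, user_sim)
r_answer = multiturn_aware_reward(history, "Here is a generic list...", model, user_sim)
```

The design point is that the reward is attached to a single response but estimated from whole simulated futures, which is what makes asking a clarifying question competitive with answering immediately.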
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [2/9] The passive behavior problem: LLMs don't naturally help users clarify their needs when faced with ambiguity. Why? They're trained on single-turn rewards that prioritize immediate, helpful responses over long-term collaboration success. For example,
@ShirleyYXWu
Shirley Wu
1 month
RT @Sahil1V: 🚨 New Paper! 🚨 Guard models slow, language-specific, and modality-limited? Meet OmniGuard that detects harmful prompts across…
@ShirleyYXWu
Shirley Wu
1 month
RT @Diyi_Yang: 🤝 Humans + AI = Better together? Our #ACL2025 tutorial offers an interdisciplinary overview of human-AI collaboration to e…
@ShirleyYXWu
Shirley Wu
1 month
RT @jure: Announcing Biomni — the first general-purpose biomedical AI agent. Biomni is a free web platform where biomedical scientists can…
@ShirleyYXWu
Shirley Wu
2 months
Can we ever truly trust foundation models—and if so, how? Our ICCV TrustFM workshop is now accepting submissions (deadline: 8/1; workshop: 10/19-10/23, Hawai'i). Submit, attend, and learn from everyone around the world who is making FMs more…
@ShirleyYXWu
Shirley Wu
2 months
RT @serinachang5: Excited to have two papers accepted to ACL 2025 main! 🎉 1. ChatBench with @jakehofman @ashton1anderson - we conduct a l…
@ShirleyYXWu
Shirley Wu
2 months
If ML taught me one thing, it's that the objective matters. Everyone is motivated to get more papers because Google Scholar shows "(total) citations" and "h-index". Just an idea: let's show “average citations per publication” instead and see how it goes. Better yet, prioritize this metric in… (a toy comparison of these metrics follows the quoted tweet below)
@youjiaxuan
Jiaxuan You
2 months
🤯 NeurIPS 2025 might break records as the most submitted-to academic conference ever. One of our submission IDs is already ~23,000 — final count could hit 30,000. Absolute madness. #NeurIPS2025 #AI
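To make the metric comparison in the tweet above concrete, here is a toy calculation. The two citation profiles and their numbers are made up for illustration; h_index is a straightforward implementation of the standard definition, not any Google Scholar code.

```python
# Toy comparison of author metrics; all citation counts are invented.
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

profile_a = [300, 5, 4, 3, 2, 1, 1, 0, 0, 0]  # one hit paper plus filler papers
profile_b = [80, 75, 70]                      # fewer papers, consistently cited

for name, cites in [("A", profile_a), ("B", profile_b)]:
    print(f"{name}: total={sum(cites)}  h-index={h_index(cites)}  "
          f"avg/pub={sum(cites) / len(cites):.1f}")
# A: total=316  h-index=3  avg/pub=31.6
# B: total=225  h-index=3  avg/pub=75.0
```

Total citations reward volume (profile A wins despite the filler papers), the h-indices tie, and average citations per publication favors the smaller, consistently cited record (profile B), which is the tweet's point about objectives shaping behavior.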