Shirley Wu

@ShirleyYXWu

2K Followers · 574 Following · 24 Media · 164 Statuses

CS PhD candidate @Stanford working w/ @jure & @james_y_zou on LLM agents and alignment | Prev USTC, Intern @MSFTResearch, @NUSingapore

Palo Alto
Joined August 2021
@ShirleyYXWu
Shirley Wu
20 days
Even the smartest LLMs can fail at basic multiturn communication. Ask for grocery help → it answers without asking where you live 🤦‍♀️. Ask it to write an article → it assumes your preferences 🤷🏻‍♀️. ⭐️ CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators.
@ShirleyYXWu
Shirley Wu
3 days
RT @cindy_x_wu: 📢 Our ICCV 2025 Workshop on Curated Data for Efficient Learning is accepting submissions! To be published in the proceedin…
@ShirleyYXWu
Shirley Wu
5 days
RT @_anniechen_: How should an RL agent leverage expert data to improve sample efficiency? Imitation losses can overly constrain an RL pol…
@ShirleyYXWu
Shirley Wu
16 days
RT @AndrewYNg: One of the most effective things the U.S. or any other nation can do to ensure its competitiveness in AI is to welcome high-…
@ShirleyYXWu
Shirley Wu
20 days
RT @james_y_zou: Excited to introduce #CollabLLM -- a method to train LLMs to collaborate better w/ humans! Selected as #icml2025 oral (top…
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf Huge thanks to my coauthors & advisors: Michel Galley, @jure, @james_y_zou, @JianfengGao0217 & team! It was an amazing summer interning @MSFTResearch, followed by substantial work conducted @StanfordAILab. Thanks for checking out CollabLLM - hope to see you in Vancouver! 🇨🇦
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [9/9] We've all battled AI that assumes, stays passive, and isn't willing to actively understand us 😤. What if your LLM collaborated like a real teammate? CollabLLM makes this a reality. 🔗 📦 pip install collabllm. Stop struggling. Start collaborating ✨
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [8/9] Generalization power: CollabLLM models trained on coding tasks automatically learn to ask clarifying questions on ambiguous QA tasks. The collaborative behaviors transfer w/o additional training 🚀. This isn't just task-specific improvement - these are fundamental collaborative skills!
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [7/9] Ever chat with an LLM that gets more annoying over time? 😤 We tracked user ratings every 3 turns:
📉 Base model: user ratings decline (users get frustrated)
📈 CollabLLM: user ratings hold steady or increase!
CollabLLM gets better at collaborating the longer you chat ✨
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [6/9] But does it work with real users? Large-scale study with 201 participants:
✅ 17.6% higher user satisfaction
✅ 10.4% faster task completion
✅ 91.4% rated document quality above "good"
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [5/9] We evaluate models on our multiturn interaction benchmarks:
📝 Document editing (based on Medium)
💻 Code assistance (BigCodeBench)
🧮 Math problem solving (MATH)
Results:
18.5% better task performance
13.3% more efficient conversations
46.3% higher interactivity
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [4/9] How does CollabLLM work?
🤖<->👥 The model and a user simulator roll out possible future conversations
📊 Compute Multiturn-aware Rewards for those future conversations (e.g., task success, efficiency, engagement)
🔄 The model learns to maximize long-term impact
→ A forward-looking LLM that…
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [3/9] Our breakthrough: CollabLLM maximizes Multiturn-aware Rewards! Instead of rewarding only the single-turn response, we reward responses based on their long-term impact on the future conversation ✨. Result: Reward(asking a question) > Reward(comprehensive response) if…
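Since [4/9] and [3/9] only gesture at the mechanics, here is a minimal sketch of the multiturn-aware reward loop under stated assumptions: every name in it (StubAgent, task_score, engagement_score, the reward weights, the rollout counts) is hypothetical and is not the collabllm API, just the structure the thread describes.

```python
# Hypothetical sketch only: score a candidate response by rolling out
# simulated *future* conversations and averaging a reward that mixes
# task success, efficiency, and engagement (per [4/9] and [3/9]).
import random
import statistics

class StubAgent:
    """Stand-in for both the policy model and the user simulator (assumption)."""
    def __init__(self, name):
        self.name = name

    def reply(self, conversation):
        return f"{self.name} turn {len(conversation)}"

def task_score(conversation):
    # Stand-in for a task-success judge (doc quality, tests passing, etc.).
    return random.random()

def engagement_score(conversation):
    # Stand-in for a simulated-user satisfaction rating.
    return random.random()

def multiturn_aware_reward(history, response, model, user_sim,
                           num_rollouts=4, horizon=3):
    """Average return of `response` over simulated future conversations."""
    returns = []
    for _ in range(num_rollouts):
        convo = history + [("assistant", response)]
        for _ in range(horizon):
            convo.append(("user", user_sim.reply(convo)))    # simulated user turn
            convo.append(("assistant", model.reply(convo)))  # model's next turn
        returns.append(task_score(convo)
                       - 0.05 * len(convo)               # efficiency: shorter is better
                       + 0.2 * engagement_score(convo))  # keep the user engaged
    return statistics.mean(returns)

# A clarifying question can now outscore an immediately "comprehensive" answer
# whenever its simulated futures end better -- the
# Reward(asking a question) > Reward(comprehensive response) condition in [3/9].
history = [("user", "Help me plan groceries for the week.")]
model, user_sim = StubAgent("model"), StubAgent("user")
r_question = multiturn_aware_reward(history, "Sure - where do you live and shop?", model, user_sim)
r_answer = multiturn_aware_reward(history, "Here is a generic list...", model, user_sim)
```

The design point is that the reward is attached to a single response but estimated from whole simulated futures, which is what makes asking a clarifying question competitive with answering immediately.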
@ShirleyYXWu
Shirley Wu
20 days
@icmlconf [2/9] The passive behavior problem: LLMs don't naturally help users clarify their needs when faced with ambiguity. Why? They're trained on single-turn rewards that prioritize immediate, helpful responses over long-term collaboration success. For example,
@ShirleyYXWu
Shirley Wu
1 month
RT @Sahil1V: 🚨 New Paper! 🚨 Guard models slow, language-specific, and modality-limited? Meet OmniGuard that detects harmful prompts across…
@ShirleyYXWu
Shirley Wu
1 month
RT @Diyi_Yang: 🤝 Humans + AI = Better together? Our #ACL2025 tutorial offers an interdisciplinary overview of human-AI collaboration to e…
@ShirleyYXWu
Shirley Wu
1 month
RT @jure: Announcing Biomni — the first general-purpose biomedical AI agent. Biomni is a free web platform where biomedical scientists can…
@ShirleyYXWu
Shirley Wu
2 months
Can we ever truly trust foundation models—and if so, how? Our ICCV TrustFM workshop is now accepting submissions (deadline: 8/1; workshop: 10/19-10/23, Hawai'i). Submit, attend, and learn from everyone around the world who is making FMs more…
@ShirleyYXWu
Shirley Wu
2 months
RT @serinachang5: Excited to have two papers accepted to ACL 2025 main! 🎉 1. ChatBench with @jakehofman @ashton1anderson - we conduct a l…
@ShirleyYXWu
Shirley Wu
2 months
If ML taught me one thing, it's that the objective matters. Everyone is motivated to get more papers because Google Scholar shows "(total) citations" and "h-index". Just an idea: let's show “average citations per publication” instead and see how it goes. Better yet, prioritize this metric in… (a toy comparison of these metrics follows the quoted tweet below)
@youjiaxuan
Jiaxuan You
2 months
🤯 NeurIPS 2025 might break records as the most submitted-to academic conference ever. One of our submission IDs is already ~23,000 — final count could hit 30,000. Absolute madness. #NeurIPS2025 #AI
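To make the metric comparison in the tweet above concrete, here is a toy calculation. The two citation profiles and their numbers are made up for illustration; h_index is a straightforward implementation of the standard definition, not any Google Scholar code.

```python
# Toy comparison of author metrics; all citation counts are invented.
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

profile_a = [300, 5, 4, 3, 2, 1, 1, 0, 0, 0]  # one hit paper plus filler papers
profile_b = [80, 75, 70]                      # fewer papers, consistently cited

for name, cites in [("A", profile_a), ("B", profile_b)]:
    print(f"{name}: total={sum(cites)}  h-index={h_index(cites)}  "
          f"avg/pub={sum(cites) / len(cites):.1f}")
# A: total=316  h-index=3  avg/pub=31.6
# B: total=225  h-index=3  avg/pub=75.0
```

Total citations reward volume (profile A wins despite the filler papers), the h-indices tie, and average citations per publication favors the smaller, consistently cited record (profile B), which is the tweet's point about objectives shaping behavior.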