Lily Chen Profile
Lily Chen

@lilyychenn

Followers: 174 · Following: 3K · Media: 7 · Statuses: 51

MIT

Joined February 2015
@thecekbote
Chanakya Ekbote
3 days
How do we teach LLMs not just to reason, but to reflect, debug, and improve themselves? We at AWS AI Labs introduce MURPHY 🤖, a multi-turn RL framework that brings self-correction into #RLVR (#GRPO). 🧵👇 Link: https://t.co/3kFjI5mxR5
2
14
25
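The loop described above (generate, check against a verifiable reward, then revise over multiple turns) can be made concrete with a small sketch. This is not the MURPHY implementation; the `generate` and `verify` callables and the reward shaping below are hypothetical placeholders, meant only to show the shape of a multi-turn self-correction rollout.

```python
# Hypothetical sketch of a multi-turn generate -> verify -> revise rollout with a
# verifiable reward, in the spirit of RLVR-style self-correction. Not the MURPHY code.

from typing import Callable, List, Tuple


def rollout_with_self_correction(
    generate: Callable[[str], str],      # policy: prompt -> candidate answer
    verify: Callable[[str, str], bool],  # verifiable checker: (problem, answer) -> pass/fail
    problem: str,
    max_turns: int = 3,
) -> Tuple[List[str], float]:
    """Run up to `max_turns` attempts, feeding verifier failures back as context."""
    transcript: List[str] = []
    prompt = problem
    for turn in range(max_turns):
        answer = generate(prompt)
        transcript.append(answer)
        if verify(problem, answer):
            # Earlier success -> higher reward, encouraging efficient correction.
            return transcript, 1.0 - 0.1 * turn
        # Append the failed attempt so the next turn can reflect and revise.
        prompt = f"{problem}\nPrevious attempt (failed verification):\n{answer}\nRevise it."
    return transcript, 0.0  # no verified answer within the turn budget


if __name__ == "__main__":
    # Toy example: a "policy" that only gets the arithmetic right on its retry.
    attempts = iter(["41", "42"])
    traj, reward = rollout_with_self_correction(
        generate=lambda _prompt: next(attempts),
        verify=lambda _p, ans: ans.strip() == "42",
        problem="What is 6 * 7?",
    )
    print(traj, reward)  # ['41', '42'] 0.9
```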
@lilyychenn
Lily Chen
5 months
You can find our annotation data and interface here: https://t.co/QMdkPX2JvC. Many thanks to my co-lead @sebajoed and our amazing collaborators Barry Wei, @mackert, @ijmarshall, @pliang279, @RKouzyMD, @byron_c_wallace, and @jessyjli! 5/
[Link card: github.com/SebaJoe/decide-less-communicate-more]
0
0
0
@lilyychenn
Lily Chen
5 months
To address these challenges, we propose a communication model that:
- clarifies intent through dialogue
- guides claims toward verifiable evidence
- explains diverse expert perspectives instead of forcing consensus
It reframes medical fact-checking as patient–expert dialogue 4/
1
0
1
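A minimal sketch of how the three stages named in the tweet above (intent clarification, evidence grounding, multi-perspective explanation) might be organized as dialogue state. The class and field names are illustrative guesses, not the paper's released interface.

```python
# Illustrative sketch only: one way to structure a patient-expert fact-checking
# dialogue around the three roles named above. Names are hypothetical.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    text: str                      # the patient's original claim, verbatim
    clarified_intent: str = ""     # what the patient actually wants to know
    evidence_ids: List[str] = field(default_factory=list)  # e.g. RCT abstract IDs
    expert_views: List[str] = field(default_factory=list)  # possibly conflicting readings


def run_dialogue(claim: Claim) -> str:
    """Walk a claim through the three stages and return the next dialogue move."""
    if not claim.clarified_intent:
        return f"Clarifying question: what outcome do you mean by '{claim.text}'?"
    if not claim.evidence_ids:
        return "No linked evidence yet; suggest RCT abstracts to ground the claim."
    views = "; ".join(claim.expert_views) or "no expert readings recorded"
    return f"Intent: {claim.clarified_intent} Evidence: {claim.evidence_ids}. Perspectives: {views}."


if __name__ == "__main__":
    c = Claim(text="Vitamin D cures colds")
    print(run_dialogue(c))  # asks a clarifying question first
    c.clarified_intent = "Does supplementation shorten cold duration in adults?"
    c.evidence_ids = ["PMID-1", "PMID-2"]
    c.expert_views = ["modest effect in deficient adults", "no effect overall"]
    print(run_dialogue(c))  # summarizes evidence and the differing expert views
```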
@lilyychenn
Lily Chen
5 months
Verifying medical claims wasn’t straightforward for experts. They struggled with:
1️⃣ linking claims to evidence
2️⃣ interpreting underspecified or misguided claims
3️⃣ labeling nuanced claims—often with disagreement
These challenges are inherent to end-to-end fact-checking 🚧 3/
1
0
1
@lilyychenn
Lily Chen
5 months
We study real-world medical claims from Reddit, preserving post context and verifying them with RCT abstracts. 📄 Six experts annotated 20 claims, each with 10 abstracts. Annotations span:
1️⃣ abstract relevance
2️⃣ claim-level evidence quality
3️⃣ explanations citing abstracts 2/
1
0
0
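For concreteness, here is a hypothetical sketch of what one annotation record from the setup above might look like. The field names and value scales are assumptions, not the released schema; the actual data and annotation interface are linked in tweet 5/ of this thread.

```python
# Hypothetical annotation record implied by the study setup; not the released schema.

from dataclasses import dataclass
from typing import Dict


@dataclass
class ClaimAnnotation:
    claim_id: str                        # one of the 20 Reddit claims (with post context)
    annotator_id: str                    # one of the six experts
    abstract_relevance: Dict[str, int]   # RCT abstract id -> relevance rating (dimension 1)
    evidence_quality: str                # claim-level evidence quality (dimension 2)
    explanation: str                     # free-text rationale citing abstracts (dimension 3)


if __name__ == "__main__":
    rec = ClaimAnnotation(
        claim_id="claim_07",
        annotator_id="expert_3",
        abstract_relevance={"abs_01": 2, "abs_02": 0},
        evidence_quality="weak support",
        explanation="abs_01 reports a small RCT with a null primary outcome.",
    )
    print(rec)
```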
@lilyychenn
Lily Chen
5 months
Are we fact-checking medical claims the right way? 🩺🤔 Probably not. In our study, even experts struggled to verify Reddit health claims using end-to-end systems. We show why—and argue fact-checking should be a dialogue, with patients in the loop https://t.co/Wzbwe4i577 🧵1/
1
8
26
@AdithyaNLP
Adithya Bhaskar
5 months
There are many KV cache-reduction methods, but a fair comparison is challenging. We propose a new unified metric called “critical KV footprint”. We compare existing methods and propose a new one - PruLong, which “prunes” certain attn heads to only look at local tokens. 1/7
2
38
233
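A back-of-the-envelope sketch of why pruning some heads down to a local window shrinks the KV cache. The numbers and the simple per-layer entry count below are illustrative assumptions, not the paper's critical-KV-footprint metric or the PruLong training procedure.

```python
# Illustrative arithmetic only: cached key/value positions per layer when some
# attention heads are restricted to a sliding local window.

def kv_cache_entries(seq_len: int, n_heads: int, n_local_heads: int, window: int) -> int:
    """Total cached KV positions across heads for one layer."""
    global_heads = n_heads - n_local_heads
    local = n_local_heads * min(window, seq_len)  # local heads keep only a window of tokens
    full = global_heads * seq_len                 # remaining heads keep the full context
    return local + full


if __name__ == "__main__":
    full_cache = kv_cache_entries(seq_len=128_000, n_heads=32, n_local_heads=0, window=4096)
    pruned = kv_cache_entries(seq_len=128_000, n_heads=32, n_local_heads=24, window=4096)
    print(f"full: {full_cache:,} entries, pruned: {pruned:,} entries "
          f"({pruned / full_cache:.1%} of full)")
```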
@pliang279
Paul Liang
5 months
I am very excited about David's @ddvd233 line of work in developing generalist multimodal clinical foundation models. CLIMB (which will be presented at ICML 2025) https://t.co/XPTiplS0xc is a large-scale benchmark comprising 4.51 million patient samples totaling 19.01 terabytes
[Link card: github.com/DDVD233/CLIMB]
Thanks @iScienceLuvr for posting about our recent work! We're excited to introduce QoQ-Med, a multimodal medical foundation model that jointly reasons across medical images, videos, time series (ECG), and clinical texts. Beyond the model itself, we developed a novel training
1
4
21
@sebajoed
Sebastian Joseph
6 months
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
1
8
23
@pliang279
Paul Liang
7 months
friends at #CHI2025, Karan @realkaranahuja, Yiyue @LuoYiyue, and I are teaching a course on **Multimodal AI for human sensing and interaction** come join us and learn about the latest advances in multimodal AI, generative AI, efficient software, and sensing hardware to
2
7
49
@pliang279
Paul Liang
7 months
Lots of interest in the recent o3 and o4 models, but while these more advanced multimodal AI systems start getting better at math, do they also become better intelligent tutors to help students learn math? 🚨Introducing Interactive Sketchpad, an intelligent AI tutor that
1
15
59
@james_y_zou
James Zou
7 months
Can LLMs learn to reason better by "cheating"?🤯 Excited to introduce #cheatsheet: a dynamic memory module enabling LLMs to learn + reuse insights from tackling previous problems 🎯Claude3.5 23% ➡️ 50% AIME 2024 🎯GPT4o 10% ➡️ 99% on Game of 24 Great job @suzgunmirac w/ awesome
9
39
255
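A minimal sketch, under loose assumptions, of the cheatsheet idea described above: store a takeaway after each solved problem and replay the accumulated notes as context for later ones. The `solve` and `extract` callables stand in for LLM calls and are entirely hypothetical.

```python
# Sketch of a dynamic "cheatsheet" memory loop; the LLM calls are placeholders.

from typing import Callable, List, Tuple


def solve_with_cheatsheet(
    problems: List[str],
    solve: Callable[[str, List[str]], str],   # (problem, prior insights) -> answer
    extract: Callable[[str, str], str],       # (problem, answer) -> reusable insight
    max_memory: int = 20,
) -> Tuple[List[str], List[str]]:
    memory: List[str] = []
    answers: List[str] = []
    for problem in problems:
        answer = solve(problem, memory)       # condition the solver on past insights
        answers.append(answer)
        insight = extract(problem, answer)    # distill a reusable takeaway
        if insight:
            memory.append(insight)
            memory = memory[-max_memory:]     # keep the cheatsheet bounded
    return answers, memory


if __name__ == "__main__":
    answers, memory = solve_with_cheatsheet(
        problems=["24 from 4 6 8 8", "24 from 3 3 8 8"],
        solve=lambda p, mem: f"answer({p}) using {len(mem)} hints",
        extract=lambda p, a: "try building 24 as a*b or a/b before sums",
    )
    print(answers)
    print(memory)
```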
@TongPetersb
Peter Tong
8 months
Vision models have been smaller than language models; what if we scale them up? Introducing Web-SSL: A family of billion-scale SSL vision models (up to 7B parameters) trained on billions of images without language supervision, using VQA to evaluate the learned representation.
8
87
480
@HengzhiL7014
Hengzhi Li
8 months
While today’s multimodal models excel at language-based social tasks, can they understand humans without words? ...not really😶 We introduce MimeQA, a video QA dataset to test AI's nonverbal social intelligence—using mime videos 🤐 Paper: https://t.co/PFIk7pacTs 🧵1/8
2
11
14
@LiaoIsaac91893
Isaac Liao
9 months
Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4
37
192
1K
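A toy illustration, not the actual method, of the inference-time idea above: with no pretraining and no external data, fit a tiny parameterized transform by gradient descent on the single target puzzle's demonstration pairs, then apply it to the test input. Here the transform is just a learned color remapping; real ARC-AGI solutions need a far richer hypothesis space.

```python
# Toy test-time gradient descent on one puzzle's demonstration pairs only.

import numpy as np

N_COLORS = 10


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fit_color_map(demos, steps=500, lr=1.0):
    """demos: list of (input_grid, output_grid) int arrays of equal shape."""
    logits = np.zeros((N_COLORS, N_COLORS))  # row = input color, col = output color
    for _ in range(steps):
        grad = np.zeros_like(logits)
        for x, y in demos:
            p = softmax(logits[x.ravel()])              # (cells, N_COLORS)
            p[np.arange(p.shape[0]), y.ravel()] -= 1.0  # dCE/dlogits = p - onehot(y)
            np.add.at(grad, x.ravel(), p)               # accumulate per input color
        logits -= lr * grad / max(1, len(demos))
    return logits


if __name__ == "__main__":
    # Single puzzle: the demonstration shows that color 1 -> 2 and 2 -> 1.
    x1 = np.array([[1, 2], [2, 1]])
    y1 = np.array([[2, 1], [1, 2]])
    logits = fit_color_map([(x1, y1)])
    test_in = np.array([[1, 1], [2, 2]])
    pred = logits[test_in].argmax(-1)
    print(pred)  # expected: [[2 2], [1 1]]
```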
@jessyjli
Jessy Li
1 year
Thrilled that we won an 🥂Outstanding Paper Award at #EMNLP2024! Super validating for using computational methods to investigate discourse processing via QUDs. Super proud of my students @YatingWu96 @ritikarmangla, amazing team @AlexGDimakis @gregd_nlp
@YatingWu96
Yating Wu
2 years
LLMs can mimic human curiosity by generating open-ended inquisitive questions given some context, similar to how humans wonder when they read. But which ones are more important to be answered?🤔 We predict the salience of questions, substantially outperforming GPT-4.🌟 🧵1/5
14
9
130
@pliang279
Paul Liang
1 year
heading to #emnlp2024! would love to chat with those interested in joining our Multisensory Intelligence research group at MIT @medialab @MITEECS https://t.co/i4y1IK6unF Our group studies the foundations of multisensory AI to create human-AI symbiosis across scales and sensory
3
15
116
@jessyjli
Jessy Li
1 year
Excited for #EMNLP2024! Check out work from my students and collaborators that will be presented: https://t.co/cpwLhVsAlf
2
9
76
@pliang279
Paul Liang
1 year
📣 Announcing the name and theme of my new research group at MIT @medialab @MITEECS: ***Multisensory Intelligence*** https://t.co/i4y1IK72dd Our group studies the foundations of multisensory AI to create human-AI symbiosis across scales and sensory mediums. We are hiring at
10
49
439
@lilyychenn
Lily Chen
1 year
I'm excited to announce that our work, 𝐅𝐚𝐜𝐭𝐏𝐈𝐂𝐎, has been accepted to 𝗔𝗖𝗟 𝟮𝟬𝟮𝟰! 🎉🇹🇭 A huge thanks to all amazing collaborators 🚀🫶 #NLProc #ACL2024NLP
@lilyychenn
Lily Chen
2 years
LLMs can write impressive-looking summaries of technical texts in plain language. But are they factual? This is critical in medicine, and the answer is tricky! Introducing ⚕️FactPICO, the first **expert** evaluation of this, with explanations
Paper: https://t.co/AoSMyP0wNB 🧵1/
0
0
10