Lily Chen
@lilyychenn
174 Followers · 3K Following · 7 Media · 51 Statuses
How do we teach LLMs not just to reason, but to reflect, debug, and improve themselves? We at AWS AI Labs introduce MURPHY 🤖, a multi-turn RL framework that brings self-correction into #RLVR (#GRPO). 🧵👇 Link: https://t.co/3kFjI5mxR5
2 replies · 14 retweets · 25 likes
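A minimal sketch, not from the MURPHY paper, of how a multi-turn self-correction rollout could feed a GRPO-style update: the group-relative advantage normalization is standard GRPO, while the `model` and `verifier` callables and the retry prompt are assumed purely for illustration.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each rollout's reward
    against the mean and std of its sampling group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def multi_turn_rollout(model, verifier, prompt, max_turns=3):
    """Hypothetical self-correction loop: the model answers, a
    verifier checks it, and failed attempts are appended to the
    context for a revision turn. The verifiable reward is 1.0
    only if some turn passes."""
    context = prompt
    for _ in range(max_turns):
        answer = model(context)      # assumed: returns a text answer
        if verifier(answer):         # assumed: verifiable reward check
            return 1.0
        context += "\nPrevious attempt failed. Revise:\n" + answer
    return 0.0

# Sample a group of G rollouts per prompt, then normalize:
# advantages = group_relative_advantages(
#     [multi_turn_rollout(model, verifier, p) for _ in range(G)])
```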
You can find our annotation data and interface here: https://t.co/QMdkPX2JvC. Many thanks to my co-lead @sebajoed and our amazing collaborators Barry Wei, @mackert, @ijmarshall, @pliang279, @RKouzyMD, @byron_c_wallace, and @jessyjli! 5/
0 replies · 0 retweets · 0 likes
To address these challenges, we propose a communication model that:
- clarifies intent through dialogue
- guides claims toward verifiable evidence
- explains diverse expert perspectives instead of forcing consensus
It reframes medical fact-checking as patient–expert dialogue 4/
1 reply · 0 retweets · 1 like
Verifying medical claims wasn’t straightforward for experts. They struggled with:
1️⃣ linking claims to evidence
2️⃣ interpreting underspecified or misguided claims
3️⃣ labeling nuanced claims, often with disagreement
These challenges are inherent to end-to-end fact-checking 🚧 3/
1 reply · 0 retweets · 1 like
We study real-world medical claims from Reddit, preserving post context and verifying them with RCT abstracts. 📄 Six experts annotated 20 claims, each with 10 abstracts. Annotations span:
1️⃣ abstract relevance
2️⃣ claim-level evidence quality
3️⃣ explanations citing abstracts 2/
1 reply · 0 retweets · 0 likes
Are we fact-checking medical claims the right way? 🩺🤔 Probably not. In our study, even experts struggled to verify Reddit health claims using end-to-end systems. We show why—and argue fact-checking should be a dialogue, with patients in the loop https://t.co/Wzbwe4i577 🧵1/
1 reply · 8 retweets · 26 likes
There are many KV cache-reduction methods, but a fair comparison is challenging. We propose a new unified metric called “critical KV footprint”. We compare existing methods and propose a new one - PruLong, which “prunes” certain attn heads to only look at local tokens. 1/7
2 replies · 38 retweets · 233 likes
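A rough illustration of the mechanism the PruLong tweet describes, with an invented head split and window size: restrict some attention heads to a local window so their KV cache can be truncated, while the remaining heads keep full causal context.

```python
import numpy as np

def head_attention_masks(seq_len, num_heads, local_heads, window):
    """Per-head causal masks: the first `local_heads` heads may
    only attend within a sliding window of `window` tokens; the
    remaining heads keep full causal attention. True = allowed."""
    i = np.arange(seq_len)[:, None]        # query positions
    j = np.arange(seq_len)[None, :]        # key positions
    causal = j <= i                        # full causal mask
    local = causal & (j > i - window)      # sliding-window mask
    masks = np.empty((num_heads, seq_len, seq_len), dtype=bool)
    masks[:local_heads] = local            # "pruned" heads: local only
    masks[local_heads:] = causal           # untouched heads: full context
    return masks

# Local heads never read KV entries older than `window`, so that
# part of their cache can be evicted, shrinking the KV footprint.
masks = head_attention_masks(seq_len=8, num_heads=4, local_heads=2, window=3)
```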
I am very excited about David's @ddvd233 line of work in developing generalist multimodal clinical foundation models. CLIMB (which will be presented at ICML 2025) https://t.co/XPTiplS0xc is a large-scale benchmark comprising 4.51 million patient samples totaling 19.01 terabytes
Thanks @iScienceLuvr for posting about our recent work! We're excited to introduce QoQ-Med, a multimodal medical foundation model that jointly reasons across medical images, videos, time series (ECG), and clinical texts. Beyond the model itself, we developed a novel training
1 reply · 4 retweets · 21 likes
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
1 reply · 8 retweets · 23 likes
friends at #CHI2025, Karan @realkaranahuja, Yiyue @LuoYiyue, and I are teaching a course on **Multimodal AI for human sensing and interaction**. Come join us and learn about the latest advances in multimodal AI, generative AI, efficient software, and sensing hardware to
2 replies · 7 retweets · 49 likes
Lots of interest in the recent o3 and o4 models, but as these more advanced multimodal AI systems get better at math, do they also become better intelligent tutors who help students learn math? 🚨Introducing Interactive Sketchpad, an intelligent AI tutor that
1 reply · 15 retweets · 59 likes
Can LLMs learn to reason better by "cheating"? 🤯 Excited to introduce #cheatsheet: a dynamic memory module enabling LLMs to learn + reuse insights from tackling previous problems
🎯 Claude 3.5: 23% ➡️ 50% on AIME 2024
🎯 GPT-4o: 10% ➡️ 99% on Game of 24
Great job @suzgunmirac w/ awesome
9 replies · 39 retweets · 255 likes
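A toy sketch of a dynamic memory module in the spirit of the cheatsheet tweet; the class and its methods are invented for illustration and are not the paper's API. The idea: store distilled insights from solved problems and prepend them to later prompts.

```python
class Cheatsheet:
    """Minimal dynamic-memory sketch: accumulate short, reusable
    insights from past problems and inject them into new prompts."""

    def __init__(self, max_entries=50):
        self.entries = []
        self.max_entries = max_entries

    def add(self, insight: str):
        """Store a distilled takeaway (e.g. a tactic that worked)."""
        self.entries.append(insight)
        self.entries = self.entries[-self.max_entries:]  # keep it short

    def render(self, problem: str) -> str:
        """Prepend accumulated insights to the new problem."""
        notes = "\n".join(f"- {e}" for e in self.entries)
        return f"Cheatsheet of prior insights:\n{notes}\n\nProblem:\n{problem}"

sheet = Cheatsheet()
sheet.add("For Game of 24, try factor pairs of 24 before mixed ops.")
prompt = sheet.render("Make 24 from 3, 3, 8, 8.")
```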
Vision models have been smaller than language models; what if we scale them up? Introducing Web-SSL: A family of billion-scale SSL vision models (up to 7B parameters) trained on billions of images without language supervision, using VQA to evaluate the learned representation.
8 replies · 87 retweets · 480 likes
While today’s multimodal models excel at language-based social tasks, can they understand humans without words? ...not really😶 We introduce MimeQA, a video QA dataset to test AI's nonverbal social intelligence—using mime videos 🤐 Paper: https://t.co/PFIk7pacTs 🧵1/8
2 replies · 11 retweets · 14 likes
Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4
37 replies · 192 retweets · 1K likes
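A toy illustration of inference-time gradient descent on a single puzzle, as the tweet describes: fit a model to the puzzle's own demonstration pairs with no pretraining and no external data. The model here, a pointwise color remapping, is far simpler than whatever the paper uses; it only conveys the test-time-training loop.

```python
import numpy as np

def fit_on_puzzle(train_pairs, colors=10, steps=300, lr=0.5):
    """Learn one softmax distribution over output colors per input
    color, by gradient descent on this puzzle's demo pairs alone."""
    logits = np.zeros((colors, colors))
    for _ in range(steps):
        grad = np.zeros_like(logits)
        n = 0
        for x, y in train_pairs:               # x, y: integer grids
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)  # row-wise softmax
            for cin, cout in zip(x.ravel(), y.ravel()):
                g = p[cin].copy()              # d(-log p[cout])/d logits
                g[cout] -= 1.0
                grad[cin] += g
                n += 1
        logits -= lr * grad / n
    return logits.argmax(axis=1)               # hardened color map

# One demo pair: every 1 becomes 2. Apply the learned map to a test grid.
x = np.array([[0, 1], [1, 0]]); y = np.array([[0, 2], [2, 0]])
mapping = fit_on_puzzle([(x, y)])
test_out = mapping[np.array([[1, 1], [0, 1]])]  # -> [[2, 2], [0, 2]]
```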
Thrilled that we won an 🥂Outstanding Paper Award at #EMNLP2024! Super validating for using computational methods to investigate discourse processing via QUDs. Super proud of my students @YatingWu96 @ritikarmangla, amazing team @AlexGDimakis @gregd_nlp
LLMs can mimic human curiosity by generating open-ended inquisitive questions given some context, similar to how humans wonder when they read. But which ones are more important to be answered?🤔 We predict the salience of questions, substantially outperforming GPT-4.🌟 🧵1/5
14 replies · 9 retweets · 130 likes
heading to #emnlp2024! would love to chat with those interested in joining our Multisensory Intelligence research group at MIT @medialab @MITEECS
https://t.co/i4y1IK6unF Our group studies the foundations of multisensory AI to create human-AI symbiosis across scales and sensory
3 replies · 15 retweets · 116 likes
Excited for #EMNLP2024! Check out work from my students and collaborators that will be presented: https://t.co/cpwLhVsAlf
2 replies · 9 retweets · 76 likes
📣 Announcing the name and theme of my new research group at MIT @medialab @MITEECS: ***Multisensory Intelligence*** https://t.co/i4y1IK72dd Our group studies the foundations of multisensory AI to create human-AI symbiosis across scales and sensory mediums. We are hiring at
10 replies · 49 retweets · 439 likes
I'm excited to announce that our work, 𝐅𝐚𝐜𝐭𝐏𝐈𝐂𝐎, has been accepted to 𝗔𝗖𝗟 𝟮𝟬𝟮𝟰! 🎉🇹🇭 A huge thanks to all amazing collaborators 🚀🫶 #NLProc #ACL2024NLP
LLMs can write impressive-looking summaries of technical texts in plain language. But are they factual? This is critical in medicine, and the answer is tricky! Introducing ⚕️FactPICO, the first **expert** evaluation of this, with explanations Paper: https://t.co/AoSMyP0wNB 🧵1/
0 replies · 0 retweets · 10 likes