Adithya Bhaskar Profile
Adithya Bhaskar

@AdithyaNLP

Followers: 439 · Following: 191 · Media: 30 · Statuses: 68

Third-year CS PhD candidate at Princeton University (@princeton_nlp @PrincetonPLI), previously a CS undergrad at IIT Bombay

Princeton, NJ
Joined June 2023
@AdithyaNLP
Adithya Bhaskar
2 months
Language models that think, chat better. We used longCoT (w/ reward model) for RLHF instead of math, and it just works. Llama-3.1-8B-Instruct + 14K ex beats GPT-4o (!) on chat & creative writing, & even Claude-3.7-Sonnet (thinking) on AlpacaEval2 and WildBench! Read on. 🧵 1/8
@YangWilliam_
William Yang
14 days
Text-to-image (T2I) models can generate rich supervision for visual learning, but generating subtle distinctions remains challenging. Fine-tuning helps, but too much tuning → overfitting and loss of diversity. How do we preserve fidelity without sacrificing diversity? (1/8)
@yinghui_he_
Yinghui He
25 days
Claude Skills shows performance benefits from leveraging LLM skill catalogs at inference time. Our previous work (linked under thread 5/5) showed the same 6 months ago! 🌟Our new work, STAT, shows that leveraging skills during training can greatly help too‼️, e.g., Qwen can
@xiye_nlp
Xi Ye
2 months
Check out our new work on making reasoning models think broadly! 🤔 We find a minimalist, surprisingly effective recipe to THINK for CHAT: RLVR + a strong reward model, trained on real-world prompts. This project was fun and surprised me in a few ways 👇 📌 We can run RL
@AdithyaNLP
Adithya Bhaskar
2 months
Language models that think, chat better. We used longCoT (w/ reward model) for RLHF instead of math, and it just works. Llama-3.1-8B-Instruct + 14K ex beats GPT-4o (!) on chat & creative writing, & even Claude-3.7-Sonnet (thinking) on AlpacaEval2 and WildBench! Read on. 🧵 1/8
@AdithyaNLP
Adithya Bhaskar
2 months
Thanks for tweeting our paper!! 😁
@rohanpaul_ai
Rohan Paul
2 months
The paper shows that making models think before answering makes them chat better. It introduces Reinforcement Learning with Model-rewarded Thinking (RLMT), which makes the model write a private plan, then the final reply. A separate reward model, trained from human choices,
@AdithyaNLP
Adithya Bhaskar
2 months
Honored to be included in the list, thanks a lot!
@dair_ai
DAIR.AI
2 months
7. Language Models that Think, Chat Better A simple recipe, RL with Model-rewarded Thinking, makes small open models “plan first, answer second” on regular chat prompts and trains them with online RL against a preference reward. https://t.co/P6HqnTEOUo
@dair_ai
DAIR.AI
2 months
Top AI Papers of The Week (September 22-28):
- ATOKEN
- LLM-JEPA
- Code World Model
- Teaching LLMs to Plan
- Agents Research Environments
- Language Models that Think, Chat Better
- Embodied AI: From LLMs to World Models
Read on for more:
@AdithyaNLP
Adithya Bhaskar
2 months
Thanks for your kind words!
@MKulria
Manish Kulariya
2 months
Ever wonder why some AI chats feel robotic while others nail it? This new paper introduces a game-changer: Language Models that Think, Chat Better. They train AIs to "think" step-by-step before replying, crushing benchmarks. Mind blown? Let's dive in 👇
@AdithyaNLP
Adithya Bhaskar
2 months
Thanks a lot for the shout-out! 😁
@omarsar0
elvis
2 months
Language Models that Think and Chat Better Proposes a simple RL recipe to improve small open models (e.g., 8B) that rivals GPT-4o and Claude 3.7 Sonnet (thinking). Pay attention to this one, AI devs! Here are my notes:
@AdithyaNLP
Adithya Bhaskar
2 months
Thanks a lot for the tweet! We had a lot of fun working on this project! 😄
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
2 months
Language Models that Think, Chat Better "This paper shows that the RLVR paradigm is effective beyond verifiable domains, and introduces RL with Model-rewarded Thinking (RLMT) for general-purpose chat capabilities." "RLMT consistently outperforms standard RLHF pipelines. This
@AdithyaNLP
Adithya Bhaskar
2 months
We release our 1) paper at https://t.co/PgMeacp3qb 2) code at https://t.co/cMLkDzzrVi 3) models, SFT data, and RL prompts at https://t.co/MEtceOmssY Thanks to my co-author @xiye_nlp and advisor @danqi_chen! 8/8
@AdithyaNLP
Adithya Bhaskar
2 months
Oh, and here is the customary plot showing that the model learns to think longer as training progresses. We think it's cool. 7/8
@AdithyaNLP
Adithya Bhaskar
2 months
What kind of plans is the LM making? We checked, and it is refreshingly non-slop. It tries to cross-link different parts of the answer, carefully navigates edge cases, doesn’t just throw everything into a billion nested lists, and even refines and iterates on its draft/plan! 6/8
@AdithyaNLP
Adithya Bhaskar
2 months
Okay, so what matters? We found: (1) the prompt mixture matters, (2) the source of SFT responses matters less, and (3) the strength of the reward model matters a lot. 5/8
@AdithyaNLP
Adithya Bhaskar
2 months
The gains in chat/creative writing are huge. Our warm-start instruct models beat GPT-4o on chat/creative writing, and even Claude-3.7-Sonnet (thinking) on AE2/WB! The zero models beat instruct versions on chat/CW (and Qwen-zero even beats Instruct on other benchmarks). 4/8
@AdithyaNLP
Adithya Bhaskar
2 months
How can you make LMs think? You can SFT (warm-start), or you can prompt them like "A conversation between… the assistant first thinks…" ("zero"). TLDR: warm-start works with DPO/PPO/GRPO, but zero needs GRPO. In all cases, thinking outperforms non-thinking by ≥1-3 points on average. 3/8
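For readers curious what a "zero"-style setup looks like in practice, here is a minimal Python sketch of wrapping a raw chat prompt in a thinking template. The exact wording and the <think> tags are my assumptions for illustration, not the paper's released template.

```python
# Minimal sketch of a "zero"-style thinking prompt (assumed wording and tags,
# not the paper's exact template): the model is instructed to think privately
# inside <think> ... </think> before writing its final reply.
ZERO_TEMPLATE = (
    "A conversation between a user and an assistant. The assistant first "
    "thinks about the request privately, then gives its final response. "
    "The thinking is enclosed in <think> </think> tags.\n\n"
    "User: {prompt}\n"
    "Assistant: <think>"
)

def build_zero_prompt(prompt: str) -> str:
    """Wrap a raw chat prompt in the zero-style thinking template."""
    return ZERO_TEMPLATE.format(prompt=prompt)

print(build_zero_prompt("Draft a toast for my sister's wedding."))
```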
@AdithyaNLP
Adithya Bhaskar
2 months
Everyone’s training “thinking” math/science LMs, but we think for other stuff too: we outline essays, scribble shopping lists, and rehearse speeches. Doesn’t make sense that LMs can’t, so we made them. Simple recipe: prompt → thought → response, remove thought, score w/ RM. 2/8
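A rough Python sketch of the recipe in this tweet: sample a generation that contains a private thought plus a reply, strip the thought, and score only the reply with the reward model. The <think> tag convention and the `policy.generate` / `reward_model.score` calls are hypothetical placeholders, not the released code's API.

```python
import re

# Sketch of: prompt -> thought -> response, remove thought, score with a
# reward model. Placeholder objects; not the paper's implementation.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thought(generation: str) -> str:
    """Remove the private <think>...</think> block, keeping the final reply."""
    return THINK_RE.sub("", generation).strip()

def rlmt_reward(policy, reward_model, prompt: str) -> float:
    generation = policy.generate(prompt)         # contains thought + reply
    response = strip_thought(generation)         # reward model never sees the thought
    return reward_model.score(prompt, response)  # scalar reward for online RL (e.g. GRPO)

demo = "<think>Outline: greeting, two anecdotes, raise the glass.</think>Dear friends, ..."
print(strip_thought(demo))  # -> "Dear friends, ..."
```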
@ChengleiSi
CLS
5 months
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
@AdithyaNLP
Adithya Bhaskar
5 months
Paper also has (1) ablation & sensitivity studies (2) PruLong for pretraining (3) more idealized & real (hardware) metrics! Paper: https://t.co/D3bWZWshyn Code: https://t.co/EBM2dhzIcZ Special thanks to my coauthors @_awettig @YiheS5 @gaotianyu1350 @danqi_chen! 7/7
github.com
Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?" - princeton-pli/PruLong
@AdithyaNLP
Adithya Bhaskar
5 months
Our modifications substantially reduce the critical KV footprint the two methods need to retain 90% of full performance, by up to 30 absolute percentage points, when evaluated on long → short (HELMET) as well as long → long (LongProc) benchmarks. 6/7
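For concreteness, a rough sketch of how one might compute the "critical KV footprint" metric this tweet describes: the smallest KV-cache fraction at which a method still retains 90% of its full-cache score. This is my paraphrase for illustration, not the PruLong repository's code, and the numbers below are made up.

```python
# Illustrative only: find the smallest KV-cache budget (fraction of the full
# cache) whose benchmark score stays within 90% of the full-cache score.
def critical_kv_footprint(scores_by_budget: dict[float, float],
                          full_cache_score: float,
                          threshold: float = 0.9) -> float:
    """scores_by_budget maps KV-cache fraction in (0, 1] -> benchmark score."""
    target = threshold * full_cache_score
    feasible = [b for b, s in sorted(scores_by_budget.items()) if s >= target]
    if not feasible:
        return 1.0  # even the largest measured budget falls short of 90%
    return feasible[0]

# Hypothetical curve: this method needs 40% of the KV cache to stay within 90%.
curve = {0.1: 51.0, 0.2: 58.5, 0.4: 63.2, 0.8: 68.9}
print(critical_kv_footprint(curve, full_cache_score=69.5))  # -> 0.4
```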