Adithya Bhaskar
@AdithyaNLP
Followers: 439 · Following: 191 · Media: 30 · Statuses: 68
Third year CS PhD candidate at Princeton University (@princeton_nlp @PrincetonPLI), previously CS undergrad at IIT Bombay
Princeton, NJ
Joined June 2023
Language models that think, chat better. We used longCoT (w/ reward model) for RLHF instead of math, and it just works. Llama-3.1-8B-Instruct + 14K ex beats GPT-4o (!) on chat & creative writing, & even Claude-3.7-Sonnet (thinking) on AlpacaEval2 and WildBench! Read on. 🧵 1/8
Text-to-image (T2I) models can generate rich supervision for visual learning, but generating subtle distinctions remains challenging. Fine-tuning helps, but too much tuning → overfitting and loss of diversity. How do we preserve fidelity without sacrificing diversity? (1/8)
Claude Skills shows performance benefits from leveraging LLM skill catalogs at inference time. Our previous work (linked under thread 5/5) showed the same 6 months ago! 🌟Our new work, STAT, shows that leveraging skills during training can greatly help too‼️, e.g., Qwen can
Check out our new work on making reasoning models think broadly! 🤔 We find a minimalist, surprisingly effective recipe to THINK for CHAT: RLVR + a strong reward model, trained on real-world prompts. This project was fun and surprised me in a few ways 👇 📌 We can run RL
Thanks for tweeting our paper!! 😁
The paper shows that making models think before answering makes them chat better. It introduces Reinforcement Learning with Model-rewarded Thinking (RLMT), which has the model write a private plan, then the final reply. A separate reward model, trained from human choices,
Honored to be included in the list, thanks a lot!
7. Language Models that Think, Chat Better: A simple recipe, RL with Model-rewarded Thinking, makes small open models “plan first, answer second” on regular chat prompts and trains them with online RL against a preference reward. https://t.co/P6HqnTEOUo
Top AI Papers of The Week (September 22-28):
- ATOKEN
- LLM-JEPA
- Code World Model
- Teaching LLMs to Plan
- Agents Research Environments
- Language Models that Think, Chat Better
- Embodied AI: From LLMs to World Models
Read on for more:
Thanks a lot for the tweet! We had a lot of fun working on this project! 😄
Language Models that Think, Chat Better "This paper shows that the RLVR paradigm is effective beyond verifiable domains, and introduces RL with Model-rewarded Thinking (RLMT) for general-purpose chat capabilities." "RLMT consistently outperforms standard RLHF pipelines. This
We release our 1) paper at https://t.co/PgMeacp3qb 2) code at https://t.co/cMLkDzzrVi 3) models, SFT data, and RL prompts at https://t.co/MEtceOmssY Thanks to my co-author @xiye_nlp and advisor @danqi_chen! 8/8
Oh, and here is the customary plot that shows that the model learns to think longer as the training progresses. We think it's cool. 7/8
What kind of plans is the LM making? We checked, and it is refreshingly non-slop. It tries to cross-link different parts of the answer, carefully navigates edge cases, doesn’t just throw everything into a billion nested lists, and even refines and iterates on its draft/plan! 6/8
Okay, so what matters? We found: (1) the prompt mixture matters, (2) the source of SFT responses matters less, and (3) the strength of the reward model matters a lot. 5/8
The gains in chat/creative writing are huge. Our warm-start instruct models beat GPT-4o on chat/creative writing, and even Claude-3.7-Sonnet (thinking) on AE2/WB! The zero models beat instruct versions on chat/CW (and Qwen-zero even beats Instruct on other benchmarks). 4/8
How can you make LMs think? You can SFT (warm-start), or you can prompt them like "A conversation between… the assistant first thinks…" ("zero"). TLDR: warm-start works with DPO/PPO/GRPO, but zero needs GRPO. In all cases, thinking outperforms non-thinking by >= 1-3 pts avg. 3/8
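A minimal sketch of what the "zero" setup above might look like in practice, assuming a thinking preamble and <think>/<answer> tags; the exact wording and tags here are illustrative placeholders, not the paper's actual prompt.

ZERO_PREAMBLE = (
    "A conversation between a user and an assistant. The assistant first thinks "
    "through the request privately inside <think>...</think>, then writes the "
    "final reply inside <answer>...</answer>."
)

def build_zero_prompt(user_message: str) -> str:
    # Wrap a raw chat prompt with the thinking preamble (no SFT warm-start).
    return f"{ZERO_PREAMBLE}\n\nUser: {user_message}\nAssistant:"

def split_thought_and_answer(completion: str) -> tuple[str, str]:
    # Separate the private thought from the user-facing answer.
    thought = completion.split("<think>")[-1].split("</think>")[0].strip()
    answer = completion.split("<answer>")[-1].split("</answer>")[0].strip()
    return thought, answer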
Everyone’s training “thinking” math/science LMs, but we think for other stuff too: we outline essays, scribble shopping lists, and rehearse speeches. Doesn’t make sense that LMs can’t, so we made them. Simple recipe: prompt → thought → response, remove thought, score w/ RM. 2/8
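A minimal sketch of that recipe, assuming generic generate and reward_model callables (their names and signatures are placeholders, not the released code): roll out prompt → thought → response, strip the thought, and score only the visible response with the reward model.

from typing import Callable

def rlmt_reward(
    prompt: str,
    generate: Callable[[str], str],             # policy rollout: prompt -> "<think>...</think> reply"
    reward_model: Callable[[str, str], float],  # RM score for (prompt, visible reply)
) -> float:
    completion = generate(prompt)
    # Remove the private thought so only the user-facing reply is scored.
    if "</think>" in completion:
        reply = completion.split("</think>", 1)[1].strip()
    else:
        reply = completion.strip()
    return reward_model(prompt, reply)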
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
Paper also has (1) ablation & sensitivity studies (2) PruLong for pretraining (3) more idealized & real (hardware) metrics! Paper: https://t.co/D3bWZWshyn Code: https://t.co/EBM2dhzIcZ Special thanks to my coauthors @_awettig @YiheS5 @gaotianyu1350 @danqi_chen! 7/7
github.com: Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?" - princeton-pli/PruLong
Our modifications substantially reduce the critical KV footprint needed to retain 90% of performance for the two methods, by up to 30 absolute percentage points, when evaluated on long -> short (HELMET) as well as long -> long (LongProc) benchmarks. 6/7
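As a rough sketch of how that footprint could be measured (the evaluate callable, standing in for a HELMET/LongProc run at a given KV retention fraction, is a placeholder, not the actual evaluation harness): sweep retention fractions and take the smallest one that still reaches 90% of the full-cache score.

from typing import Callable, Sequence

def critical_kv_footprint(
    evaluate: Callable[[float], float],  # benchmark score with this fraction of KVs kept
    fractions: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
    threshold: float = 0.9,
) -> float:
    # Smallest fraction of the KV cache that retains `threshold` of full-cache performance.
    full_score = evaluate(1.0)
    for frac in fractions:  # fractions assumed sorted ascending
        if evaluate(frac) >= threshold * full_score:
            return frac
    return 1.0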