Vijay V.
@vijaytarian
Followers 706 · Following 1K · Media 101 · Statuses 1K
Grad student at CMU. I do research on applied NLP. he/him
Pittsburgh, PA
Joined April 2009
RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess. Reward models and AI judges? Fragile and inconsistent. Our proposal? RL from Checklist Feedback: https://t.co/fRYqiiDBQP
5
41
228
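To make the proposal concrete, here is a minimal sketch of what checklist-based reward scoring could look like: score a response as the fraction of checklist items an LM judge says it satisfies. The `ask_judge` function and the scoring scheme are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of checklist-based reward scoring (not the paper's exact code).
# `ask_judge` is a hypothetical stand-in for an LM-judge call.

def ask_judge(response: str, criterion: str) -> bool:
    """Hypothetical LM-judge call: does `response` satisfy `criterion`?"""
    # In practice this would prompt a judge model; here we use a trivial heuristic.
    return criterion.lower() in response.lower()

def checklist_reward(response: str, checklist: list[str]) -> float:
    """Score a response as the fraction of checklist items it satisfies."""
    if not checklist:
        return 0.0
    passed = sum(ask_judge(response, item) for item in checklist)
    return passed / len(checklist)

checklist = ["python", "example", "docstring"]
print(checklist_reward("Here is a Python example with a docstring.", checklist))  # 1.0
```

A fractional reward like this is denser than a single pass/fail judgment, which is part of what makes checklists attractive as an RL signal.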
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
13
66
345
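Since the questions are described as simple to verify, scoring plausibly reduces to normalized exact match. A hedged sketch of what that accuracy computation might look like; the record fields are assumptions, not Oolong's actual schema.

```python
# Hypothetical accuracy computation for simple-to-verify aggregation questions.
# The record fields ("answer", "prediction") are illustrative assumptions.

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def accuracy(records: list[dict]) -> float:
    correct = sum(normalize(r["prediction"]) == normalize(r["answer"]) for r in records)
    return correct / len(records)

records = [
    {"answer": "42", "prediction": "42"},
    {"answer": "blue", "prediction": "red"},
]
print(accuracy(records))  # 0.5
```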
This shouldn't be controversial. Science requires sharing with the public. If you're never sharing your research, you're not a research scientist. I don't think you have to share it via peer review, but vague musings on Twitter or on a podcast definitely don't count as science
my most controversial opinion is that you shouldnโt trust anyone that calls themself an โAI researcherโ but has never gotten a first author paper through peer review
0
0
1
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
27
173
1K
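The fragmentation point is concrete: the same agent episode can be serialized in many incompatible ways. A hedged sketch of normalizing a ReAct-style log into one message-list schema; the field names here are assumptions for illustration, not the released dataset's actual format.

```python
# Hypothetical normalization of heterogeneous agent logs into one SFT-ready schema.
# Source/target field names are illustrative; the released dataset's format may differ.

def normalize_react_log(log: dict) -> list[dict]:
    """Convert a ReAct-style log into a flat list of role-tagged messages."""
    messages = [{"role": "user", "content": log["task"]}]
    for step in log["steps"]:
        messages.append({"role": "assistant", "content": step["thought_and_action"]})
        messages.append({"role": "tool", "content": step["observation"]})
    messages.append({"role": "assistant", "content": log["final_answer"]})
    return messages

log = {
    "task": "What is 2 + 2?",
    "steps": [{"thought_and_action": "I should use the calculator. calc(2 + 2)",
               "observation": "4"}],
    "final_answer": "4",
}
print(normalize_react_log(log))
```

Once every source is mapped into one schema like this, trajectories from different tools and interfaces can be mixed freely in a single SFT run.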
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: they code everything, slow down human workflows, and deliver low-quality work fast. Yet when teamed with humans, they shine.
7
52
243
Introducing Dual-Weighted Reinforcement Learning (DWRL), a new framework for training on preference data, working @AIatMeta. Key ideas: ✅ Transform preference modeling into RLVR and use GRPO ✅ Integrate the
9
27
129
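The tweet above is truncated, but one reading of "transform preference modeling into RLVR" is that each preference pair becomes a task with an automatically checkable answer, so GRPO-style RL with binary rewards can be applied. A hedged sketch under that interpretation; this is not the DWRL paper's actual recipe.

```python
# Hedged sketch: turning a preference pair into a verifiable-reward task.
# This is an interpretation of the tweet, not the DWRL paper's actual recipe.
import random

def make_verifiable_task(prompt: str, chosen: str, rejected: str) -> dict:
    """Present the two responses in random order; the label is checkable."""
    if random.random() < 0.5:
        a, b, label = chosen, rejected, "A"
    else:
        a, b, label = rejected, chosen, "B"
    question = (f"Prompt: {prompt}\nResponse A: {a}\nResponse B: {b}\n"
                "Which response is better? Answer A or B.")
    return {"question": question, "gold": label}

def reward(model_answer: str, task: dict) -> float:
    """Binary verifiable reward, as used in RLVR-style training."""
    return 1.0 if model_answer.strip() == task["gold"] else 0.0

task = make_verifiable_task("Summarize the article.", "Good summary.", "Off-topic reply.")
print(reward("A", task))
```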
Tinker looks awesome! And since Andrej nicely highlighted the value of finetuning a smaller LLM specifically for your narrow task, this is a great time for me to mention Prompt2Model - the library we released back in 2023 to support the same goal! https://t.co/bdohmjqxCt
github.com
prompt2model - Generate Deployable Models from Natural Language Instructions - neulab/prompt2model
Tinker is cool. If you're a researcher/developer, Tinker dramatically simplifies LLM post-training. You retain ~90% of the algorithmic creative control (usually the data, the loss function, and the algorithm) while Tinker handles the hard parts that you usually want to touch much less.
0
0
3
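That "90% of creative control" claim maps to a specific division of labor: the user owns data selection, formatting, and the training signal, while the service owns distributed execution. A hypothetical sketch of such an interface; the `TrainingClient` class and its methods below are invented for illustration and are NOT Tinker's real API.

```python
# Hypothetical post-training interface illustrating the division of labor the
# tweet describes. `TrainingClient` and its methods are invented for this sketch;
# they are NOT Tinker's real API.

class TrainingClient:
    """Stand-in for a managed service that hides distributed-training details."""
    def forward_backward(self, batch: list[dict]) -> float:
        # A real service would run the model and return a loss; we fake one.
        return 0.0

    def optim_step(self) -> None:
        pass  # Optimizer state, sharding, and checkpointing live server-side.

def my_batches():
    """The user-controlled part: data selection and formatting."""
    yield [{"prompt": "2+2=", "target": "4"}]

client = TrainingClient()
for batch in my_batches():
    loss = client.forward_backward(batch)  # user picks data, service does the rest
    client.optim_step()
print("done")
```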
Prove me wrong: anything we didn't publish and secretly did is 10x more advanced than all the open source repos or publications.
6
4
193
If this statement is about algorithms, it misses the fact that finding an RL abstraction that is natural and ergonomic enough for *problem specification* is very difficult for most problems of interest. You can't climb a hill you don't even know how to define.
7
4
89
Paper on a new pretraining paradigm: Synthetic Bootstrapped Pretraining. SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training, with no teacher needed. Validation: 1T data + 3B model from scratch. 🧵
9
49
249
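As summarized in the tweet, the loop is: pair correlated documents, learn to synthesize one from the other, then generate new pretraining text. A hedged sketch of that loop; the retrieval and synthesis components below are placeholders, not the paper's implementation.

```python
# Hedged sketch of the SBP loop as summarized in the tweet: pair correlated
# documents, learn doc1 -> doc2 synthesis, then generate new pretraining data.
# All components below are placeholders, not the paper's implementation.

def find_related(doc: str, corpus: list[str]) -> str | None:
    """Placeholder retrieval: pair documents sharing at least two words."""
    words = set(doc.split())
    for other in corpus:
        if other != doc and len(words & set(other.split())) >= 2:
            return other
    return None

def synthesize(seed_doc: str) -> str:
    """Placeholder for a model trained on (doc1, doc2) pairs."""
    return f"[synthetic document conditioned on: {seed_doc[:40]}...]"

corpus = ["alpha beta gamma", "beta gamma delta", "unrelated text here"]
pairs = [(d, r) for d in corpus if (r := find_related(d, corpus))]
synthetic_corpus = [synthesize(d) for d, _ in pairs]
print(len(pairs), synthetic_corpus[0])
```

The key property is that the synthesizer is trained on the corpus itself, so no stronger teacher model is required to produce the new data.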
(see our original twitter thread for more! https://t.co/f4X0OjNdBj)
RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess. Reward models and AI judges? Fragile and inconsistent. Our proposal? RL from Checklist Feedback: https://t.co/fRYqiiDBQP
0
0
0
In https://t.co/N5sHpY7qBc we showed that this exact approach can make already-strong models even better at instruction following.
arxiv.org
Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this -- typically using fixed criteria such as "helpfulness" and...
1
0
4
Our software lets you generate rubrics, score responses via rubric-grounded LM judges and rubric-grounded code verifiers, and train models (currently using DPO) on these rewards: https://t.co/uMX4dDZlNT. You'll need a node with 4-8 GPUs to use this.
github.com
Contribute to viswavi/RLCF development by creating an account on GitHub.
1
0
3
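A hedged sketch of the pipeline the tweet outlines: generate a rubric for an instruction, judge candidate responses against it, and turn the best/worst candidates into a DPO preference pair. The function internals here are placeholders; the real RLCF code in the linked repo will differ.

```python
# Hedged sketch of a rubric-grounded preference pipeline as outlined above.
# Judge and verifier internals are placeholders; see the linked repo for the real code.

def generate_rubric(instruction: str) -> list[str]:
    """Placeholder: a model would generate instruction-specific criteria."""
    return ["addresses the instruction", "is factually grounded"]

def judge_score(response: str, rubric: list[str]) -> float:
    """Placeholder for rubric-grounded LM-judge + code-verifier scoring."""
    return sum(len(response) > 10 for _ in rubric) / len(rubric)

def make_dpo_pair(instruction: str, responses: list[str]) -> dict:
    """Rank candidates by rubric score; best/worst become chosen/rejected."""
    rubric = generate_rubric(instruction)
    ranked = sorted(responses, key=lambda r: judge_score(r, rubric), reverse=True)
    return {"prompt": instruction, "chosen": ranked[0], "rejected": ranked[-1]}

print(make_dpo_pair("Explain DPO.", ["A long, detailed answer here.", "No."]))
```

Scoring with rubric-grounded judges and then training with DPO keeps the reward grounded in per-instruction criteria rather than a single generic "helpfulness" scale.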
Closed-source models like Kimi K2 are post-trained using rubrics, but these aren't available open-source for researchers. To change this, we're releasing the WildChecklists dataset (https://t.co/wTdPKBwtRl) and code (https://t.co/SwgmhzPVdq) from our paper on checklist-based RL!
github.com
Contribute to viswavi/RLCF development by creating an account on GitHub.
for the first time i am aware of, there is an entirely private subfield of AI research. every company that actually trains models is doing RL with rubrics and LLM-judged rewards, but academic work is stuck on RL with automated rewards (math problems and code). much cleaner for
1
5
15
On my way to VLDB! 🇬🇧 I am on the job market this year, seeking tenure-track CS faculty positions. I will be giving a talk on DocETL and on a panel titled "Where Does Academic Database Research Go From Here?" I would love to meet folks; please reach out if you're also attending!
10
23
165
OpenAI hasn't open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only... or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵
156
462
6K
@Yuchenj_UW They didn't evaluate on 23 of the 500 instances though, so the actual score is: 74.9 * (500 - 23) / 500 = 71.4%, which is a few points below Claude Sonnet 4.
13
28
387
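The renormalization in this reply is simple to reproduce: treat the 23 skipped instances as failures, so the reported pass rate is scaled by the fraction of instances actually evaluated. A quick check:

```python
# Reproducing the correction: count the 23 unevaluated instances as failures.
reported, total, skipped = 74.9, 500, 23
adjusted = reported * (total - skipped) / total
print(f"{adjusted:.2f}")  # 71.45, i.e. the ~71.4% stated in the tweet
```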
If you're at ACL, join us for our tutorial on Synthetic Data in the Era of LLMs with @vijaytarian @xiangyue96 @yizhongwyz @gneubig!! 2pm-5:30pm, Hall B
4
14
121
Bring questions, stick around, and chat with us after! Details + sneak peek at our slides and content: https://t.co/6DC6VIg28K Hope to see you there!
0
2
6
We'll introduce the core abstractions and algorithms of synthetic data - what are the building blocks, how to combine them, and how they show up in real LLM pipelines. @yizhongwyz has also prepared a segment on open questions in synthetic data that I can't wait to hear
1
0
4
We've prepared a tutorial for ACL in Vienna this year to give you some answers. Come join @xiangyue96, @alisawuffles, @yizhongwyz, @gneubig, and me for "Synthetic Data in the Era of LLMs." Sunday 2-3:30pm, Hall B #ACL2025
6
0
5