Vijay V.
@vijaytarian
Followers 706 · Following 1K · Media 101 · Statuses 1K
Grad student at CMU. I do research on applied NLP. he/him
Pittsburgh, PA
Joined April 2009
RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess. Reward models and AI judges? Fragile and inconsistent. Our proposal? RL from Checklist Feedback: https://t.co/fRYqiiDBQP
5
41
228
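To make the proposal concrete, here is a minimal sketch of what checklist-based reward scoring could look like: score a response as the fraction of checklist items an LM judge says it satisfies. The `ask_judge` function and the scoring scheme are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of checklist-based reward scoring (not the paper's exact code).
# `ask_judge` is a hypothetical stand-in for an LM-judge call.

def ask_judge(response: str, criterion: str) -> bool:
    """Hypothetical LM-judge call: does `response` satisfy `criterion`?"""
    # In practice this would prompt a judge model; here we use a trivial heuristic.
    return criterion.lower() in response.lower()

def checklist_reward(response: str, checklist: list[str]) -> float:
    """Score a response as the fraction of checklist items it satisfies."""
    if not checklist:
        return 0.0
    passed = sum(ask_judge(response, item) for item in checklist)
    return passed / len(checklist)

checklist = ["python", "example", "docstring"]
print(checklist_reward("Here is a Python example with a docstring.", checklist))  # 1.0
```

A fractional reward like this is denser than a single pass/fail judgment, which is part of what makes checklists attractive as an RL signal.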
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
13
66
345
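Since the questions are described as simple to verify, scoring plausibly reduces to normalized exact match. A hedged sketch of what that accuracy computation might look like; the record fields are assumptions, not Oolong's actual schema.

```python
# Hypothetical accuracy computation for simple-to-verify aggregation questions.
# The record fields ("answer", "prediction") are illustrative assumptions.

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def accuracy(records: list[dict]) -> float:
    correct = sum(normalize(r["prediction"]) == normalize(r["answer"]) for r in records)
    return correct / len(records)

records = [
    {"answer": "42", "prediction": "42"},
    {"answer": "blue", "prediction": "red"},
]
print(accuracy(records))  # 0.5
```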
This shouldn't be controversial. Science requires sharing with the public. If you're never sharing your research, you're not a research scientist. I don't think you have to share it via peer review, but vague musings on Twitter or on a podcast definitely don't count as science
my most controversial opinion is that you shouldnโt trust anyone that calls themself an โAI researcherโ but has never gotten a first author paper through peer review
0
0
1
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
27
173
1K
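The fragmentation point is concrete: the same agent episode can be serialized in many incompatible ways. A hedged sketch of normalizing a ReAct-style log into one message-list schema; the field names here are assumptions for illustration, not the released dataset's actual format.

```python
# Hypothetical normalization of heterogeneous agent logs into one SFT-ready schema.
# Source/target field names are illustrative; the released dataset's format may differ.

def normalize_react_log(log: dict) -> list[dict]:
    """Convert a ReAct-style log into a flat list of role-tagged messages."""
    messages = [{"role": "user", "content": log["task"]}]
    for step in log["steps"]:
        messages.append({"role": "assistant", "content": step["thought_and_action"]})
        messages.append({"role": "tool", "content": step["observation"]})
    messages.append({"role": "assistant", "content": log["final_answer"]})
    return messages

log = {
    "task": "What is 2 + 2?",
    "steps": [{"thought_and_action": "I should use the calculator. calc(2 + 2)",
               "observation": "4"}],
    "final_answer": "4",
}
print(normalize_react_log(log))
```

Once every source is mapped into one schema like this, trajectories from different tools and interfaces can be mixed freely in a single SFT run.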
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: they code everything, slow down human workflows, and deliver low-quality work fast. Yet when teamed with humans, they shine.
7
52
243
Introducing Dual-Weighted Reinforcement Learning (DWRL), a new framework for training on preference data, working @AIatMeta. Key ideas: ✅ Transform preference modeling into RLVR and use GRPO ✅ Integrate the
9
27
129
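The tweet above is truncated, but one reading of "transform preference modeling into RLVR" is that each preference pair becomes a task with an automatically checkable answer, so GRPO-style RL with binary rewards can be applied. A hedged sketch under that interpretation; this is not the DWRL paper's actual recipe.

```python
# Hedged sketch: turning a preference pair into a verifiable-reward task.
# This is an interpretation of the tweet, not the DWRL paper's actual recipe.
import random

def make_verifiable_task(prompt: str, chosen: str, rejected: str) -> dict:
    """Present the two responses in random order; the label is checkable."""
    if random.random() < 0.5:
        a, b, label = chosen, rejected, "A"
    else:
        a, b, label = rejected, chosen, "B"
    question = (f"Prompt: {prompt}\nResponse A: {a}\nResponse B: {b}\n"
                "Which response is better? Answer A or B.")
    return {"question": question, "gold": label}

def reward(model_answer: str, task: dict) -> float:
    """Binary verifiable reward, as used in RLVR-style training."""
    return 1.0 if model_answer.strip() == task["gold"] else 0.0

task = make_verifiable_task("Summarize the article.", "Good summary.", "Off-topic reply.")
print(reward("A", task))
```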
Tinker looks awesome! And since Andrej nicely highlighted the value of finetuning a smaller LLM specifically for your narrow task, this is a great time for me to mention Prompt2Model - the library we released back in 2023 to support the same goal! https://t.co/bdohmjqxCt
github.com
prompt2model - Generate Deployable Models from Natural Language Instructions - neulab/prompt2model
Tinker is cool. If you're a researcher/developer, Tinker dramatically simplifies LLM post-training. You retain ~90% of the algorithmic creative control (usually the data, the loss function, and the algorithm) while Tinker handles the hard parts that you usually want to touch much less.
0
0
3
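That "90% of creative control" claim maps to a specific division of labor: the user owns data selection, formatting, and the training signal, while the service owns distributed execution. A hypothetical sketch of such an interface; the `TrainingClient` class and its methods below are invented for illustration and are NOT Tinker's real API.

```python
# Hypothetical post-training interface illustrating the division of labor the
# tweet describes. `TrainingClient` and its methods are invented for this sketch;
# they are NOT Tinker's real API.

class TrainingClient:
    """Stand-in for a managed service that hides distributed-training details."""
    def forward_backward(self, batch: list[dict]) -> float:
        # A real service would run the model and return a loss; we fake one.
        return 0.0

    def optim_step(self) -> None:
        pass  # Optimizer state, sharding, and checkpointing live server-side.

def my_batches():
    """The user-controlled part: data selection and formatting."""
    yield [{"prompt": "2+2=", "target": "4"}]

client = TrainingClient()
for batch in my_batches():
    loss = client.forward_backward(batch)  # user picks data, service does the rest
    client.optim_step()
print("done")
```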
Prove me wrong: anything we didn't publish and secretly did is 10x more advanced than all the open source repos or publications.
6
4
193
If this statement is about algorithms, it misses the fact that finding an RL abstraction that is natural and ergonomic enough for *problem specification* is very difficult for most problems of interest. You can't climb a hill you don't even know how to define.
7
4
89
Paper on a new pretraining paradigm: Synthetic Bootstrapped Pretraining. SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training, with no teacher needed. Validation: 1T data + 3B model from scratch. 🧵
9
49
249
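As summarized in the tweet, the loop is: pair correlated documents, learn to synthesize one from the other, then generate new pretraining text. A hedged sketch of that loop; the retrieval and synthesis components below are placeholders, not the paper's implementation.

```python
# Hedged sketch of the SBP loop as summarized in the tweet: pair correlated
# documents, learn doc1 -> doc2 synthesis, then generate new pretraining data.
# All components below are placeholders, not the paper's implementation.

def find_related(doc: str, corpus: list[str]) -> str | None:
    """Placeholder retrieval: pair documents sharing at least two words."""
    words = set(doc.split())
    for other in corpus:
        if other != doc and len(words & set(other.split())) >= 2:
            return other
    return None

def synthesize(seed_doc: str) -> str:
    """Placeholder for a model trained on (doc1, doc2) pairs."""
    return f"[synthetic document conditioned on: {seed_doc[:40]}...]"

corpus = ["alpha beta gamma", "beta gamma delta", "unrelated text here"]
pairs = [(d, r) for d in corpus if (r := find_related(d, corpus))]
synthetic_corpus = [synthesize(d) for d, _ in pairs]
print(len(pairs), synthetic_corpus[0])
```

The key property is that the synthesizer is trained on the corpus itself, so no stronger teacher model is required to produce the new data.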
(see our original twitter thread for more! https://t.co/f4X0OjNdBj)
RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess. Reward models and AI judges? Fragile and inconsistent. Our proposal? RL from Checklist Feedback: https://t.co/fRYqiiDBQP
0
0
0
In https://t.co/N5sHpY7qBc we showed that this exact approach can make already-strong models even better at instruction following.
arxiv.org
Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this -- typically using fixed criteria such as "helpfulness" and...
1
0
4
Our software lets you generate rubrics, score responses via rubric-grounded LM judges and rubric-grounded code verifiers, and train models (currently using DPO) on these rewards: https://t.co/uMX4dDZlNT. You'll need a node with 4-8 GPUs to use this.
github.com
Contribute to viswavi/RLCF development by creating an account on GitHub.
1
0
3
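A hedged sketch of the pipeline the tweet outlines: generate a rubric for an instruction, judge candidate responses against it, and turn the best/worst candidates into a DPO preference pair. The function internals here are placeholders; the real RLCF code in the linked repo will differ.

```python
# Hedged sketch of a rubric-grounded preference pipeline as outlined above.
# Judge and verifier internals are placeholders; see the linked repo for the real code.

def generate_rubric(instruction: str) -> list[str]:
    """Placeholder: a model would generate instruction-specific criteria."""
    return ["addresses the instruction", "is factually grounded"]

def judge_score(response: str, rubric: list[str]) -> float:
    """Placeholder for rubric-grounded LM-judge + code-verifier scoring."""
    return sum(len(response) > 10 for _ in rubric) / len(rubric)

def make_dpo_pair(instruction: str, responses: list[str]) -> dict:
    """Rank candidates by rubric score; best/worst become chosen/rejected."""
    rubric = generate_rubric(instruction)
    ranked = sorted(responses, key=lambda r: judge_score(r, rubric), reverse=True)
    return {"prompt": instruction, "chosen": ranked[0], "rejected": ranked[-1]}

print(make_dpo_pair("Explain DPO.", ["A long, detailed answer here.", "No."]))
```

Scoring with rubric-grounded judges and then training with DPO keeps the reward grounded in per-instruction criteria rather than a single generic "helpfulness" scale.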
Closed-source models like Kimi K2 are post-trained using rubrics, but these aren't available open-source for researchers. To change this, we're releasing the WildChecklists dataset (https://t.co/wTdPKBwtRl) and code (https://t.co/SwgmhzPVdq) from our paper on checklist-based RL!
github.com
Contribute to viswavi/RLCF development by creating an account on GitHub.
for the first time i am aware of, there is an entirely private subfield of AI research. every company that actually trains models is doing RL with rubrics and LLM-judged rewards, but academic work is stuck on RL with automated rewards (math problems and code). much cleaner for
1
5
15
On my way to VLDB! 🇬🇧 I am on the job market this year, seeking tenure-track CS faculty positions. I will be giving a talk on DocETL and on a panel titled "Where Does Academic Database Research Go From Here?" I would love to meet folks; please reach out if you're also attending!
10
23
165
OpenAI hasn't open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only... or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵
156
462
6K
@Yuchenj_UW They didn't evaluate on 23 of the 500 instances though, so the actual score is: 74.9 * (500 - 23) / 500 = 71.4%, which is a few points below Claude Sonnet 4.
13
28
387
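The renormalization in this reply is simple to reproduce: treat the 23 skipped instances as failures, so the reported pass rate is scaled by the fraction of instances actually evaluated. A quick check:

```python
# Reproducing the correction: count the 23 unevaluated instances as failures.
reported, total, skipped = 74.9, 500, 23
adjusted = reported * (total - skipped) / total
print(f"{adjusted:.2f}")  # 71.45, i.e. the ~71.4% stated in the tweet
```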
If you're at ACL, join us for our tutorial on Synthetic Data in the Era of LLMs with @vijaytarian @xiangyue96 @yizhongwyz @gneubig!! 2pm-5:30pm, Hall B
4
14
121
Bring questions, stick around, and chat with us after! Details + sneak peek at our slides and content: https://t.co/6DC6VIg28K Hope to see you there!
0
2
6
We'll introduce the core abstractions and algorithms of synthetic data - what are the building blocks, how to combine them, and how they show up in real LLM pipelines. @yizhongwyz has also prepared a segment on open questions in synthetic data that I can't wait to hear
1
0
4
We've prepared a tutorial for ACL in Vienna this year to give you some answers. Come join @xiangyue96, @alisawuffles, @yizhongwyz, @gneubig, and me for "Synthetic Data in the Era of LLMs." Sunday 2-3:30pm, Hall B #ACL2025
6
0
5