Vijay V.

@vijaytarian

Followers
706
Following
1K
Media
101
Statuses
1K

Grad student at CMU. I do research on applied NLP. he/him

Pittsburgh, PA
Joined April 2009
@vijaytarian
Vijay V.
4 months
RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess 📉 Reward models and AI judges? Fragile and inconsistent 💔 Our proposal? RL from Checklist Feedback 📜 https://t.co/fRYqiiDBQP 👇
5
41
228
@abertsch72
Amanda Bertsch
10 days
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
13
66
345
@vijaytarian
Vijay V.
16 days
This shouldn't be controversial. Science requires sharing with the public. If you're never sharing your research, you're not a research scientist. I don't think you have to share it via peer review, but vague musings on Twitter or on a podcast definitely don't count as science
@jxmnop
dr. jack morris
17 days
my most controversial opinion is that you shouldn’t trust anyone that calls themself an “AI researcher” but has never gotten a first-author paper through peer review
0
0
1
@yueqi_song
Yueqi Song @ EMNLP2025
19 days
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
27
173
1K
@ZhiruoW
Zora Wang
20 days
Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine
7
52
243
@ShawnSYFeng
Shengyu Feng
28 days
Introducing ๐ƒ๐ฎ๐š๐ฅ-๐–๐ž๐ข๐ ๐ก๐ญ๐ž๐ ๐‘๐ž๐ข๐ง๐Ÿ๐จ๐ซ๐œ๐ž๐ฆ๐ž๐ง๐ญ ๐‹๐ž๐š๐ซ๐ง๐ข๐ง๐  (๐ƒ๐–๐‘๐‹) โ€” a new framework forย ๐‚๐จ๐“ ๐ญ๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐จ๐ง ๐ฉ๐ซ๐ž๐Ÿ๐ž๐ซ๐ž๐ง๐œ๐ž ๐๐š๐ญ๐š, working @AIatMeta Key ideas: โŒ Transform preference modeling into RLVR and use GRPO โœ… Integrate the
9
27
129
@vijaytarian
Vijay V.
2 months
Tinker looks awesome! And since Andrej nicely highlighted the value of finetuning a smaller LLM specifically for your narrow task, this is a great time for me to mention Prompt2Model - the library we released back in 2023 to support the same goal! https://t.co/bdohmjqxCt
github.com
prompt2model - Generate Deployable Models from Natural Language Instructions - neulab/prompt2model
@karpathy
Andrej Karpathy
2 months
Tinker is cool. If you're a researcher/developer, tinker dramatically simplifies LLM post-training. You retain 90% of algorithmic creative control (usually related to data, loss function, the algorithm) while tinker handles the hard parts that you usually want to touch much less
0
0
3
@xwang_lk
Xin Eric Wang @ EMNLP 2025
2 months
Prove me wrong: anything we didn’t publish and secretly did is 10x more advanced than all the open source repos or publications.
6
4
193
@lateinteraction
Omar Khattab
2 months
If this statement is about algorithms, it misses the fact that finding an RL abstraction that is natural and ergonomic enough for *problem specification* is very difficult for most problems of interest. You can’t climb a hill you don’t even know how to define.
@MillionInt
Jerry Tworek
2 months
Science of RL optimization is likely humanity’s last open scientific problem
7
4
89
@ZitongYang0
Zitong Yang
2 months
📜 Paper on new pretraining paradigm: Synthetic Bootstrapped Pretraining SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training — no teacher needed. Validation: 1T data + 3B model from scratch. 🧵
9
49
249
@vijaytarian
Vijay V.
2 months
(see our original twitter thread for more! https://t.co/f4X0OjNdBj)
@vijaytarian
Vijay V.
4 months
RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess 📉 Reward models and AI judges? Fragile and inconsistent 💔 Our proposal? RL from Checklist Feedback 📜 https://t.co/fRYqiiDBQP 👇
0
0
0
@vijaytarian
Vijay V.
2 months
Our software lets you generate rubrics, score responses via rubric-grounded LM judges and rubric-grounded code verifiers, and train models (currently using DPO) on these rewards: https://t.co/uMX4dDZlNT. You'll need a node with 4-8 GPUs to use this.
github.com
Contribute to viswavi/RLCF development by creating an account on GitHub.
1
0
3
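The checklist-based reward described in the tweet above can be sketched in a few lines. This is a toy illustration, not the actual RLCF API: `judge_item` here is a hypothetical stand-in (a trivial keyword check) for the rubric-grounded LM judge or code verifier, and the per-item pass rate becomes the scalar reward used for preference training.

```python
def judge_item(response: str, item: str) -> bool:
    """Hypothetical stand-in for an LM judge or code verifier:
    here, just a trivial case-insensitive keyword check."""
    return item.lower() in response.lower()

def checklist_reward(response: str, checklist: list[str]) -> float:
    """Fraction of checklist items the response satisfies.
    This scalar can serve as the reward for DPO-style training."""
    if not checklist:
        return 0.0
    passed = sum(judge_item(response, item) for item in checklist)
    return passed / len(checklist)

# Toy usage: a response graded against a three-item checklist.
checklist = ["python", "example", "sorted"]
reward = checklist_reward("Here is a sorted Python example.", checklist)
```

The key design choice, as the thread emphasizes, is that the reward is grounded in explicit per-instruction checklist items rather than a single opaque judge score, which makes the signal easier to audit item by item.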
@vijaytarian
Vijay V.
2 months
Closed-source models like Kimi K2 are post-trained using rubrics, but these rubrics aren't openly available to researchers. To change this, we're releasing the WildChecklists dataset (https://t.co/wTdPKBwtRl) and code (https://t.co/SwgmhzPVdq) from our paper on checklist-based RL!
github.com
Contribute to viswavi/RLCF development by creating an account on GitHub.
@jxmnop
dr. jack morris
3 months
for the first time i am aware of, there is an entirely private subfield of AI research. every company that actually trains models is doing RL with rubrics and LLM-judged rewards, but academic work is stuck on RL with automated rewards (math problems and code). much cleaner for
1
5
15
@sh_reya
Shreya Shankar
3 months
On my way to VLDB! 🇬🇧 I am on the job market this year, seeking tenure-track CS faculty positions. I will be giving a talk on DocETL and on a panel titled “Where Does Academic Database Research Go From Here?” I would love to meet folks; please reach out if you’re also attending!
10
23
165
@jxmnop
dr. jack morris
3 months
OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only... or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵
156
462
6K
@gneubig
Graham Neubig
3 months
@Yuchenj_UW They didn't evaluate on 23 of the 500 instances though, so the actual score is: 74.9 * (500 - 23) / 500 = 71.4%, which is a few points below Claude Sonnet 4.
13
28
387
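The adjustment in the tweet above is easy to check: counting the 23 skipped instances as failures rescales the reported 74.9% by 477/500, which evaluates to about 71.45% (reported in the tweet as 71.4%). A one-function sketch:

```python
def adjusted_score(reported: float, total: int, skipped: int) -> float:
    """Rescale a benchmark score when some instances were skipped,
    treating every skipped instance as a failure."""
    return reported * (total - skipped) / total

# 74.9% on 477 of 500 instances, with 23 skipped instances scored as 0.
score = adjusted_score(74.9, 500, 23)  # about 71.45
```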
@alisawuffles
Alisa Liu
4 months
If you're at ACL, join us for our tutorial on Synthetic Data in the Era of LLMs with @vijaytarian @xiangyue96 @yizhongwyz @gneubig!! 🕑 2pm - 5:30pm 📍 Hall B
4
14
121
@vijaytarian
Vijay V.
4 months
Bring questions, stick around, and chat with us after! Details + sneak peek at our slides and content: https://t.co/6DC6VIg28K Hope to see you there!
0
2
6
@vijaytarian
Vijay V.
4 months
We’ll introduce the core abstractions and algorithms of synthetic data - what are the building blocks, how to combine them, and how they show up in real LLM pipelines. @yizhongwyz has also prepared a segment on open questions in synthetic data that I can't wait to hear
1
0
4
@vijaytarian
Vijay V.
4 months
We’ve prepared a tutorial for ACL in Vienna this year to give you some answers. Come join @xiangyue96, @alisawuffles, @yizhongwyz, @gneubig, and me for “Synthetic Data in the Era of LLMs.” 📍 Sunday 2–3:30pm Hall B #ACL2025
6
0
5