Vaskar Nath Profile
Vaskar Nath

@vaskar_n

Followers
65
Following
34
Media
0
Statuses
17

Researcher @ Scale AI

New York City
Joined July 2024
@anisha_gunjal
Anisha Gunjal
5 months
🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer? Our work at @scale_AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵
5
43
245
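The Rubrics as Rewards (RaR) idea in the tweet above can be illustrated with a minimal sketch: score each rollout against a checklist-style rubric and use the weighted fraction of satisfied items as the reward. This is illustrative only; the rubric items, weights, and function names below are invented, not from the paper.

```python
# Hypothetical sketch of a rubric-based reward in the spirit of RaR:
# each rubric item is (description, weight, check), and the reward is
# the weighted fraction of checks the response satisfies.

def rubric_reward(response: str, rubric: list[tuple[str, float, callable]]) -> float:
    """Score a response against a checklist-style rubric in [0, 1]."""
    total = sum(weight for _, weight, _ in rubric)
    earned = sum(weight for _, weight, check in rubric if check(response))
    return earned / total

# Illustrative rubric for a medical-style answer (placeholder checks):
rubric = [
    ("mentions dosage", 2.0, lambda r: "mg" in r),
    ("cites a source", 1.0, lambda r: "according to" in r.lower()),
    ("under 50 words", 1.0, lambda r: len(r.split()) <= 50),
]

reward = rubric_reward("Take 200 mg, according to the label.", rubric)
# Interpretable by construction: each point of reward traces back to a
# named rubric item, unlike a scalar preference model score.
```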
@MattPRD
Matt Schlicht
5 months
Waking up to see this new paper from @scale_AI charting on the @yesnoerror trending feed. Authors: @anisha_gunjal, @aytwang, Elaine Lau, @vaskar_n, @BingLiu1011, and @SeanHendryx "Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains" Simplified: Teaching
9
22
76
@SeanHendryx
Sean Hendryx
5 months
@karpathy a neat quality specific to language models is that you can just tell them what to do differently when they fail. And if you use importance sampling, gradients are aligned with the unguided context and it gets into the weights directly. No sleep needed https://t.co/qJ2Qv43rYp
@SeanHendryx
Sean Hendryx
6 months
For online RL, we introduce Guide, a class of algorithms that incorporates guidance into the model’s context when all rollouts fail and adjusts the importance sampling ratio to optimize the policy for contexts in which guidance is no longer present.
0
1
5
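The importance-sampling correction described above can be sketched as follows. This is an assumption-laden illustration, not the paper's implementation: when every plain-prompt rollout fails, rollouts are re-sampled with guidance appended, and each gradient term is weighted by the ratio of the sequence's likelihood without guidance to its likelihood with guidance (clipped for stability, a common but here assumed detail).

```python
import math

# Sketch of the Guide-style importance ratio: train on guided rollouts,
# but weight each one by pi(y | x) / pi(y | x, guidance) so the update
# optimizes the policy for the context where guidance is absent.

def guide_weight(logp_unguided: float, logp_guided: float, clip: float = 10.0) -> float:
    """Importance-sampling ratio for one guided rollout, computed from
    sequence log-probabilities and clipped to avoid exploding updates."""
    return min(math.exp(logp_unguided - logp_guided), clip)
```

A rollout the unguided policy already finds likely gets weight near 1; one it finds very unlikely without the hint gets a small weight, so the hint itself is not baked into the gradient.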
@SeanHendryx
Sean Hendryx
6 months
What will the learning environments of the future look like that train artificial super intelligence? In recent work at @scale_AI , we show that training systems that combine verifiable rewards with multi-agent interaction accelerate learning.
12
30
129
@mohit_r9a
Mohit
9 months
Over the last year, I have worked on data curation and scaling to effectively improve performance through SFT and RLHF. Check out the blog post I wrote, detailing my findings. https://t.co/5nFl6Flzbd (Thank you @natolambert for the shoutout in the latest Interconnects post!)
mohit-raghavendra.notion.site
- Mohit Raghavendra
1
2
3
@alexandr_wang
Alexandr Wang
9 months
GPT-4.5 Preview evals results are out on SEAL 👀 ⚡ #2 in Tool Use - Chat 🏢 #3 in Tool Use - Enterprise 🥉 #3 in EnigmaEval (behind Claude 3.7 Sonnet) 📚 #4 in MultiChallenge 🎓 #5 in Humanity’s Last Exam 🔍 #6 in VISTA (multimodal) See rankings here: https://t.co/pVIgk6rIcL
25
16
228
@summeryue0
Summer Yue
9 months
GPT-4.5 Preview Just Dropped~ We put it to the test, and the results are... mixed 👀 ⚡ #2 in Tool Use - Chat (trailing o1) 🏢 #3 in Tool Use - Enterprise (coming after Claude 3.7 Sonnet) 🥉 #3 in EnigmaEval (following Claude 3.7 Sonnet Thinking) 📚 #4 in MultiChallenge (behind
0
4
12
@SeanHendryx
Sean Hendryx
1 year
If you’ve ever finetuned a pretrained language model on a reasoning task at the edge of its capabilities, you were probably skeptical of the superficial alignment hypothesis. Turns out you were right. 1/🤔
8
45
264
@scale_AI
Scale AI
1 year
Read the full paper here from authors @mohit_r18 @vaskar_n @SeanHendryx:
0
1
5
@scale_AI
Scale AI
1 year
Contrary to prior work, new research from Scale finds that LLMs continue to learn new knowledge during post-training following a power law similar to well known pre-training scaling laws 🧵 https://t.co/aR03YQuJ3u
3
7
23
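The power-law claim above can be checked on any measured curve by fitting a line in log-log space. The sketch below uses synthetic data, not the paper's measurements; the functional form k(n) = a·nᵇ and the fitting recipe are standard, everything else is illustrative.

```python
import math

# Hedged sketch: test whether knowledge acquired during post-training,
# k, follows a power law k(n) = a * n**b in the number of training
# examples n, by least-squares fitting log k = log a + b * log n.

def fit_power_law(ns: list[float], ks: list[float]) -> tuple[float, float]:
    """Return (a, b) from an ordinary least-squares fit in log-log space."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(k) for k in ks]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic data drawn from k = 3 * n**0.5; the fit recovers a = 3, b = 0.5:
ns = [10.0, 100.0, 1000.0, 10000.0]
ks = [3.0 * n ** 0.5 for n in ns]
a, b = fit_power_law(ns, ks)
```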
@hughbzhang
Hugh Zhang
1 year
Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. We introduce PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language as a means of encouraging diversity.
16
99
635
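The core PlanSearch move described above can be sketched in a few lines: rather than sampling many programs directly, enumerate subsets of high-level natural-language observations and combine each subset into a distinct plan for a code model to implement. The observations and helper name below are placeholders, not from the paper.

```python
from itertools import combinations

# Illustrative sketch of plan-space search: diversity comes from
# enumerating combinations of observations, not from sampling
# temperature, so each candidate plan is structurally different.

def enumerate_plans(observations: list[str], max_size: int = 2) -> list[str]:
    """Form a candidate plan from every subset of observations up to max_size."""
    plans = []
    for size in range(1, max_size + 1):
        for combo in combinations(observations, size):
            plans.append(" and ".join(combo))
    return plans

plans = enumerate_plans(["sort the input", "use two pointers", "binary search the answer"])
# Each plan would then be handed to the code model as a separate prompt.
```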
@SeanHendryx
Sean Hendryx
1 year
Reasoning at length will be a key part of LLMs solving more challenging problems, but how can we make sure that their chain of thought stays on track? At @scale_AI, we’ve developed a method to learn token-wise expected rewards from pairwise preference labels 🧵
3
6
18
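The training signal described above, learning token-wise expected rewards from pairwise preference labels, can be sketched with a Bradley-Terry-style loss over summed token scores. This is an assumed simplification for illustration, not the paper's actual objective or architecture.

```python
import math

# Sketch: a token-level reward model assigns a score to each token of a
# response; training pushes the summed score of the preferred response
# above that of the rejected one via -log(sigmoid(margin)).

def pairwise_loss(chosen_token_scores: list[float], rejected_token_scores: list[float]) -> float:
    """Bradley-Terry loss on the score margin between the preferred and
    rejected responses; lower loss means the model already ranks the
    preferred response higher at the token level."""
    margin = sum(chosen_token_scores) - sum(rejected_token_scores)
    return math.log(1.0 + math.exp(-margin))
```

Because the score is per token, a trained model of this shape can flag where a chain of thought goes off track mid-generation, not just after the full response.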
@scale_AI
Scale AI
1 year
Our paper on this work, “Learning Goal-Conditioned Representations for Language Reward Models,” by @vaskar_n, @dylanslack20, @_jeffda, @TommyMa9, @hughbzhang, Spencer Whitehead, and @SeanHendryx will be presented at NeurIPS 2024 main track — we hope to see you there!
arxiv.org
Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear...
1
1
5
@scale_AI
Scale AI
1 year
Our researchers at Scale have developed a novel method to evaluate LLM output during generation instead of waiting until it’s complete — like a GPS recalculating when you go off route, before you’re at the wrong place. Learn more on the Scale blog: https://t.co/ktuWKzrSrG
2
5
32
@goodside
Riley Goodside
1 year
New research from Scale — detecting problems in LLM outputs before it's too late. This work will be presented at NeurIPS 2024 main track — congrats @vaskar_n, @dylanslack20, @_jeffda, @TommyMa9, @hughbzhang, Spencer Whitehead, and @SeanHendryx!
0
4
32
@IAzangulov
Iskander Azangulov
1 year
With @PPotaptchik and @GeorgeDeligian9 we show the first realistic bound on the iteration complexity of diffusion models! Our work explains why sampling from ImageNet needs only ~100 (intrinsic dim) steps instead of ~150k (extrinsic dim). https://t.co/ixKB6Lj2sL
1
13
43
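One plausible reading of the numbers in the tweet above (stated as an interpretation, not a result from the paper itself): for data concentrated near a d-dimensional manifold embedded in an ambient space of dimension D, the iteration count of diffusion sampling scales with the intrinsic rather than the extrinsic dimension,

```latex
N = \tilde{O}(d) \quad \text{rather than} \quad N = \tilde{O}(D),
\qquad d \approx 10^{2} \;\ll\; D \approx 1.5 \times 10^{5}
\;\;\text{(e.g.\ } 224 \times 224 \times 3 \text{ ImageNet pixels).}
```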
@vaskar_n
Vaskar Nath
1 year
Excited to see the results of ToolComp drop today! It’s been incredible to be part of the work that helps advance tool-use capabilities in AI models. Amazing to see the progress across the board—congrats to everyone involved! 🚀🛠️🤖
@SeanHendryx
Sean Hendryx
1 year
We’re releasing the results on ToolComp today, a Scale AI SEAL leaderboard that tests the ability of agents to plan, reason, and compose multiple, dependent tool calls together. OpenAI models lead with Claude showing strong performance in the Chat setting. 1/🛠️🤖
0
0
2