Vaskar Nath Profile
Vaskar Nath

@vaskar_n

Followers
65
Following
34
Media
0
Statuses
17

Researcher @ Scale AI

New York City
Joined July 2024
@anisha_gunjal
Anisha Gunjal
5 months
🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer? Our work at @scale_AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵
5
43
245
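The Rubrics as Rewards (RaR) idea in the tweet above can be illustrated with a minimal sketch: score each rollout against a checklist-style rubric and use the weighted fraction of satisfied items as the reward. This is illustrative only; the rubric items, weights, and function names below are invented, not from the paper.

```python
# Hypothetical sketch of a rubric-based reward in the spirit of RaR:
# each rubric item is (description, weight, check), and the reward is
# the weighted fraction of checks the response satisfies.

def rubric_reward(response: str, rubric: list[tuple[str, float, callable]]) -> float:
    """Score a response against a checklist-style rubric in [0, 1]."""
    total = sum(weight for _, weight, _ in rubric)
    earned = sum(weight for _, weight, check in rubric if check(response))
    return earned / total

# Illustrative rubric for a medical-style answer (placeholder checks):
rubric = [
    ("mentions dosage", 2.0, lambda r: "mg" in r),
    ("cites a source", 1.0, lambda r: "according to" in r.lower()),
    ("under 50 words", 1.0, lambda r: len(r.split()) <= 50),
]

reward = rubric_reward("Take 200 mg, according to the label.", rubric)
# Interpretable by construction: each point of reward traces back to a
# named rubric item, unlike a scalar preference model score.
```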
@MattPRD
Matt Schlicht
5 months
Waking up to see this new paper from @scale_AI charting on the @yesnoerror trending feed. Authors: @anisha_gunjal, @aytwang, Elaine Lau, @vaskar_n, @BingLiu1011, and @SeanHendryx "Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains" Simplified: Teaching
9
22
76
@SeanHendryx
Sean Hendryx
5 months
@karpathy a neat quality specific to language models is that you can just tell them what to do differently when they fail. And if you use importance sampling, gradients are aligned with the unguided context and it gets into the weights directly. No sleep needed https://t.co/qJ2Qv43rYp
@SeanHendryx
Sean Hendryx
6 months
For online RL, we introduce Guide, a class of algorithms that incorporates guidance into the model’s context when all rollouts fail and adjusts the importance sampling ratio to optimize the policy for contexts in which guidance is no longer present.
0
1
5
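The importance-sampling correction described above can be sketched as follows. This is an assumption-laden illustration, not the paper's implementation: when every plain-prompt rollout fails, rollouts are re-sampled with guidance appended, and each gradient term is weighted by the ratio of the sequence's likelihood without guidance to its likelihood with guidance (clipped for stability, a common but here assumed detail).

```python
import math

# Sketch of the Guide-style importance ratio: train on guided rollouts,
# but weight each one by pi(y | x) / pi(y | x, guidance) so the update
# optimizes the policy for the context where guidance is absent.

def guide_weight(logp_unguided: float, logp_guided: float, clip: float = 10.0) -> float:
    """Importance-sampling ratio for one guided rollout, computed from
    sequence log-probabilities and clipped to avoid exploding updates."""
    return min(math.exp(logp_unguided - logp_guided), clip)
```

A rollout the unguided policy already finds likely gets weight near 1; one it finds very unlikely without the hint gets a small weight, so the hint itself is not baked into the gradient.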
@SeanHendryx
Sean Hendryx
6 months
What will the learning environments of the future look like that train artificial super intelligence? In recent work at @scale_AI , we show that training systems that combine verifiable rewards with multi-agent interaction accelerate learning.
12
30
129
@mohit_r9a
Mohit
9 months
Over the last year, I have worked on data curation and scaling to effectively improve performance through SFT and RLHF. Check out the blog post I wrote, detailing my findings. https://t.co/5nFl6Flzbd (Thank you @natolambert for the shoutout in the latest Interconnects post!)
mohit-raghavendra.notion.site
- Mohit Raghavendra
1
2
3
@alexandr_wang
Alexandr Wang
9 months
GPT-4.5 Preview evals results are out on SEAL 👀 ⚡ #2 in Tool Use - Chat 🏢 #3 in Tool Use - Enterprise 🥉 #3 in EnigmaEval (behind Claude 3.7 Sonnet) 📚 #4 in MultiChallenge 🎓 #5 in Humanity’s Last Exam 🔍 #6 in VISTA (multimodal) See rankings here: https://t.co/pVIgk6rIcL
25
16
228
@summeryue0
Summer Yue
9 months
GPT-4.5 Preview Just Dropped~ We put it to the test, and the results are... mixed 👀 ⚡ #2 in Tool Use - Chat (trailing o1) 🏢 #3 in Tool Use - Enterprise (coming after Claude 3.7 Sonnet) 🥉 #3 in EnigmaEval (following Claude 3.7 Sonnet Thinking) 📚 #4 in MultiChallenge (behind
0
4
12
@SeanHendryx
Sean Hendryx
1 year
If you’ve ever finetuned a pretrained language model on a reasoning task at the edge of its capabilities, you were probably skeptical of the superficial alignment hypothesis. Turns out you were right. 1/🤔
8
45
264
@scale_AI
Scale AI
1 year
Read the full paper here from authors @mohit_r18 @vaskar_n @SeanHendryx:
0
1
5
@scale_AI
Scale AI
1 year
Contrary to prior work, new research from Scale finds that LLMs continue to learn new knowledge during post-training following a power law similar to well known pre-training scaling laws 🧵 https://t.co/aR03YQuJ3u
3
7
23
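The power-law claim above can be checked on any measured curve by fitting a line in log-log space. The sketch below uses synthetic data, not the paper's measurements; the functional form k(n) = a·nᵇ and the fitting recipe are standard, everything else is illustrative.

```python
import math

# Hedged sketch: test whether knowledge acquired during post-training,
# k, follows a power law k(n) = a * n**b in the number of training
# examples n, by least-squares fitting log k = log a + b * log n.

def fit_power_law(ns: list[float], ks: list[float]) -> tuple[float, float]:
    """Return (a, b) from an ordinary least-squares fit in log-log space."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(k) for k in ks]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic data drawn from k = 3 * n**0.5; the fit recovers a = 3, b = 0.5:
ns = [10.0, 100.0, 1000.0, 10000.0]
ks = [3.0 * n ** 0.5 for n in ns]
a, b = fit_power_law(ns, ks)
```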
@hughbzhang
Hugh Zhang
1 year
Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. We introduce PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language as a means of encouraging diversity.
16
99
635
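The core PlanSearch move described above can be sketched in a few lines: rather than sampling many programs directly, enumerate subsets of high-level natural-language observations and combine each subset into a distinct plan for a code model to implement. The observations and helper name below are placeholders, not from the paper.

```python
from itertools import combinations

# Illustrative sketch of plan-space search: diversity comes from
# enumerating combinations of observations, not from sampling
# temperature, so each candidate plan is structurally different.

def enumerate_plans(observations: list[str], max_size: int = 2) -> list[str]:
    """Form a candidate plan from every subset of observations up to max_size."""
    plans = []
    for size in range(1, max_size + 1):
        for combo in combinations(observations, size):
            plans.append(" and ".join(combo))
    return plans

plans = enumerate_plans(["sort the input", "use two pointers", "binary search the answer"])
# Each plan would then be handed to the code model as a separate prompt.
```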
@SeanHendryx
Sean Hendryx
1 year
Reasoning at length will be a key part of LLMs solving more challenging problems, but how can we make sure that their chain of thought stays on track? At @scale_AI, we’ve developed a method to learn token-wise expected rewards from pairwise preference labels 🧵
3
6
18
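The training signal described above, learning token-wise expected rewards from pairwise preference labels, can be sketched with a Bradley-Terry-style loss over summed token scores. This is an assumed simplification for illustration, not the paper's actual objective or architecture.

```python
import math

# Sketch: a token-level reward model assigns a score to each token of a
# response; training pushes the summed score of the preferred response
# above that of the rejected one via -log(sigmoid(margin)).

def pairwise_loss(chosen_token_scores: list[float], rejected_token_scores: list[float]) -> float:
    """Bradley-Terry loss on the score margin between the preferred and
    rejected responses; lower loss means the model already ranks the
    preferred response higher at the token level."""
    margin = sum(chosen_token_scores) - sum(rejected_token_scores)
    return math.log(1.0 + math.exp(-margin))
```

Because the score is per token, a trained model of this shape can flag where a chain of thought goes off track mid-generation, not just after the full response.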
@scale_AI
Scale AI
1 year
Our paper on this work, “Learning Goal-Conditioned Representations for Language Reward Models,” by @vaskar_n, @dylanslack20, @_jeffda, @TommyMa9, @hughbzhang, Spencer Whitehead, and @SeanHendryx will be presented at NeurIPS 2024 main track — we hope to see you there!
arxiv.org
Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear...
1
1
5
@scale_AI
Scale AI
1 year
Our researchers at Scale have developed a novel method to evaluate LLM output during generation instead of waiting until it’s complete — like a GPS recalculating when you go off route, before you’re at the wrong place. Learn more on the Scale blog: https://t.co/ktuWKzrSrG
2
5
32
@goodside
Riley Goodside
1 year
New research from Scale — detecting problems in LLM outputs before it's too late. This work will be presented at NeurIPS 2024 main track — congrats @vaskar_n, @dylanslack20, @_jeffda, @TommyMa9, @hughbzhang, Spencer Whitehead, and @SeanHendryx!
0
4
32
@IAzangulov
Iskander Azangulov
1 year
With @PPotaptchik and @GeorgeDeligian9 we show the first realistic bound on the iteration complexity of diffusion models! Our work explains why sampling from ImageNet needs only ~100 (intrinsic dim) steps instead of ~150k (extrinsic dim). https://t.co/ixKB6Lj2sL
1
13
43
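One plausible reading of the numbers in the tweet above (stated as an interpretation, not a result from the paper itself): for data concentrated near a d-dimensional manifold embedded in an ambient space of dimension D, the iteration count of diffusion sampling scales with the intrinsic rather than the extrinsic dimension,

```latex
N = \tilde{O}(d) \quad \text{rather than} \quad N = \tilde{O}(D),
\qquad d \approx 10^{2} \;\ll\; D \approx 1.5 \times 10^{5}
\;\;\text{(e.g.\ } 224 \times 224 \times 3 \text{ ImageNet pixels).}
```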
@vaskar_n
Vaskar Nath
1 year
Excited to see the results of ToolComp drop today! It’s been incredible to be part of the work that helps advance tool-use capabilities in AI models. Amazing to see the progress across the board—congrats to everyone involved! 🚀🛠️🤖
@SeanHendryx
Sean Hendryx
1 year
We’re releasing the results on ToolComp today, a Scale AI SEAL leaderboard that tests the ability of agents to plan, reason, and compose multiple, dependent tool calls together. OpenAI models lead with Claude showing strong performance in the Chat setting. 1/🛠️🤖
0
0
2