I've done a deep dive into distributed training and efficient fine-tuning of LLMs. I get into the messy internals of DeepSpeed ZeRO and FSDP, summarize practical guidelines and highlight gotchas with multi-GPU training.
Do read, should be fun!
New deep dive on tokenization! 🔎
I've done a deep dive into the various aspects of tokenization, organized as bite-size chapters, with code and notebooks.
Check it out:
Some wild results here. They fine-tuned models on the auxiliary training sets for different benchmarks. Trends for Phi-1.3B are so bizarre. From my understanding, there's no other data mixed in here for fine-tuning (apart from benchmark training data), and trends for ALL other…
Haha! The silver lining of all the LLM noise is that you get papers like this that you typically wouldn't see in academia. 😂
LLMs are entering the era of "natty or not".
@thesephist
This is the expanded neuron view in BertViz:
FYI, BertViz has been around since 2019, but the visualization needs access to the query and key vectors etc., which requires custom, architecture-specific code, so the package only supports BERT, RoBERTa, and GPT-2
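For reference, a minimal sketch of how the neuron view is invoked, roughly following the BertViz README (the model and sentences here are placeholders; check the repo for the current API):

```python
# Minimal sketch of BertViz's neuron view, meant to run in a notebook.
# The neuron view needs the query/key vectors, so it uses BertViz's own
# model classes rather than the vanilla Hugging Face ones.
from bertviz.transformers_neuron_view import BertModel, BertTokenizer
from bertviz.neuron_view import show

model_version = "bert-base-uncased"
model = BertModel.from_pretrained(model_version, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=True)

# Renders the interactive query/key/attention view for the chosen layer and head.
show(model, "bert", tokenizer,
     "The cat sat on the mat", "The cat lay on the rug",
     layer=2, head=0)
```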
In particular, some of the questions I focus on are:
1. Why focus on distributed training and performance? What happens under the hood with DeepSpeed and FSDP?
2. What hardware setup do I need?
3. What are the various efficient fine-tuning optimizations? What are the tradeoffs?
4. What are some practical guidelines that can capture all the major training optimizations, in order to train large models in a multi-GPU and multi-node setting?
5. What open-source codebases can I use right now? What are the pros and cons?
A lot of the details are also based on
@StasBekman
's investigations. My goal has been to elaborate on the latest distributed training strategies like DeepSpeed and FSDP, and make an updated list of practical guidelines for the current state of the
@huggingface
ecosystem.
There are still a number of murky areas with distributed training in
@huggingface
, esp. when it comes to DeepSpeed vs FSDP. I have experience mainly with DeepSpeed, but it'll be interesting to hear takes from those who've worked with both, such as
@sourab_m
@abacaj
Doing a graduate PL course this quarter (my first PL course; better late than never) and lambda calculus is weird but so.... elegant.
Every programming concept/building block is simply its affordance, nothing more. What something is, is exactly what you can do with it.
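A toy illustration of that idea (my own example, not from the course): Church booleans in plain lambdas. A boolean *is* nothing but the choice it makes between two arguments.

```python
# Church booleans: TRUE and FALSE are defined purely by what you can do with
# them, i.e. how they select between two options.
TRUE = lambda a: lambda b: a
FALSE = lambda a: lambda b: b

AND = lambda p: lambda q: p(q)(p)            # if p then q else p
IF = lambda cond: lambda then: lambda els: cond(then)(els)

print(IF(TRUE)("yes")("no"))                 # -> "yes"
print(IF(AND(TRUE)(FALSE))("yes")("no"))     # -> "no"
```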
This is aimed at a broad audience, but more relevant for LLM hackers trying to up their game, and thus is written in the same spirit as
@jeremyphoward
's hacker guides. This should also be relevant for startups/companies just getting into fine-tuning open-source LLMs.
The full post is 7000+ words long! The current open-source documentation is very messy, so hopefully this fills some of the gaps in understanding what matters for performance and scaling when you're fine-tuning language models.
Quantum computers would work instantly with this one simple hack:
End-to-end fabrication
Fabricate chip -> Fabricate experiments -> Fabricate results.
A fun project I worked on the past few weeks:
Jester - a text-to-meme generation engine! Enter any text you like and you can make a meme in a matter of seconds. Try it out here:
Question: Has there been any study showing that increasing vocab size 2x (50k -> 100k) leads to an x% improvement in fertility/compression on the SAME corpus, like CommonCrawl?
The different vocab sizes across models (32K Llama, 50K GPT2, 100k GPT4) are not directly comparable because the…
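A sketch of the kind of measurement I have in mind, using tiktoken's ~50K and ~100K vocabularies on the same text (the corpus file name is hypothetical, and this still doesn't isolate vocab size, since the two tokenizers were trained on different data):

```python
import tiktoken

# Any fixed corpus sample works; the file name here is made up.
text = open("commoncrawl_sample.txt").read()
n_words = len(text.split())

for name in ["r50k_base", "cl100k_base"]:  # ~50K (GPT-2/3) vs ~100K (GPT-4) vocabularies
    enc = tiktoken.get_encoding(name)
    n_tokens = len(enc.encode(text, disallowed_special=()))
    print(f"{name}: fertility = {n_tokens / n_words:.3f} tokens per whitespace word")
```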
There's a weird reality that we mostly ignore in language modeling. It's the fact that we don't _actually_ train these models end-to-end.
That's because we have the tokenizer! It's actually a really frustrating piece to tune, with sometimes small changes mattering a lot and…
@abacaj
GenAI strategist resisting the urge to say "in-context learning" and "Large Language Models" instead of just fine-tuning DeBERTa on the million-odd training examples they have (challenge impossible)
@cHHillee
@jeremyphoward
Perfect! One small note - the second all-gather in ZeRO-1/2 is an all-gather of updated model parameters, unlike the equivalent all-gather stage of the all-reduce in DDP, which is an all-gather of gradients.
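A toy sketch of the distinction (illustrative only, not the actual ZeRO implementation), runnable with `torchrun --nproc_per_node=2 script.py`:

```python
import torch
import torch.distributed as dist

dist.init_process_group("gloo")  # use "nccl" on GPUs
rank, world = dist.get_rank(), dist.get_world_size()

params = torch.zeros(4)
grads = torch.ones(4) * (rank + 1)

# DDP: all-reduce of gradients. A ring all-reduce is internally a
# reduce-scatter followed by an all-gather *of gradients*.
dist.all_reduce(grads)

# ZeRO-1/2: each rank owns one optimizer shard and updates only its slice...
shard_size = params.numel() // world
my_slice = slice(rank * shard_size, (rank + 1) * shard_size)
updated_shard = params[my_slice] - 0.1 * grads[my_slice]  # local optimizer step

# ...and the second all-gather collects the *updated parameters*, not gradients.
gathered = [torch.empty_like(updated_shard) for _ in range(world)]
dist.all_gather(gathered, updated_shard)
params = torch.cat(gathered)

dist.destroy_process_group()
```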
In other news, I'm happy to have played a small part in the new PEFT release by adding IA3! IA3 is a highly memory-efficient PEFT method with LoRA-like performance! It was also my first open-source contribution, so that was fun. PR here:
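A hedged sketch of what using IA3 through PEFT roughly looks like (the module names below are specific to BLOOM; other architectures need their own target/feed-forward module names):

```python
from transformers import AutoModelForCausalLM
from peft import IA3Config, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")

ia3_config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["query_key_value", "dense_4h_to_h"],  # attention + FFN projections in BLOOM
    feedforward_modules=["dense_4h_to_h"],                # the feed-forward subset of target_modules
)

model = get_peft_model(model, ia3_config)
model.print_trainable_parameters()  # IA3 learns only tiny rescaling vectors, hence the memory efficiency
```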
Super interesting release!
A brief summary of why this language-specific focus is much needed, plus some comparison with Llama-2-7B-chat:
- Indic languages have complex script rules. Predominantly fusional, + sometimes agglutinative - add a vowel to change gender, tense, etc. 🧵
🚨New model alert🚨
We're super excited to release OpenHathi-Hi-v0.1, the first Hindi LLM from our OpenHathi series of models. This model is trained under compute and data constraints to show that we can get GPT-3.5-like performance on Indic languages with a frugal budget. 1/5
@StasBekman
Will correct! Looks like I used the bandwidth mentioned for Inf instances by mistake. I also want to add a reference to the AWS docs for completeness. I believe this is from the P4d instance (the one for training) docs:
Is that right?
The internet democratised the means to reach a wide audience, but the result was that salaries became more long-tailed, not less. Not surprising if you see that before, people had to make do with what was available locally. The internet removed that restriction.
@monteskw
@pHequals7
If people were not "educated enough", they would have happily chosen the Rs. 72,000 promised by the Congress. They. Didn't. Don't underestimate the average Indian voter.
the data is clear
someone is going to release an energy drink with 600mg of caffeine in the next 5 years.
if this scaling law holds, we’ll be over 1200mg of caffeine by 2036.
the future is going to be insane 🚀
I finally got to go over
@ilyasut
's talk at Simons Institute this week, and it was brilliant! In the same spirit as
@DrJimFan
's excellent summary, my detailed notes are here:
Unlike most other lectures in the LLM workshop where Ilya presented, there…
There're few who can deliver both great AI research and charismatic talks. OpenAI Chief Scientist
@ilyasut
is one of them.
I watched Ilya's lecture at Simons Institute, where he delved into why unsupervised learning works through the lens of compression.
Sharing my notes:
-…
[4/8]
4. Challenges with Tokenization: Challenges with integer tokenization, tokenization for non-English languages and going multilingual, with a focus on the recent No Language Left Behind (NLLB) effort from Meta.
[2/8]
1. Intro: A quick introduction on tokens and the different tokenization algorithms out there.
2. BPE: A closer look at the Byte-Pair Encoding tokenization algorithm. We'll also go over a minimal implementation for training a BPE model.
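For a flavour of the BPE chapter, here's a toy training loop (illustrative, not the notebook code): count adjacent-symbol pairs, merge the most frequent pair, repeat.

```python
from collections import Counter

def get_pair_counts(words):
    """words: mapping from a tuple of symbols to that word's frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with the merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

corpus = "low low lower lowest new newer".split()
words = Counter(tuple(w) + ("</w>",) for w in corpus)

merges = []
for _ in range(10):              # number of merges is the vocab-size knob
    pairs = get_pair_counts(words)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    merges.append(best)
    words = merge_pair(words, best)

print(merges)
```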
Is the Falcon-40b paper still not out? I'm curious about how they trained their tokenizer. Fun fact: All integers from 0-999 get segmented as 1 token, except 957, which gets 2 tokens (??).
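A quick way to check this yourself (assumes `transformers` is installed and the Falcon tokenizer is available on the Hub):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tiiuae/falcon-40b")

# Integers 0-999 that don't get a single token.
multi_token = [n for n in range(1000)
               if len(tok.encode(str(n), add_special_tokens=False)) > 1]
print(multi_token)  # per the observation above, I'd expect something like [957]
```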
Went for a run after a long break. Struggled at the end, but after a quick glance around, I yelled "Who's gonna carry the boats?". Can confirm that's an instant 20% stamina boost.
Something to add to this: As put by
@nntaleb
, we are anticipation machines. Our ability to mentally project and contemplate different actions without performing them allows us to cheat evolution. The same circuits also seem to interfere with our emotional health?
@thesephist
Np! Tbh I've been trying to find what I can use to visualize neurons/attention with in-context examples - e.g. which few-shot example mattered more in the prompt (do you know any? :p). The per-token view of BertViz crashes my notebook if you have hundreds of tokens (and with 100s of…
The only prerequisite assumed is a basic introduction to NLP/language models, such as
@jeremyphoward
's LLM for hackers intro or
@JayAlammar
's videos on LLM tokenizers. I've tried to create a complementary resource to
@huggingface
's NLP course. Topics Covered:
Okay, I tried Bard after quite some time, now with extensions, and it's actually very good! I tried out some queries to get information from YouTube, Flights, Gmail and regular Search. It's very fast and mostly works. From initial testing, some limitations/issues 🧵
🚨Bard feature drop: Extensions!
- connects with Google services
- retrieves from, reasons over and composes APIs
- powered by PaLM 2
First step in the long arc of shipping LLM-powered 'agents' in consumer-grade products
Far from done but excited to share this with the 🌎
🧵
@pHequals7
@ewarren
I think her main problem is with the fact that they operate the iOS App Store while also distributing their own apps on the platform.
(n/n)
However, if the model first outputs English text, and then translates the answer to Hindi, it outperforms GPT-4 on (Hindi-translated) MT-bench - mainly because of Llama-2-7B's English-heavy pretraining. This is thinking in English, and speaking in Hindi!
How it works: We trained a Transformer model to classify user text and show 10 relevant templates. For each template, we labelled (manually!!) 5 example (user prompt, meme caption) pairs to make a custom prompt for GPT-3. Code and more details here:
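Roughly, the few-shot prompt assembly looks like this (a hedged sketch with made-up template names and examples, not the actual Jester code):

```python
def build_prompt(template_name, examples, user_text):
    """examples: list of (user_prompt, meme_caption) pairs labelled for this template."""
    lines = [f"Write a caption for the '{template_name}' meme template."]
    for prompt, caption in examples:
        lines.append(f"Input: {prompt}\nCaption: {caption}")
    lines.append(f"Input: {user_text}\nCaption:")   # the model completes this last caption
    return "\n\n".join(lines)

examples = [
    ("when the deadline is tomorrow", "Me: I work best under pressure"),
    ("finally fixed the bug", "It was a missing semicolon all along"),
]
print(build_prompt("Distracted Boyfriend", examples, "trying a new framework mid-project"))
```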
(3/n)
on GPT-4:
- Too fine-grained a tokenization has downstream effects on performance. On the task of toxicity detection for different translations, Indic languages had the lowest performance with Meta's No Language Left Behind models.
@paulg
on thinking the unthinkable:
"If you can think things so outside the box that they'd make people's hair stand on end, you'll have no trouble with the small trips outside the box that people call innovative."
@pHequals7
@ShitUserStory
Just use Lists. Everything's more organised and you can stay sane. Only issue is that you don't see the account's likes, which you can somewhat get around with more Lists :)
This was a course project under
@yuqirose
, and we had essentially 8 weeks. One thing I still want to do is to make the whole thing run on the browser - Streamlit is good but still slow, and concurrency is a question mark.
[5/8]
5. Puzzles: Some simple puzzles to get you thinking about pre-tokenization, vocabulary size, etc. We revisit
@Thom_Wolf
's tokenizer puzzle on vocabulary size.
Amazing text to music generations from
@suno_ai_
, could easily see these taking over leaderboards.
Personal favorite: this song I fished out of their Discord a few months ago, "Return to Monkey", which has been stuck in my head since :D
[00:57]
I wanna return to monkey, I…
"Social media platforms like Twitter amplify expressions of moral outrage over time, because users learn such language gets rewarded with an increased number of 'likes' and 'shares,' a new Yale University study shows."
(2/n)
- Too fine-grained a tokenization is bad. You need a large vocab size DEDICATED to that language.
- Current models have large vocab sizes but dedicate little to languages like Hindi. GPT-4 has a 100K vocab, but still uses 5 times more tokens for Hindi than for English.
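To see the gap concretely, a small hedged illustration with GPT-4's cl100k_base vocabulary via tiktoken (exact ratios vary by sentence):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4's ~100K vocabulary

english = "How are you doing today?"
hindi = "आज आप कैसे हैं?"

print("en tokens:", len(enc.encode(english)))
print("hi tokens:", len(enc.encode(hindi)))
# Hindi typically comes out several times longer in tokens than the English equivalent.
```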
@itsclivetime
@jeremyphoward
Tbh Accelerate is meant to be the opposite (very general), and does a pretty good job of providing a unified interface to switch between different distributed training strategies. By default, there will be json/yaml hell involved because it wraps around FSDP and DeepSpeed. You…
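For context, this is roughly what the unified interface looks like (a minimal sketch; the distributed backend is picked via `accelerate config`, which is where the JSON/YAML comes in):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from accelerate import Accelerator

accelerator = Accelerator()

# Toy model and data just to show the loop shape.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader([(torch.randn(10), torch.tensor(0)) for _ in range(32)], batch_size=8)

# The same loop runs under DDP, FSDP, or DeepSpeed depending on the config.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    loss = F.cross_entropy(model(x), y)
    accelerator.backward(loss)   # replaces loss.backward() so every backend works
    optimizer.step()
    optimizer.zero_grad()
```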
Google Flights and YouTube plugins seem to just work, at least for the current set of features. Sometimes you do have problems like incorrect retrieval (wrong flight price, for example), but overall pretty good and helpful!
Meditations on Violence
"In this ritual, members of a group compete for status and show their loyalty by how vicious they can be to an 'outsider.' Pleading, fighting, passivity will be interpreted as proof of 'otherness' and justification to escalate."
Reminds me of what firefighter Paul Gleason said about his crew leadership: It's not decision making, it's sensemaking.
"If I make a decision,...I take pride in it....and not listen to those who question it...If I make sense, then this is more dynamic...and I can change it."