
Sumanth Hegde
@sumanthrh
Followers
918
Following
4K
Media
67
Statuses
386
Post-training @anyscalecompute. Prev - @UCSanDiego, @C3_AI, @iitmadras. Machine Learning and Systems. Intensity is all you need.
San Francisco
Joined February 2016
Some of our interesting observations from working on multi-turn text2SQL: - Data-efficient RL works pretty well: we used very typical GRPO settings; just make sure to use "hard-enough" samples and no KL. KL can stabilize learning early on but will always bring down rewards.
1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model
0
6
18
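A minimal sketch of the GRPO-style setup described above, assuming scalar rewards per rollout and no KL penalty term. The function names and the "hard-enough" filter are illustrative assumptions, not SkyRL-SQL's actual code: here a prompt counts as hard enough when its rollouts do not all receive the same reward, which is only one possible reading of the tweet.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group (one prompt, many rollouts)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # No KL term is added anywhere: the advantage depends on rewards only.
    return [(r - mu) / (sigma + eps) for r in rewards]

def has_learning_signal(rewards):
    """Hypothetical 'hard-enough' filter: keep prompts whose rollouts disagree.

    If every rollout gets the same reward, all advantages are zero and the
    prompt contributes no gradient.
    """
    return max(rewards) != min(rewards)

# Rewards for 4 rollouts on each of two prompts (1.0 = query result matched).
groups = {"easy_prompt": [1.0, 1.0, 1.0, 1.0], "hard_prompt": [1.0, 0.0, 1.0, 0.0]}
kept = {name: group_relative_advantages(r) for name, r in groups.items() if has_learning_signal(r)}
```

In this toy run only `hard_prompt` survives the filter, and its successful rollouts get positive advantages while the failed ones get negative advantages.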
RT @shiyi_c98: An amazing collaboration with the Biomni team! We introduce Biomni-R0, a reasoning-enabled multi-task multi-tool biomedical…
0
9
0
2. Online quantization: Policy model runs in BF16, while rollouts run in FP8/Int8. FlashRL patches vLLM's weight loading APIs to ensure compatibility. We've ported over these patches for v0.9.2 and further optimized weight syncing! Try it out:
0
0
1
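To make the numeric idea behind quantized rollouts concrete, here is a minimal sketch of symmetric per-tensor Int8 quantization in plain Python. It only illustrates the rounding step and the error it introduces; it is not FlashRL's or vLLM's implementation, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one scale for the whole tensor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]

# High-precision policy weights (stand-in values) are quantized before each rollout phase.
policy_weights = [0.5, -1.0, 0.25, 0.003]
codes, scale = quantize_int8(policy_weights)
rollout_weights = dequantize_int8(codes, scale)
```

The rollout engine therefore samples from a slightly different model than the one being trained, which is the mismatch that the importance-sampling correction is meant to absorb.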
We've implemented both ingredients in FlashRL: 1. Truncated Importance Sampling: Ensures that the difference in rollout and policy logprobs doesn't hurt performance. This is a simple token-level correction factor for your policy loss. Can help stabilize training even without.
1
0
2
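A minimal sketch of the token-level correction described above, assuming per-token log-probabilities are available from both the policy (trainer) and the rollout engine. The cap value of 2.0, the function names, and the plain policy-gradient surrogate are assumptions for illustration; a real trainer would apply the weight as a detached constant inside its own PPO/GRPO loss.

```python
import math

def truncated_is_weights(policy_logprobs, rollout_logprobs, cap=2.0):
    """Per-token ratio pi_policy / pi_rollout, truncated from above at `cap`."""
    return [min(math.exp(p - r), cap) for p, r in zip(policy_logprobs, rollout_logprobs)]

def tis_weighted_loss(policy_logprobs, rollout_logprobs, advantages, cap=2.0):
    """Mean policy-gradient surrogate with each token scaled by its truncated weight."""
    weights = truncated_is_weights(policy_logprobs, rollout_logprobs, cap)
    terms = [-w * a * p for w, a, p in zip(weights, advantages, policy_logprobs)]
    return sum(terms) / len(terms)

# Token 0: both engines agree. Token 1: the rollout engine under-weighted the token,
# so the raw ratio exp(2) is truncated to the cap. Token 2: ratio below 1 is kept as is.
weights = truncated_is_weights([0.0, -1.0, -2.0], [0.0, -3.0, -1.0])
```

Truncating only from above keeps a few badly mismatched tokens from dominating the update, while tokens where the two engines agree keep a weight of 1.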
SkyRL now has native support for FlashRL! We now support Int8 and FP8 rollouts, enabling blazing fast inference (up to 1.7x for the DAPO recipe) without compromising performance!
⚡FP8 makes RL faster, but at the cost of performance. We present FlashRL, the first open-source & working RL recipe that applies INT8/FP8 for rollout without losing performance compared to BF16! Blog:
3
19
118
RT @eshear: I'm confused by ppl saying GPT-5 is bad. At least with thinking mode, I find it way more focused, more reasonable, and less ram…
0
29
0
RT @NovaSkyAI: Happy to share that our work on SkyRL won the gold medal in "Reasoning & Planning Research Track" of the @BerkeleyRDI Agen…
rdi.berkeley.edu
AgentX is hosted by RDI at UC Berkeley.
0
3
0
RT @YichuanM: 1/N Launching LEANN, the tiniest vector index on Earth! Fast, accurate, and 100% private RAG on your MacBook. 0% internet…
0
45
0
Generated by GPT-5 (Thinking Disabled).
0
0
2
RT @haoailab: (1/n) With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing FastWan series,…
0
110
0
RT @PyTorch: We're looking forward to participating in the Agentic AI Summit 2025 by @BerkeleyRDI on August 2nd. Find us onsite to speak to…
0
15
0
RT @mjamei: If you are in SF tonight, come over to our event and learn about post-training of AI agents. We have experts from Veris, Anysca…
0
3
0
🤯🤯🤯🤯🤯
Really nice demo of what @runwayml Aleph can do for complex changes in environments while adding accurate dynamic elements like snow on the shoulders or splashing water as the characters move.
0
0
1
RT @sergeykarayev: Terminal Bench is a cool benchmark I just came across! CLI SWE agents must complete tasks like: - Build Linux kernel - …
0
4
0
RT @NovaSkyAI: SkyRL + Search-R1. Training a multi-turn search agent doesn't have to be complicated. With SkyRL, reproducing the SearchR…
0
32
0
This release is special. We've been working on multi-turn RL for the past few months, with SkyRL-Agent trained on a SWE-bench-like task and SkyRL-SQL on Text-to-SQL. There was always this friction point while working with existing training frameworks. Often the no. 1 priority.
✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity: easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵 Blog: Code:
0
1
18
RT @erictang000: Check out our updates to SkyRL! In this release we also reproduced our prior results outperforming GPT-4o for multi-turn…
0
4
0