Sumanth Hegde

@sumanthrh

Followers
918
Following
4K
Media
67
Statuses
386

Post-training @anyscalecompute. Prev - @UCSanDiego, @C3_AI, @iitmadras. Machine Learning and Systems. Intensity is all you need.

San Francisco
Joined February 2016
@sumanthrh
Sumanth Hegde
4 months
Some of our interesting observations from working on multi-turn text2SQL:
- Data-efficient RL works pretty well: We used very typical GRPO settings; just make sure to use "hard-enough" samples and no KL. KL can stabilize learning early on but will always bring down rewards.
@NovaSkyAI
NovaSky
4 months
1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model
0
6
18
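For readers unfamiliar with GRPO, the idea of scoring each rollout relative to the other rollouts for the same prompt can be sketched roughly as below. This is an illustrative sketch only, not SkyRL-SQL's code: the function names and the `eps` value are hypothetical, and reading "hard-enough" as "not every rollout gets the same reward" is an assumption. No KL penalty term appears, in line with the "no KL" setting mentioned in the post.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its own prompt group.

    `rewards` holds one scalar reward per rollout sampled for the same prompt.
    `eps` (hypothetical value) guards against division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_learning_signal(rewards: List[float]) -> bool:
    """A group where every rollout scores the same yields all-zero advantages."""
    return len(set(rewards)) > 1


# A prompt the model always solves contributes nothing to the update.
easy = [1.0, 1.0, 1.0, 1.0]
# A prompt solved only some of the time produces non-zero advantages.
hard = [1.0, 0.0, 0.0, 1.0]
```

Under this reading, a prompt on which every rollout succeeds (or every rollout fails) gives zero advantage everywhere, which is one plausible reason sample difficulty matters for data efficiency.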
@sumanthrh
Sumanth Hegde
4 days
RT @shiyi_c98: An amazing collaboration with the Biomni team! We introduce Biomni-R0, a reasoning-enabled multi-task multi-tool biomedical…
0
9
0
@sumanthrh
Sumanth Hegde
15 days
2. Online quantization: Policy model runs in BF16, while rollouts run in FP8/Int8. FlashRL patches vLLM's weight loading APIs to ensure compatibility. We've ported over these patches for v0.9.2 and further optimized weight syncing! Try it out:
0
0
1
@sumanthrh
Sumanth Hegde
15 days
We've implemented both ingredients in FlashRL:
1. Truncated Importance Sampling: Ensures that the difference in rollout and policy logprobs doesn't hurt performance. This is a simple token-level correction factor for your policy loss. Can help stabilize training even without.
1
0
2
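A token-level correction of this kind can be sketched as follows. This is a minimal, framework-free illustration and not FlashRL's or SkyRL's actual implementation: the function names and the cap value of 2.0 are assumptions. Each token's loss is scaled by the ratio of policy to rollout probability, truncated at an upper bound so a large mismatch cannot blow up the update.

```python
import math
from typing import List


def truncated_is_weights(
    policy_logprobs: List[float],
    rollout_logprobs: List[float],
    cap: float = 2.0,  # hypothetical truncation threshold
) -> List[float]:
    """Per-token weight min(exp(logp_policy - logp_rollout), cap)."""
    if len(policy_logprobs) != len(rollout_logprobs):
        raise ValueError("logprob sequences must have the same length")
    return [
        min(math.exp(p - r), cap)
        for p, r in zip(policy_logprobs, rollout_logprobs)
    ]


def corrected_mean_loss(per_token_loss: List[float], weights: List[float]) -> float:
    """Average of per-token losses, each scaled by its truncated weight.

    In an autograd framework the weights would be treated as constants
    (no gradient flows through them); that is an assumption of this sketch.
    """
    if len(per_token_loss) != len(weights):
        raise ValueError("loss and weight sequences must have the same length")
    return sum(w * l for w, l in zip(weights, per_token_loss)) / len(per_token_loss)
```

When the rollout engine and the policy agree exactly, every weight is 1 and the loss is unchanged; the correction only kicks in where quantized rollouts assign different probabilities than the BF16 policy.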
@sumanthrh
Sumanth Hegde
15 days
SkyRL now has native support for FlashRL!
We now support Int8 and FP8 rollouts, enabling blazing fast inference - up to 1.7x for the DAPO recipe - without compromising performance!
@fengyao1909
Feng Yao
26 days
⚡ FP8 makes RL faster, but at the cost of performance. We present FlashRL, the first open-source & working RL recipe that applies INT8/FP8 for rollout without losing performance compared to BF16! 📝 Blog:
3
19
118
@sumanthrh
Sumanth Hegde
23 days
RT @eshear: I'm confused by ppl saying GPT-5 is bad. At least with thinking mode, I find it way more focused, more reasonable, and less ram…
0
29
0
@sumanthrh
Sumanth Hegde
25 days
RT @NovaSkyAI: 🥇 Happy to share that our work on SkyRL won the gold medal in "Reasoning & Planning Research Track" of the @BerkeleyRDI Agen…
rdi.berkeley.edu
AgentX is hosted by RDI at UC Berkeley.
0
3
0
@sumanthrh
Sumanth Hegde
28 days
RT @YichuanM: 1/N 🚀 Launching LEANN, the tiniest vector index on Earth! Fast, accurate, and 100% private RAG on your MacBook. 0% internet…
0
45
0
@sumanthrh
Sumanth Hegde
29 days
Generated by GPT-5 (Thinking Disabled).
@cHHillee
Horace He
30 days
You're no match for OpenAI's marketing team.
0
0
2
@sumanthrh
Sumanth Hegde
29 days
RT @haoailab: (1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing FastWan series,…
0
110
0
@sumanthrh
Sumanth Hegde
1 month
RT @PyTorch: We're looking forward to participating in the Agentic AI Summit 2025 by @BerkeleyRDI on August 2nd. Find us onsite to speak to…
0
15
0
@sumanthrh
Sumanth Hegde
1 month
RT @mjamei: If you are in SF tonight, come over to our event and learn about post-training of AI agents. We have experts from Veris, Anysca…
0
3
0
@sumanthrh
Sumanth Hegde
1 month
RT @vikhyatk:
Tweet media one
0
69
0
@sumanthrh
Sumanth Hegde
1 month
🤯🤯🤯🤯🤯
@c_valenzuelab
Cristóbal Valenzuela
1 month
Really nice demo of what @runwayml Aleph can do for complex changes in environments while adding accurate dynamic elements like snow on the shoulders or splashing water as the characters move.
0
0
1
@sumanthrh
Sumanth Hegde
2 months
RT @sergeykarayev: Terminal Bench is a cool benchmark I just came across!
CLI SWE agents must complete tasks like
- Build Linux kernel
-…
0
4
0
@sumanthrh
Sumanth Hegde
2 months
Doing God's work.
@EpochAIResearch
Epoch AI
2 months
Running SWE-bench evals is very slow and difficult. To solve this, we created a registry of optimized Docker images that let us run SWE-bench Verified in just one hour on a single 32-core machine. Today, we are open-sourcing these images; anyone can `docker pull` them.
0
0
8
@sumanthrh
Sumanth Hegde
2 months
RT @NovaSkyAI: 🔎 SkyRL + Search-R1. Training a multi-turn search agent doesn't have to be complicated. With SkyRL, reproducing the SearchR…
0
32
0
@sumanthrh
Sumanth Hegde
2 months
RT @venturetwins: When the vibe coding is over and it's time for vibe debugging
0
478
0
@sumanthrh
Sumanth Hegde
2 months
This release is special. We've been working on multi-turn RL for the past few months - with SkyRL-Agent trained on a SweBench-like task and SkyRL-SQL on Text-to-SQL. There was always this friction point while working with existing training frameworks. Often the no. 1 priority.
@NovaSkyAI
NovaSky
2 months
✨ Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity: easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵👇
Blog: Code:
0
1
18
@sumanthrh
Sumanth Hegde
2 months
RT @erictang000: Check out our updates to SkyRL! In this release we also reproduced our prior results outperforming GPT-4o for multi-turn…
0
4
0