Sumanth Hegde

@sumanthrh

Followers: 943 · Following: 4K · Media: 67 · Statuses: 403

Post-training @anyscalecompute. Prev - @UCSanDiego, @C3_AI, @iitmadras. Machine Learning and Systems. Intensity is all you need.

San Francisco
Joined February 2016
@sumanthrh
Sumanth Hegde
7 months
Some of our interesting observations from working on multi-turn text2SQL: - Data-efficient RL works pretty well: we used a very typical GRPO setup; just make sure to use "hard enough" samples and no KL penalty. KL can stabilize learning early on but will always bring down rewards
@NovaSkyAI
NovaSky
7 months
1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and the SFT model
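The GRPO tweak described above (group-relative advantages, no KL penalty) is easy to see in code. Below is a minimal sketch of such a loss in PyTorch; shapes and names are illustrative, not the actual SkyRL-SQL training code.

```python
import torch

def grpo_loss(logp_new, logp_old, rewards, mask, clip_eps=0.2):
    """GRPO-style policy loss for one group of G rollouts of the same prompt.

    logp_new, logp_old: (G, T) per-token log-probs under current / rollout policy.
    rewards:            (G,) scalar reward per rollout.
    mask:               (G, T) 1 for response tokens, 0 for padding.
    Note the absence of a KL(pi || pi_ref) penalty, per the observation that
    it stabilizes early training but caps final rewards.
    """
    # Group-relative advantage: normalize rewards within the group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    adv = adv[:, None]                          # broadcast over tokens
    ratio = torch.exp(logp_new - logp_old)      # per-token importance ratio
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    per_token = -torch.min(ratio * adv, clipped * adv)
    return (per_token * mask).sum() / mask.sum()
```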
@seiji_________
Seiji Eicher
27 days
Wide-EP and prefill/decode disaggregation APIs for vLLM are now available in Ray 2.52 🚀🚀 Validated at 2.4k tokens/H200 on Anyscale Runtime, these patterns maximize sparse MoE model inference efficiency, but often require non-trivial orchestration logic. Here's how they
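For intuition, here is a toy sketch of the prefill/decode disaggregation pattern: two worker pools connected by a queue that hands off KV caches. The `engine.prefill`/`engine.decode` calls and request object are hypothetical placeholders, not the Ray or vLLM APIs.

```python
import asyncio

# Toy prefill/decode disaggregation: separate worker pools, KV handoff via a
# queue. All engine/request methods here are hypothetical placeholders.
prefill_q: asyncio.Queue = asyncio.Queue()
decode_q: asyncio.Queue = asyncio.Queue()

async def prefill_worker(engine):
    while True:
        req = await prefill_q.get()
        kv_cache = await engine.prefill(req.prompt)   # compute-bound phase
        await decode_q.put((req, kv_cache))           # hand off to decode pool

async def decode_worker(engine):
    while True:
        req, kv_cache = await decode_q.get()
        async for token in engine.decode(kv_cache):   # memory-bound phase
            req.send(token)
```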
@shiyi_c98
Shiyi Cao
27 days
1/n 🚀 Introducing SkyRL-Agent, a framework for efficient RL agent training. ⚡ 1.55× faster async rollout dispatch 🛠 Lightweight tool + task integration 🔄 Backend-agnostic (SkyRL-train / VeRL / Tinker) 🏆 Used to train SA-SWE-32B, improving Qwen3-32B from 24.4% → 39.4%
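A rough sketch of what async rollout dispatch means in practice: each episode runs as its own coroutine, so a slow environment or tool call doesn't gate the whole batch. `policy.generate` and the env interface below are illustrative, not SkyRL-Agent's actual API.

```python
import asyncio

async def run_episode(policy, env):
    """One agent episode: alternate model calls and environment/tool steps."""
    obs = await env.reset()
    done = False
    while not done:
        action = await policy.generate(obs)       # hypothetical async model call
        obs, reward, done = await env.step(action)
    return env.trajectory()

async def dispatch_rollouts(policy, envs):
    # Every episode is its own coroutine, so a slow environment only delays
    # its own trajectory instead of blocking the batch, which is where async
    # dispatch wins over synchronous batched rollouts.
    return await asyncio.gather(*(run_episode(policy, env) for env in envs))
```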
@erictang000
eric
2 months
๐Ÿง‘โ€๐ŸซOn-Policy Distillation is available as an example on SkyRL! The implementation required no library code changes, and we were able to reproduce AIME math reasoning experiments from the @thinkymachines blogpost. Check out our detailed guide to see how! https://t.co/TqDS649oFQ
@thinkymachines
Thinking Machines
2 months
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
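One common formulation of on-policy distillation, sketched below: the student samples its own responses (the on-policy, error-correcting part) and is trained to minimize a dense per-token reverse KL against a frozen teacher on those samples (the SFT-like reward density). The `sample` helper and model call signatures are illustrative, not the Thinking Machines or SkyRL implementation.

```python
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompts):
    # 1. The student samples its own responses (hypothetical helper): this
    #    is the "on-policy" part, as in RL.
    tokens, mask = student.sample(prompts)            # (B, T), (B, T)
    # 2. Both models score the sampled tokens; the teacher is frozen.
    s_logp = F.log_softmax(student(tokens), dim=-1)   # (B, T, V), grads flow
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(tokens), dim=-1)
    # 3. Dense per-token reverse KL, KL(student || teacher): every token gets
    #    a training signal, unlike a single end-of-episode reward.
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)   # (B, T)
    return (kl * mask).sum() / mask.sum()
```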
@haoailab
Hao AI Lab
2 months
🔥 New Blog: "Disaggregated Inference: 18 Months Later" 18 months in LLM inference feels like a new Moore's Law cycle – but this time not just 2x per year: 💸 Serving cost ↓10–100x 🚀 Throughput ↑10x ⚡ Latency ↓5x A big reason? Disaggregated Inference. From DistServe, our
hao-ai-lab.github.io: Eighteen months ago, our lab introduced DistServe with a simple bet: split LLM inference into prefill and decode, and scale them independently on separate compute pools. Today, almost every product...
@NovaSkyAI
NovaSky
2 months
โ˜๏ธSkyRL now runs seamlessly with SkyPilot! Let @skypilot_org handle GPU provisioning and cluster setup, so you can focus on RL training with SkyRL. ๐ŸŽฏ Launch distributed RL jobs effortlessly โš™๏ธ Auto-provision GPUs across clouds ๐Ÿค– Train your LLM agents at scale Get started
@shulynnliu
Shu Lynn Liu
2 months
🚀 SkyRL has day-zero support for OpenEnv!! This initial integration with OpenEnv highlights how easily new environments plug into SkyRL. Train your own LLM agents across containerized environments with simple, Gym-style APIs 🔥 👉 Check it out:
@_lewtun
Lewis Tunstall
2 months
Excited to share OpenEnv: frontier-grade RL environments for the open-source community 🔥! https://t.co/KVeBMsxohL 🧩 Modular interfaces: a clean Gymnasium-style API (reset(), step(), state()) that plugs into any RL framework 🐳 Built for scale: run environments in containers
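The Gymnasium-style interface mentioned above (reset(), step(), state()) keeps the training-side loop very small. A hedged sketch, with placeholder types rather than OpenEnv's concrete classes:

```python
def collect_episode(env, policy, max_steps=32):
    """Gymnasium-style episode loop; env and policy types are placeholders."""
    obs = env.reset()                        # fresh (containerized) episode
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)                 # e.g. an LLM-produced tool call
        obs, reward, done, info = env.step(action)
        trajectory.append((action, reward))
        if done:
            break
    return trajectory, env.state()           # final state, e.g. for verification
```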
@PyTorch
PyTorch
2 months
โš™๏ธ Engineering for scale and speed. Next-gen RL libraries are redefining how agents learn and interact. ๐ŸŽค Open Agent Summit Speakers: โ€ข @sumanthrh (Anyscale) โ€“ โ€œSkyRL: A Modular RL Library for LLM Agentsโ€ โ€ข @tyler_griggs_ (UC Berkeley) โ€“ โ€œLessons in Agentic RL Modeling and
@ai4research_ucb
AI-Driven Research Systems
3 months
🚀 Excited to release our new paper: "Barbarians at the Gate: How AI is Upending Systems Research" We show how AI-Driven Research for Systems (ADRS) can rediscover or outperform human-designed algorithms across cloud scheduling, MoE expert load balancing, LLM-SQL optimization,
@erictang000
eric
3 months
SkyRL now supports Megatron! Training massive MoE models demands more than just ZeRO-3/FSDP sharding. The Megatron backend for SkyRL unlocks high-throughput training with: ✅ 5D parallelism (tensor + pipeline + context + expert + data) ✅ Efficient training for 30B+ MoEs
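A quick back-of-the-envelope on how the five degrees compose. Numbers are illustrative, not a recommended config, and the exact composition rules vary by Megatron version:

```python
# Illustrative parallelism degrees only.
tp, pp, cp, dp = 4, 2, 2, 4      # tensor, pipeline, context, data parallel
ep = 8                           # expert parallel, for the MoE layers

# Dense layers are sharded over tp * pp * cp and replicated dp ways:
world_size = tp * pp * cp * dp   # = 64 GPUs

# Expert parallelism typically reuses the context/data-parallel ranks for
# MoE layers rather than adding GPUs, so ep must divide that pool.
assert (cp * dp) % ep == 0
print(f"world size: {world_size} GPUs, {ep}-way expert-parallel MoE layers")
```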
@pgasawa
Parth Asawa
3 months
Training our advisors was too hard, so we tried to train black-box models like GPT-5 instead. Check out our work: Advisor Models, a training framework that adapts frontier models behind an API to your specific environment, users, or tasks using a smaller advisor model (1/n)!
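The pattern, in sketch form: only the small advisor is trained, and its reward comes from how well the frozen frontier model performs when given the advisor's guidance. All names below are illustrative placeholders, not the paper's code.

```python
def advised_answer(advisor, frontier_api, task_input):
    """Advisor-model pattern, sketched: the small advisor is the only trained
    component; the frontier model is a frozen black box behind an API."""
    # RL rewards the advisor based on how well the frontier model does with
    # its advice for this specific environment, user, or task.
    advice = advisor.generate(f"Write guidance for solving:\n{task_input}")
    prompt = f"{advice}\n\nTask:\n{task_input}"
    return frontier_api.complete(prompt)   # black-box call, no gradients
```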
@RhysSullivan
Rhys
3 months
when bro suddenly starts submitting PRs with detailed descriptions, correct grammar and casing, and testing steps
@sumanthrh
Sumanth Hegde
3 months
RT @NovaSkyAI: Scaling agentic simulations is hard, so in collaboration with @anyscalecompute we wrote up our experience using Ray for agen…
@pcmoritz
Philipp Moritz
3 months
Do you find it challenging to run RL / agent simulations at a large scale (e.g. dealing with docker and remote execution)? Check out our blog post https://t.co/iNPivIzbc2 where we show how to do it with Ray and mini-swe-agent (kudos to @KLieret)
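The core Ray pattern from the blog post fits in a few lines: wrap one sandboxed episode in a remote task and fan out across the cluster. The task body below is a placeholder; see the post for the mini-swe-agent and container specifics.

```python
import ray

ray.init()

@ray.remote(num_cpus=1)
def run_agent_episode(task_id: str) -> dict:
    # Placeholder body: in the blog post's setting this would drive
    # mini-swe-agent against a sandboxed (e.g. containerized) task.
    ...

# Fan out many independent simulations; Ray handles scheduling, remote
# execution, and fault tolerance across the cluster.
futures = [run_agent_episode.remote(f"task-{i}") for i in range(1000)]
results = ray.get(futures)
```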
@sumanthrh
Sumanth Hegde
4 months
🚀🚀🚀🚀
@tyler_griggs_
Tyler Griggs
4 months
SkyRL x Environments Hub is live!! Train on any of 100+ environments natively on SkyRL today ➡️ https://t.co/Aj287ZKOsw This was super fun to work on, the @PrimeIntellect team crushed it, go OSS!
@shiyi_c98
Shiyi Cao
4 months
An amazing collaboration with the Biomni team! We introduce Biomni-R0, a reasoning-enabled multi-task multi-tool biomedical research agent, trained through end-to-end RL. The results are impressive -- 2x stronger than the base model and >10% better than GPT-5 and Claude 4 Sonnet!
@KexinHuang5
Kexin Huang
4 months
🚀 Thrilled to share a preview of Biomni-R0 – we trained the first RL agent end-to-end for biomedical research. ➡️ nearly 2× stronger than its open-source base ➡️ >10% better than frontier closed-source models ➡️ Scalable path to hill climb to expert-level performance 🔗
@sumanthrh
Sumanth Hegde
4 months
2. Online quantization: The policy model runs in BF16, while rollouts run in FP8/Int8. FlashRL patches vLLM's weight-loading APIs to ensure compatibility. We've ported over these patches for v0.9.2 and further optimized weight syncing! Try it out:
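Conceptually, the weight sync looks something like the sketch below: BF16 master weights are quantized per-tensor to FP8 on the way to the rollout engine. `rollout_engine.load_weights` is a stand-in for vLLM's patched loader, and the per-tensor scaling scheme is illustrative, not FlashRL's exact recipe.

```python
import torch

def sync_policy_to_rollout(policy_state: dict, rollout_engine):
    """Quantize BF16 master weights to FP8 when syncing to the rollout engine,
    so the trainer keeps full precision while rollouts stay cheap."""
    synced = {}
    for name, w in policy_state.items():
        if w.dtype == torch.bfloat16 and w.dim() == 2:   # linear weights
            scale = w.abs().amax() / torch.finfo(torch.float8_e4m3fn).max
            scale = scale.clamp(min=1e-12)               # avoid divide-by-zero
            synced[name] = ((w / scale).to(torch.float8_e4m3fn), scale)
        else:
            synced[name] = (w, None)   # keep norms/embeddings in high precision
    rollout_engine.load_weights(synced)  # stand-in for the patched vLLM loader
```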
@sumanthrh
Sumanth Hegde
4 months
We've implemented both ingredients in FlashRL: 1. Truncated Importance Sampling: Ensures that the difference between rollout and policy logprobs doesn't hurt performance. This is a simple token-level correction factor for your policy loss. Can help stabilize training even without
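In sketch form (illustrative names, not the exact SkyRL code), truncated importance sampling is a one-line reweighting of the per-token loss:

```python
import torch

def tis_loss(per_token_loss, logp_policy, logp_rollout, mask, cap=2.0):
    """Truncated importance sampling: reweight each token's loss by the
    (capped) policy/rollout probability ratio, so the gap between rollout
    and trainer logprobs doesn't bias or destabilize the update."""
    ratio = torch.exp(logp_policy - logp_rollout).detach()  # no grad through the weight
    weight = torch.clamp(ratio, max=cap)                    # truncate large ratios
    return (per_token_loss * weight * mask).sum() / mask.sum()
```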
@sumanthrh
Sumanth Hegde
4 months
SkyRL now has native support for FlashRL! We now support Int8 and FP8 rollouts, enabling blazing-fast inference - up to 1.7x for the DAPO recipe - without compromising performance!
@fengyao1909
Feng Yao
4 months
โšก๐…๐๐Ÿ– makes RL faster โ€” but at the cost of performance. We present ๐…๐ฅ๐š๐ฌ๐ก๐‘๐‹, the first ๐จ๐ฉ๐ž๐งโ€“๐ฌ๐จ๐ฎ๐ซ๐œ๐ž & ๐ฐ๐จ๐ซ๐ค๐ข๐ง๐  ๐‘๐‹ ๐ซ๐ž๐œ๐ข๐ฉ๐ž that applies ๐ˆ๐๐“๐Ÿ–/๐…๐๐Ÿ– for rollout ๐ฐ๐ข๐ญ๐ก๐จ๐ฎ๐ญ ๐ฅ๐จ๐ฌ๐ข๐ง๐  ๐ฉ๐ž๐ซ๐Ÿ๐จ๐ซ๐ฆ๐š๐ง๐œ๐ž compared to ๐๐…๐Ÿ๐Ÿ”! ๐Ÿ“ Blog:
@eshear
Emmett Shear
4 months
I'm confused by ppl saying GPT-5 is bad. At least with thinking mode, I find it way more focused, more reasonable, and less rambling. Sometimes I even have to ask it to go into more detail! IMO best model from OAI so far.