
kourosh hakhamaneshi
@CyrusHakha
Followers: 971 · Following: 2K · Media: 42 · Statuses: 747
LLMs + Ray @anyscalecompute 💻 prev PhD, EECS, @UCBerkeley 👨🎓
California, USA
Joined September 2010
RT @pcmoritz: There have been a lot of open source RL libraries for training LLMs popping up recently. We took a stab at describing some of….
RT @erictang000: Check out our updates to SkyRL! In this release we also reproduced our prior results outperforming GPT-4o for multi-turn…
RL for LLMs is here to stay. With SkyRL, you get both modularity and performance:
• Clean trainer/generator separation (colocated or disaggregated)
• Sync + async RL (multi-turn)
• Remote inference (OpenAI-compatible)
No forking needed to customize. Easy to use and…
✨Release: We upgraded SkyRL into a highly modular, performant RL framework for training LLMs. We prioritized modularity: easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵👇
Blog: Code:
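The trainer/generator split described above can be sketched as two interfaces joined by a small RL loop. This is a minimal, hypothetical sketch (the class and method names are illustrative, not SkyRL's actual API): the point is that either side can be swapped out, colocated, or moved behind a remote OpenAI-compatible endpoint without touching the other.

```python
from abc import ABC, abstractmethod

class Generator(ABC):
    """Produces rollouts; in practice this could be a colocated engine
    or a remote, OpenAI-compatible inference service."""
    @abstractmethod
    def generate(self, prompts: list[str]) -> list[str]: ...

class Trainer(ABC):
    """Consumes rollouts and updates policy weights."""
    @abstractmethod
    def train_step(self, prompts: list[str], completions: list[str]) -> float: ...

class EchoGenerator(Generator):
    # Stand-in for a real inference engine (e.g. a vLLM-backed server).
    def generate(self, prompts):
        return [p + " -> completion" for p in prompts]

class LoggingTrainer(Trainer):
    # Stand-in for a real RL trainer; returns a fake, shrinking loss.
    def __init__(self):
        self.steps = 0
    def train_step(self, prompts, completions):
        self.steps += 1
        return 1.0 / self.steps

def rl_loop(trainer: Trainer, generator: Generator, prompts: list[str], iters: int = 3):
    # The loop only sees the two interfaces, so colocated vs. disaggregated
    # deployment is a construction-time choice, not a forking decision.
    losses = []
    for _ in range(iters):
        completions = generator.generate(prompts)
        losses.append(trainer.train_step(prompts, completions))
    return losses
```

Because the loop depends only on the two abstract interfaces, prototyping a new algorithm means implementing one side and reusing the other.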
RT @PyTorch: An #OpenSource Stack for #AI Compute: @kubernetesio + @raydistributed + @pytorch + @vllm_project ➡️ This Anyscale blog post by….
RT @robertnishihara: Impressive work! Agentic workflows have tons and tons of design and architectural decisions that affect performance and…
I get a lot of questions about the role of each layer of the AI compute stack: vLLM, Ray, k8s, etc. What does Ray do inside vLLM, and what does Ray do around vLLM? Why is Ray Core part of post-training frameworks like verl? In this blog @robertnishihara depicts what a…
The AI compute software stack consists of 3 specialized layers:
🔧🔧🔧 Layer 1: Training & Inference Framework (PyTorch + vLLM)
• Runs models efficiently on GPUs
• Handles model optimization and model-parallelism strategies
• Manages accelerator memory and automatic
RT @sumanthrh: Some of our interesting observations from working on multi-turn text2SQL: - Data-efficient RL works pretty well: We did ver…
SkyRL-SQL is another illustration of applying RL to an agentic workflow: it beats state-of-the-art frontier reasoning and non-reasoning models with a sample-efficient training recipe. The improvements are still incremental, but the direction is very promising. We also…
1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model
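The "interactively probe, refine, and verify SQL queries with a real database" loop can be sketched concretely: execute a candidate query, and on failure feed the database's error back as the signal for the next refinement. This is a hypothetical illustration using `sqlite3` (SkyRL-SQL's actual environment and interface may differ); the `candidates` list stands in for successive model refinements.

```python
import sqlite3

def multi_turn_sql(conn, candidates):
    """Try candidate queries in order, returning the first result that
    executes. The collected error messages are the execution feedback an
    RL-trained policy would condition its next refinement on."""
    feedback = []
    for sql in candidates:
        try:
            return conn.execute(sql).fetchall(), feedback
        except sqlite3.Error as e:
            feedback.append(str(e))  # error message fed back to the model
    return None, feedback

# A tiny "real database" for the agent to probe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
```

The key difference from single-shot text2SQL is that each turn is grounded in real execution feedback rather than in the model's guess about the schema.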
RT @NovaSkyAI: 1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, ref….
RT @anyscalecompute: “We realized our ML Engineers were spending too much time waiting before they could iterate.” – Wenyue Liu, ML Platfor…
RT @haoailab: Announcing FastVideo V1, a unified framework for accelerating video generation. FastVideo V1 offers: - A simple, consistent…
We are seeing incremental progress in open source on improving true long-horizon decision making for AI agents. SkyRL-v0 is a snapshot of the progress we have made in collaboration with @NovaSkyAI. We will have more releases in the upcoming weeks.
1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks like SWE-Bench. We also open-source a series of our early trained models to showcase the potential of end-to-end online RL training on long-horizon (20-50
RT @richliaw: Today we’re introducing SkyRL, a RL training pipeline optimized for long-horizon tasks like SWE-Bench, built on top of VeRL….
RT @NovaSkyAI: 1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks l….
OpenRLHF’s post-training stack: Ray + vLLM + ZeRO-3 (DeepSpeed). We have made sure vLLM has native support for Ray, allowing granular placement of vLLM workers on the desired physical devices. This is crucial for post-training frameworks like verl or OpenRLHF.
OpenRLHF is a pioneering framework in using vLLM for RLHF, driving the design and implementation of many of vLLM’s RLHF features and making vLLM a popular choice for many RLHF frameworks. Learn more about the story at
RT @robertnishihara: If you're curious why vLLM, which is an inference engine, is being used in the post-training tech stack, the answer is….
RT @vllm_project: OpenRLHF is a pioneering framework to use vLLM for RLHF, driving many design and implementation of vLLM's features for RL….