NovaSky

@NovaSkyAI

Followers: 2K
Following: 88
Media: 25
Statuses: 72

Next-generation Open Vision and AI @BerkeleySky Contact: [email protected]

Berkeley, California
Joined January 2025
@NovaSkyAI
NovaSky
11 days
✨Release: We upgraded SkyRL into a highly modular, performant RL framework for training LLMs. We prioritized modularity: easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵👇
Blog:
Code:
2
43
201
@NovaSkyAI
NovaSky
11 days
(9/9) SkyRL-v0.1 is from UC Berkeley Sky Computing Lab in collaboration with @anyscalecompute, and a huge team effort: @tyler_griggs_ @sumanthrh @erictang000 @LynnLiu41887950 @shiyi_c98 @DachengLi177 @charlie_ruan @shishirpatil_ @pcmoritz @CyrusHakha @richliaw Akshay Malik.
0
0
10
@NovaSkyAI
NovaSky
11 days
(8/N) Join us! SkyRL-v0.1 is an early effort and we expect to iterate on the APIs and architecture with help and feedback from the community. Please leave your comments, and don’t hesitate to reach out:
Code:
Email: novasky@berkeley.edu
Discord:
1
1
10
@NovaSkyAI
NovaSky
11 days
(7/N) A core feature of SkyRL-Gym is reusable tools – define a tool once and use it across multiple environments. This feature stems from SkyRL’s core priority of modularity, making it easy to build new environments by reusing and composing existing tools.
1
0
9
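The reusable-tool idea in (7/N) can be pictured with a small plain-Python sketch. Every name here (`Tool`, `ToolEnv`, `WORD_COUNT`, the `tool:argument` action format) is hypothetical and chosen for illustration; none of it is the SkyRL-Gym API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


# Hypothetical names throughout; this is NOT the SkyRL-Gym API.
@dataclass
class Tool:
    """A named function that maps a string argument to a string result."""
    name: str
    fn: Callable[[str], str]

    def __call__(self, arg: str) -> str:
        return self.fn(arg)


def word_count(text: str) -> str:
    return str(len(text.split()))


# The tool is defined once...
WORD_COUNT = Tool("word_count", word_count)


@dataclass
class ToolEnv:
    """Minimal environment that dispatches actions of the form 'tool:argument'."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def step(self, action: str) -> str:
        name, _, arg = action.partition(":")
        if name not in self.tools:
            return f"unknown tool: {name}"
        return self.tools[name](arg)


# ...and reused by two separate environments.
summarize_env = ToolEnv()
summarize_env.register(WORD_COUNT)

search_env = ToolEnv()
search_env.register(WORD_COUNT)
search_env.register(Tool("upper", lambda s: s.upper()))
```

Both environments share the same `WORD_COUNT` object, so a fix to the tool applies everywhere it is registered.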
@NovaSkyAI
NovaSky
11 days
(6/N) SkyRL-v0.1 introduces SkyRL-Gym – a lightweight gymnasium of tool-use environments with a simple interface and a library of built-in environments for math, coding, search, and text-to-SQL.
1
1
11
@NovaSkyAI
NovaSky
11 days
(5/N) SkyRL supports many key backends and features: PPO and GRPO, FSDP2 and DeepSpeed, vLLM and SGLang, asynchronous rollouts, sequence parallelism and packing, synchronous RL or async one-off pipelining, colocated and disaggregated training and generation.
1
1
9
@NovaSkyAI
NovaSky
11 days
(4/N) As evidence for SkyRL’s extensibility, we provide several examples modifying the stack, like:
Implement a new environment in <50 LoC
Update the sync training loop into async one-off pipelining in <100 new LoC
Disaggregate training and generation on heterogeneous HW with
1
0
9
@NovaSkyAI
NovaSky
11 days
(3/N) SkyRL splits the RL stack into core components with clear public APIs, making it a great fit for users who want to easily plug in custom logic at any layer—custom algorithms, environments, trajectory generation, reward calculation, training execution plans, and more.
1
0
9
@NovaSkyAI
NovaSky
11 days
(2/N) RL is complex, and the community is rapidly exploring training methods at each layer of the stack. Existing RL frameworks tightly couple RL components, making it difficult to flexibly implement custom logic and hampering easy exploration. SkyRL strives to fill this gap.
1
0
9
@NovaSkyAI
NovaSky
2 months
RT @LynnLiu41887950: Excited to share SkyRL-SQL, a simple yet effective multi-turn RL pipeline for training LLMs to generate and refine SQL….
0
14
0
@NovaSkyAI
NovaSky
2 months
9/N SkyRL-SQL is a team effort: @LynnLiu41887950 @sumanthrh @shiyi_c98 @aczhu1326 @DachengLi177 @tyler_griggs_ @erictang000 Akshay Malik @CyrusHakha @richliaw @pcmoritz @matei_zaharia @profjoeyg @istoica05. We also thank the generous compute support from @anyscalecompute,.
0
0
14
@NovaSkyAI
NovaSky
2 months
8/N SkyRL-SQL shows how simple multi-turn RL + small data can unlock LLM capabilities in data analytics tasks. We'd love the community to try it out and come build with us! Next up: support more databases, curriculum learning to tackle harder questions, and more. Stay tuned 👀
1
0
10
@NovaSkyAI
NovaSky
2 months
7/N ⚠️ But multi-turn isn't magic. The model can still get overconfident, repeat failed queries, or skip exploration. In the blog post, we share failure cases and discuss how to improve multi-turn RL for these tasks.
1
0
10
@NovaSkyAI
NovaSky
2 months
6/N 💡 We observe that the model learns to:
📖 Break problems into sub-steps
🔍 Verify intermediate results
🛠 Fix syntax + logic errors in SQL
♻️ Iterate on failures
Real examples show the model debugging its way to correct SQL.
1
0
13
@NovaSkyAI
NovaSky
2 months
5/N 💡 Multi-Turn vs. Single-Turn RL
⚡ 2.8× fewer training steps to the same reward
🎯 +16% higher reward after 35 steps
Even in 1-turn eval w/o feedback, it performs better: showing multi-turn improves not just the ability to leverage feedback but also overall reasoning
1
0
11
@NovaSkyAI
NovaSky
2 months
4/N We train SkyRL-SQL-7B on top of Qwen2.5-Coder-7B-Instruct with:
• ✅ 653 samples
• ✅ Simple rewards (format + execution)
• 🚫 No partial rewards or million-scale data
Trained for 5 turns and 14 epochs. Result? 📈
7.2% gain over base
1.6% over GPT-4o
1.8% over o4-mini
1
0
11
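The "simple rewards (format + execution)" in 4/N might look roughly like the sketch below. This is an illustration under stated assumptions, not the SkyRL-SQL reward: the `<solution>` tag, the reward values (-1.0, 0.0, 1.0), and the helper names are all made up here, and SQLite stands in for whatever database the pipeline uses.

```python
import re
import sqlite3
from typing import List, Optional

# Assumption: the final answer is wrapped in <solution> tags (illustrative only).
SOLUTION_RE = re.compile(r"<solution>(.*?)</solution>", re.DOTALL)


def run_query(conn: sqlite3.Connection, sql: str) -> Optional[List[tuple]]:
    """Execute sql and return rows in a stable order, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)


def reward(conn: sqlite3.Connection, response: str, gold_sql: str) -> float:
    """Format check first, then compare execution results. No partial credit."""
    match = SOLUTION_RE.search(response)
    if match is None:
        return -1.0  # assumed penalty for a malformed response
    predicted = run_query(conn, match.group(1).strip())
    expected = run_query(conn, gold_sql)
    if predicted is not None and predicted == expected:
        return 1.0
    return 0.0  # well-formed response, wrong or failing query


# Tiny in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

GOLD = "SELECT name FROM users WHERE id = 1"
good = reward(conn, "<solution>SELECT name FROM users WHERE id=1</solution>", GOLD)
bad_format = reward(conn, "SELECT name FROM users WHERE id=1", GOLD)
bad_sql = reward(conn, "<solution>SELECT nme FROM users</solution>", GOLD)
```

Comparing result sets rather than SQL strings lets differently written but equivalent queries earn the same reward.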
@NovaSkyAI
NovaSky
2 months
3/N We built a multi-turn RL pipeline on top of VeRL and SearchR1 for Text-to-SQL — letting LLMs think, query the database, observe results, refine, and output final solutions.
1
0
12
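The think, query, observe, refine loop described in 3/N can be sketched as a short rollout function. This is a toy version under assumptions: the `<sql>` and `<solution>` tags, the function names, and the scripted stand-in for the LLM are hypothetical, and it does not use VeRL or SearchR1. The default of 5 turns mirrors the turn count mentioned elsewhere in the thread.

```python
import re
import sqlite3
from typing import Callable, List, Optional

# Assumed tag conventions, for illustration only.
SQL_RE = re.compile(r"<sql>(.*?)</sql>", re.DOTALL)
SOLUTION_RE = re.compile(r"<solution>(.*?)</solution>", re.DOTALL)


def observe(conn: sqlite3.Connection, sql: str) -> str:
    """Run a probe query and return rows or the error text as an observation."""
    try:
        return repr(conn.execute(sql).fetchall())
    except sqlite3.Error as exc:
        return f"error: {exc}"


def rollout(policy: Callable[[List[str]], str], conn: sqlite3.Connection,
            question: str, max_turns: int = 5) -> Optional[str]:
    """Alternate model replies and database observations until a final answer."""
    history = [question]
    for _ in range(max_turns):
        reply = policy(history)
        history.append(reply)
        final = SOLUTION_RE.search(reply)
        if final:
            return final.group(1).strip()
        probe = SQL_RE.search(reply)
        if probe:
            history.append(observe(conn, probe.group(1).strip()))
    return None  # ran out of turns without a final answer


def scripted_policy(history: List[str]) -> str:
    """Stand-in for an LLM: a buggy probe, a fix after the error, then an answer."""
    if len(history) == 1:
        return "<sql>SELECT nme FROM users</sql>"
    if history[-1].startswith("error"):
        return "<sql>SELECT name FROM users</sql>"
    return "<solution>SELECT name FROM users ORDER BY name</solution>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

answer = rollout(scripted_policy, conn, "List all user names.")
```

In a real pipeline the policy would be the model being trained, and the returned solution would be scored by a reward function.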
@NovaSkyAI
NovaSky
2 months
2/N Generating SQL in one shot often fails when questions or schemas get complex. Inspired by exploratory data analysis, we train LLM agents to iteratively refine SQL through trial and error — not rely on one-shot guesses.
1
0
12
@NovaSkyAI
NovaSky
2 months
1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model
5
32
146
@NovaSkyAI
NovaSky
2 months
RT @eric_haibin_lin: SkyRL is a great work extending @verl_project with environments for agent tasks. It leverages the sglang multi-turn/to….
0
28
0