NovaSky @NovaSkyAI X Profile

NovaSky

@NovaSkyAI

Followers: 2K · Following: 88 · Media: 25 · Statuses: 72

Next-generation Open Vision and AI @BerkeleySky Contact: [email protected]

Berkeley, California

Joined January 2025

NovaSky

@NovaSkyAI

11 days

✨Release: We upgraded SkyRL into a highly modular, performant RL framework for training LLMs. We prioritized modularity: you can easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵👇 Blog: Code:

NovaSky

@NovaSkyAI

11 days

(9/9) SkyRL-v0.1 is from UC Berkeley Sky Computing Lab in collaboration with @anyscalecompute, and a huge team effort: @tyler_griggs_ @sumanthrh @erictang000 @LynnLiu41887950 @shiyi_c98 @DachengLi177 @charlie_ruan @shishirpatil_ @pcmoritz @CyrusHakha @richliaw Akshay Malik.

NovaSky

@NovaSkyAI

11 days

(8/N) Join us! SkyRL-v0.1 is an early effort, and we expect to iterate on the APIs and architecture with help and feedback from the community. Please leave your comments, and don’t hesitate to reach out.
Code:
Email: novasky@berkeley.edu
Discord:

NovaSky

@NovaSkyAI

11 days

(7/N) A core feature of SkyRL-Gym is reusable tools – define a tool once and use it across multiple environments. This feature stems from SkyRL’s core priority of modularity, making it easy to build new environments by reusing and composing existing tools.
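
To illustrate the "define once, reuse everywhere" idea, here is a minimal plain-Python sketch. It assumes a tool is just a named function from a string argument to a string result. The `Tool` class, the toy `calculator`, and both environment classes are hypothetical and are not SkyRL-Gym's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool: a name plus a function from a string argument to a string result."""
    name: str
    fn: Callable[[str], str]

    def __call__(self, arg: str) -> str:
        return self.fn(arg)


def _add_integers(expr: str) -> str:
    # Toy calculator: only sums of integers such as "2+3+4" are supported.
    return str(sum(int(part) for part in expr.split("+")))


# The tool is defined exactly once...
calculator = Tool(name="calculator", fn=_add_integers)


class ToolEnv:
    """Hypothetical base environment that dispatches tool calls by name."""

    def __init__(self, tools: Dict[str, Tool]):
        self.tools = tools

    def call_tool(self, name: str, arg: str) -> str:
        if name not in self.tools:
            return f"error: unknown tool {name!r}"
        return self.tools[name](arg)


# ...and composed into two different environments.
class MathEnv(ToolEnv):
    def __init__(self):
        super().__init__({"calculator": calculator})


class DataAnalysisEnv(ToolEnv):
    def __init__(self, extra: Dict[str, Tool]):
        # Reuse the shared calculator and add environment-specific tools on top.
        super().__init__({"calculator": calculator, **extra})
```

Both environments hold the same `calculator` object. A new environment only has to choose which existing tools to compose and which new ones to add.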

NovaSky

@NovaSkyAI

11 days

(6/N) SkyRL-v0.1 introduces SkyRL-Gym – a lightweight gymnasium of tool-use environments with a simple interface and a library of built-in environments for math, coding, search, and text-to-SQL.
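
The sketch below makes the "simple interface" concrete with a text environment for a toy arithmetic task. It assumes a `reset`/`step` shape borrowed from Gymnasium, where `step` takes the model's text action and returns an observation, a reward, and a done flag. The class name, the `<answer>` tag convention, and the turn limit are hypothetical and do not reflect SkyRL-Gym's published API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepOutput:
    observation: Optional[str]  # text fed back to the model, or None when the episode ends
    reward: float
    done: bool


class ToyMathEnv:
    """Hypothetical single-question environment with a bounded number of turns."""

    ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

    def __init__(self, question: str, ground_truth: str, max_turns: int = 3):
        self.question = question
        self.ground_truth = ground_truth
        self.max_turns = max_turns
        self.turn = 0

    def reset(self) -> str:
        self.turn = 0
        return self.question  # initial prompt for the model

    def step(self, action: str) -> StepOutput:
        self.turn += 1
        match = self.ANSWER_RE.search(action)
        if match:
            correct = match.group(1).strip() == self.ground_truth
            return StepOutput(None, 1.0 if correct else 0.0, True)
        if self.turn >= self.max_turns:
            return StepOutput(None, 0.0, True)  # out of turns without an answer
        # No answer yet: return feedback and let the model try again.
        return StepOutput("Please wrap your final answer in <answer></answer>.", 0.0, False)
```

The environment owns only task logic: the prompt, feedback, reward, and termination. How the text is generated is left to the rest of the stack.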

NovaSky

@NovaSkyAI

11 days

(5/N) SkyRL supports many key backends and features: PPO and GRPO, FSDP2 and DeepSpeed, vLLM and SGLang, asynchronous rollouts, sequence parallelism and packing, synchronous RL or async one-off pipelining, colocated and disaggregated training and generation.
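
GRPO, one of the listed algorithms, replaces PPO's learned value baseline with a group-relative one. It samples several responses per prompt and normalizes each reward by the group's mean and standard deviation. Below is a minimal sketch of that advantage computation. It is a generic illustration of the technique, not SkyRL's implementation. The `eps` value and the use of the population standard deviation are assumptions, and implementations differ on both.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(group_rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for G responses sampled from the same prompt.

    Each response's advantage is its reward minus the group mean, divided by the
    group's (population) standard deviation; eps avoids division by zero when all
    rewards in the group are equal.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

For example, rewards `[1.0, 0.0, 0.0, 1.0]` give advantages of roughly `[1, -1, -1, 1]`. A group where every response earns the same reward contributes no learning signal.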

NovaSky

@NovaSkyAI

11 days

(4/N) As evidence of SkyRL’s extensibility, we provide several examples that modify the stack, such as:
• Implementing a new environment in <50 LoC
• Converting the sync training loop into async one-off pipelining in <100 new LoC
• Disaggregating training and generation on heterogeneous HW with
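
Assuming "one-off" refers to one-step off-policy pipelining, the idea is to overlap generation and training. While the trainer updates on batch k, the generator already produces batch k+1 using weights that are one step stale. The sketch below shows only that scheduling pattern with a single-worker thread pool. `generate` and `train_step` are hypothetical stand-ins, not SkyRL functions, and a real system would also sync updated weights to the inference engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def generate(policy_version: int, step: int) -> Tuple[int, int]:
    """Hypothetical rollout stand-in: returns (step, policy version used to generate it)."""
    return step, policy_version


def train_step(batch: Tuple[int, int], policy_version: int) -> int:
    """Hypothetical trainer stand-in: consumes a batch and returns the new policy version."""
    return policy_version + 1


def one_off_pipeline(num_steps: int) -> List[Tuple[int, int]]:
    """Overlap generation of batch k+1 with training on batch k.

    Returns, for each training step, (step, version of the weights that generated
    its batch), which makes the one-step staleness visible.
    """
    version = 0
    log = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(generate, version, 0)  # prime the pipeline
        for step in range(num_steps):
            batch = pending.result()
            if step + 1 < num_steps:
                # Launch the next rollout before training, using the current (soon stale) weights.
                pending = pool.submit(generate, version, step + 1)
            log.append(batch)
            version = train_step(batch, version)
    return log
```

A synchronous loop would call `generate` and `train_step` back to back. That way every batch comes from the latest weights, but the two phases never overlap. The pipelined version trades one step of policy staleness for that overlap.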

NovaSky

@NovaSkyAI

11 days

(3/N) SkyRL splits the RL stack into core components with clear public APIs, making it a great fit for users who want to easily plug in custom logic at any layer—custom algorithms, environments, trajectory generation, reward calculation, training execution plans, and more.
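
One common way to express "core components with clear public APIs" in Python is with `typing.Protocol` interfaces, so any object with the right methods can be swapped in. The three interfaces, the `run_step` driver, and the toy implementations below are hypothetical. They only illustrate the pattern; SkyRL's actual class names and signatures may differ.

```python
from typing import List, Protocol


class Generator(Protocol):
    def generate(self, prompts: List[str]) -> List[str]: ...


class RewardFn(Protocol):
    def __call__(self, prompt: str, response: str) -> float: ...


class Trainer(Protocol):
    def update(self, prompts: List[str], responses: List[str], rewards: List[float]) -> dict: ...


def run_step(gen: Generator, reward_fn: RewardFn, trainer: Trainer, prompts: List[str]) -> dict:
    """One RL iteration written only against the interfaces, so each layer is replaceable."""
    responses = gen.generate(prompts)
    rewards = [reward_fn(p, r) for p, r in zip(prompts, responses)]
    return trainer.update(prompts, responses, rewards)


# Custom logic plugged in at each layer (toy implementations).
class EchoGenerator:
    def generate(self, prompts: List[str]) -> List[str]:
        return [p.upper() for p in prompts]


def length_reward(prompt: str, response: str) -> float:
    return float(len(response))


class MeanRewardTrainer:
    def update(self, prompts, responses, rewards):
        # A real trainer would compute a loss and step an optimizer here.
        return {"mean_reward": sum(rewards) / len(rewards)}
```

For example, `run_step(EchoGenerator(), length_reward, MeanRewardTrainer(), ["ab", "abcd"])` returns `{"mean_reward": 3.0}`. Swapping in a different reward function requires no change to the generator or the trainer.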

NovaSky

@NovaSkyAI

11 days

(2/N) RL is complex, and the community is rapidly exploring training methods at each layer of the stack. Existing RL frameworks tightly couple RL components, making it difficult to flexibly implement custom logic and hampering easy exploration. SkyRL strives to fill this gap.

NovaSky

@NovaSkyAI

2 months

RT @LynnLiu41887950: Excited to share SkyRL-SQL, a simple yet effective multi-turn RL pipeline for training LLMs to generate and refine SQL….

NovaSky

@NovaSkyAI

2 months

9/N SkyRL-SQL is a team effort: @LynnLiu41887950 @sumanthrh @shiyi_c98 @aczhu1326 @DachengLi177 @tyler_griggs_ @erictang000 Akshay Malik @CyrusHakha @richliaw @pcmoritz @matei_zaharia @profjoeyg @istoica05. We also thank @anyscalecompute for the generous compute support.

NovaSky

@NovaSkyAI

2 months

8/N SkyRL-SQL shows how simple multi-turn RL + small data can unlock LLM capabilities in data analytics tasks. We'd love for the community to try it out and come build with us! Next up: supporting more databases, curriculum learning to tackle harder questions, and more. Stay tuned 👀

NovaSky

@NovaSkyAI

2 months

7/N ⚠️ But multi-turn isn't magic. The model can still get overconfident, repeat failed queries, or skip exploration. In the blog post, we share failure cases and discuss how to improve multi-turn RL for these tasks.

NovaSky

@NovaSkyAI

2 months

6/N 💡 We observe that the model learns to:
📖 Break problems into sub-steps
🔍 Verify intermediate results
🛠 Fix syntax + logic errors in SQL
♻️ Iterate on failures
Real examples show the model debugging its way to correct SQL.

NovaSky

@NovaSkyAI

2 months

5/N 💡 Multi-Turn vs. Single-Turn RL
⚡ 2.8× fewer training steps to reach the same reward
🎯 16% higher reward after 35 steps
Even in 1-turn evaluation without feedback, the multi-turn model performs better, showing that multi-turn training improves not just the ability to leverage feedback but also overall reasoning.

NovaSky

@NovaSkyAI

2 months

4/N We train SkyRL-SQL-7B on top of Qwen2.5-Coder-7B-Instruct with:
• ✅ 653 samples
• ✅ Simple rewards (format + execution)
• 🚫 No partial rewards or million-scale data
Trained for 5 turns and 14 epochs. Result? 📈
7.2% gain over base
1.6% over GPT-4o
1.8% over o4-mini
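
The "simple rewards (format + execution)" idea can be sketched as a function that first checks the response format and then executes the predicted SQL against the database, comparing its result rows with those of the gold query. The `<solution>` tag, the reward values (-1 / 0 / 1), and the order-insensitive row comparison are assumptions for illustration, not the exact SkyRL-SQL reward.

```python
import re
import sqlite3
from collections import Counter

SOLUTION_RE = re.compile(r"<solution>(.*?)</solution>", re.DOTALL)


def sql_reward(response: str, gold_sql: str, conn: sqlite3.Connection) -> float:
    """Hypothetical format + execution reward.

    -1.0: no <solution>...</solution> block (format failure)
     0.0: well-formed, but the SQL errors or returns different rows than the gold query
     1.0: well-formed and the result rows match the gold query's (compared as a multiset)
    """
    match = SOLUTION_RE.search(response)
    if match is None:
        return -1.0
    try:
        # In practice, execute model-written SQL against a read-only copy of the database.
        predicted = conn.execute(match.group(1)).fetchall()
    except sqlite3.Error:
        return 0.0
    gold = conn.execute(gold_sql).fetchall()
    return 1.0 if Counter(predicted) == Counter(gold) else 0.0
```

Note that no partial credit is given: a query that is "almost right" earns the same 0.0 as one that fails to execute.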

NovaSky

@NovaSkyAI

2 months

3/N We built a multi-turn RL pipeline on top of VeRL and SearchR1 for Text-to-SQL — letting LLMs think, query the database, observe results, refine, and output final solutions.
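
The think → query → observe → refine → answer cycle can be sketched as a rollout loop. Each intermediate SQL query is executed, and the rows (or the error message) are fed back as an observation. The tag names (`<sql>`, `<observation>`, `<solution>`), the scripted stand-in policy, and the turn budget are assumptions. This is not the VeRL or SearchR1 interface that SkyRL-SQL actually builds on.

```python
import re
import sqlite3
from typing import Callable, List, Optional, Tuple

SQL_RE = re.compile(r"<sql>(.*?)</sql>", re.DOTALL)
SOLUTION_RE = re.compile(r"<solution>(.*?)</solution>", re.DOTALL)


def run_episode(policy: Callable[[str], str], question: str,
                conn: sqlite3.Connection, max_turns: int = 5) -> Tuple[Optional[str], str]:
    """Roll out one multi-turn episode.

    Returns (final SQL or None if the turn budget runs out, full transcript).
    """
    transcript = question
    for _ in range(max_turns):
        turn = policy(transcript)  # model emits either an exploratory query or a final answer
        transcript += "\n" + turn
        final = SOLUTION_RE.search(turn)
        if final:
            return final.group(1).strip(), transcript
        probe = SQL_RE.search(turn)
        if probe:
            try:
                rows = conn.execute(probe.group(1)).fetchall()
                observation = repr(rows[:5])  # truncate long results
            except sqlite3.Error as exc:
                observation = f"error: {exc}"  # errors are feedback, not episode failures
            transcript += f"\n<observation>{observation}</observation>"
    return None, transcript


def scripted_policy(turns: List[str]) -> Callable[[str], str]:
    """Stand-in for an LLM: replays fixed turns and ignores the transcript."""
    it = iter(turns)
    return lambda transcript: next(it)
```

In training, the transcript (with observations masked out of the loss, as is common in multi-turn RL) and the final answer's reward would be handed to the trainer.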

NovaSky

@NovaSkyAI

2 months

2/N Generating SQL in one shot often fails when questions or schemas get complex. Inspired by exploratory data analysis, we train LLM agents to iteratively refine SQL through trial and error rather than relying on one-shot guesses.

NovaSky

@NovaSkyAI

2 months

1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model

NovaSky

@NovaSkyAI

2 months

RT @eric_haibin_lin: SkyRL is a great work extending @verl_project with environments for agent tasks. It leverages the sglang multi-turn/to….
