Agentica Project

@Agentica_

Followers 3K · Following 125 · Media 24 · Statuses 72

Building generalist agents that scale @BerkeleySky

San Francisco, CA
Joined January 2025
@Agentica_
Agentica Project
15 days
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE
15 replies · 65 retweets · 347 likes
@Agentica_
Agentica Project
14 days
RT @AlpayAriyak: Excited to introduce DeepSWE-Preview, our latest model trained in collaboration with @Agentica_ . Using only RL, we increa….
0 replies · 4 retweets · 0 likes
@Agentica_
Agentica Project
14 days
RT @WolframRvnwlf: Let's give a big round of applause for an amazing open-source release! They're not just sharing the model's weights; the….
0 replies · 2 retweets · 0 likes
@Agentica_
Agentica Project
14 days
It's easy to confuse Best@K with Pass@K, and we've seen some misconceptions about our results. Our 59% on SWEBench-Verified is Pass@1 with Best@16, not Pass@8/16. (Our Pass@8/16 is 67%/71%.) So how did we achieve this? DeepSWE generates N candidate solutions. Then, another LLM
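The tweet cuts off mid-sentence, but the distinction it draws is worth making concrete. Below is a minimal Python sketch: `pass_at_k` is the standard unbiased Pass@k estimator (Chen et al., 2021), while `best_at_k` mirrors the scheme described above, where a verifier picks ONE of K candidates and only that pick is scored. The `verifier_score` and `passes` callables are hypothetical stand-ins, not Agentica's actual implementation.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator (Chen et al., 2021).
    n: candidates sampled, c: candidates that pass, k: evaluation budget.
    Credit is given if ANY of k drawn candidates passes."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

def best_at_k(candidates, verifier_score, passes) -> bool:
    """Best@k: a selector (here, a verifier scoring function) picks ONE
    of the k candidates; the run counts only if that single pick passes.
    This is why a Best@16 result is reported as a Pass@1-style number."""
    chosen = max(candidates, key=verifier_score)
    return passes(chosen)

# e.g. with 16 samples of which 4 pass, pass_at_k(16, 4, 8) ≈ 0.962,
# even though Best@16 could still score 0 if the selector picks badly.
```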
@casper_hansen_
Casper Hansen
14 days
Is it malpractice to report SOTA with pass@8 without using other models at pass@8 or just standard practice at this point? It's clearly not SOTA if it's behind Devstral in a pass@1.
1 reply · 15 retweets · 52 likes
@Agentica_
Agentica Project
14 days
RT @ChenguangWang: 🚀 Introducing rLLM: a flexible framework for post-training language agents via RL. It's also the engine behind DeepSWE,….
0 replies · 4 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @sijun_tan: The first half of 2025 is all about reasoning models. The second half? It’s about agents. At Agentica, we’re thrilled to la….
0 replies · 9 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @koushik77: We believe in experience-driven learning in the SKY lab. Hybrid verification plays an important role.
0 replies · 4 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @StringChaos: 🚀 Introducing DeepSWE: Open-Source SWE Agent . We're excited to release DeepSWE, our fully open-source software engineerin….
0 replies · 12 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @michaelzluo: 🚀The era of overpriced, black-box coding assistants is OVER. Thrilled to lead the @Agentica_ team in open-sourcing and tr….
0 replies · 13 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @togethercompute: Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-3….
0 replies · 78 retweets · 0 likes
@Agentica_
Agentica Project
15 days
@Alibaba_Qwen (8/n) Acknowledgements. DeepSWE is the result of an incredible collaboration between @agentica and @togethercompute. Agentica’s core members hail from @SkyLab and @BAIR, bringing together cutting-edge research and real-world deployment. More details on rLLM are coming soon in our
1 reply · 1 retweet · 15 likes
@Agentica_
Agentica Project
15 days
(7/n) Acknowledgements. DeepSWE stands on the shoulders of giants — it's trained from Qwen3-32B. Huge kudos to the @Alibaba_Qwen team for open-sourcing such a powerful model!
1 reply · 0 retweets · 14 likes
@Agentica_
Agentica Project
15 days
(6/n) Test-Time Scaling. We studied two approaches to boost agent performance at test time:
📈 Scaling Context Length: We expanded max context from 16K → 128K tokens. Performance improved, with ~2% gains beyond 32K, reaching 42.2% Pass@1.
🎯 Scaling Agent Rollouts: We
1 reply · 1 retweet · 10 likes
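A minimal sketch of how the two knobs in (6/n) compose at inference time. Everything here is an illustrative assumption (`run_agent`, `verifier_score`, and the config fields are hypothetical), not the released evaluation harness: raise the context budget for each rollout, sample several independent rollouts, and let a verifier choose the single patch to submit.

```python
from dataclasses import dataclass

@dataclass
class ScalingConfig:
    max_context_tokens: int = 128_000  # knob 1: longer context per rollout
    n_rollouts: int = 16               # knob 2: more independent rollouts

def test_time_scale(issue, cfg: ScalingConfig, run_agent, verifier_score):
    # Each rollout explores the repository independently under the larger
    # context budget; the verifier then performs Best@N selection.
    candidates = [
        run_agent(issue, max_context=cfg.max_context_tokens)
        for _ in range(cfg.n_rollouts)
    ]
    return max(candidates, key=verifier_score)  # the single submitted patch
```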
@Agentica_
Agentica Project
15 days
(5/n) Unlike reasoning models, RL for SWE agents creates unique challenges for scaling training environments. SWE-Agent Environments: Training with RL across parallel experiments spawns thousands of containers per RL iteration. We hit major bottlenecks with local Docker
1 reply · 0 retweets · 14 likes
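To see where the bottleneck in (5/n) comes from, here is a sketch of the naive local-Docker pattern, using the official docker-py client; the environment image and task entrypoint are hypothetical. One container per rollout means thousands of create/wait/remove cycles against a single local daemon every RL iteration, which is exactly the wall described above.

```python
import docker
from concurrent.futures import ThreadPoolExecutor

client = docker.from_env()  # one local daemon serves every rollout

def run_episode(task_id: str) -> str:
    # One sandboxed SWE environment per rollout.
    container = client.containers.run(
        "r2e-gym-env:latest",                        # hypothetical image
        command=["python", "run_task.py", task_id],  # hypothetical entrypoint
        detach=True,
    )
    container.wait()                  # block until the episode finishes
    logs = container.logs().decode()  # trajectory / test output
    container.remove()
    return logs

# Thousands of these per RL iteration saturate the single local daemon,
# motivating a move to an orchestrated or remote container backend.
with ThreadPoolExecutor(max_workers=64) as pool:
    results = list(pool.map(run_episode, [f"task-{i}" for i in range(1000)]))
```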
@Agentica_
Agentica Project
15 days
(4/n) DeepSWE-Preview is a reasoning-enabled agentic model trained on the R2E-Gym training environments. Given a GitHub issue, the agent has access to the repository as an RL environment and can use tools to explore the codebase, browse/view files, and use a bash terminal to
1 reply · 1 retweet · 14 likes
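The interaction pattern in (4/n) is the standard agentic tool loop. A hypothetical sketch follows (the `env`/`llm` interfaces and tool names are illustrative assumptions, not the released scaffold): the model repeatedly picks a tool, the environment executes it, and the observation is fed back until the agent submits a patch.

```python
def agent_loop(issue: str, env, llm, max_steps: int = 50):
    # The conversation starts from the GitHub issue; the repo is the environment.
    history = [{"role": "user", "content": issue}]
    for _ in range(max_steps):
        action = llm(history)  # e.g. {"tool": "bash", "args": {"cmd": "ls"}}
        if action["tool"] == "submit":
            return action["args"]["patch"]  # final diff for the repository
        # Tools such as view_file, search, or bash run inside the sandbox.
        observation = env.execute(action)
        history.append({"role": "assistant", "content": str(action)})
        history.append({"role": "user", "content": observation})
    return None  # step budget exhausted without a submitted patch
```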
@Agentica_
Agentica Project
15 days
(3/n) DeepSWE is trained directly with pure reinforcement learning—without distillation. In just 200 steps of training, its Pass@1 SWE-Bench-Verified score rises from 23% to 42.2%. Remarkably, RL improves DeepSWE’s generalization over time.
1 reply · 1 retweet · 14 likes
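"Pure RL" in (3/n) means the training signal is the task outcome alone, with no teacher traces to imitate. A plausible sparse-reward shape, written as an assumption since the thread does not spell out the exact reward: apply the agent's final patch and grant reward only if the task's held-out tests pass.

```python
def swe_reward(env, patch: str) -> float:
    # Sparse outcome reward (illustrative): no partial credit and no
    # distillation targets; the policy learns only from whether its
    # patch makes the repository's tests pass.
    env.apply_patch(patch)            # hypothetical environment call
    result = env.run_tests()          # hypothetical test-harness call
    return 1.0 if result.all_passed else 0.0
```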
@Agentica_
Agentica Project
15 days
Links:
🤗 Model:
📄 DeepSWE blog:
📄 rLLM blog:
💻 GitHub (rLLM):
1 reply · 3 retweets · 25 likes
@Agentica_
Agentica Project
3 months
We're trending on @huggingface models today! 🔥 Huge thanks to our amazing community for your support. 🙏
2 replies · 6 retweets · 47 likes
@Agentica_
Agentica Project
3 months
RT @Yuchenj_UW: UC Berkeley open-sourced a 14B model that rivals OpenAI o3-mini and o1 on coding! They applied RL to Deepseek-R1-Distilled….
0 replies · 398 retweets · 0 likes
@Agentica_
Agentica Project
3 months
RT @ralucaadapopa: Our team has open sourced our reasoning model that reaches o1 and o3-mini level on coding and math: DeepCoder-14B-Previ….
0 replies · 2 retweets · 0 likes