Agentica Project

@Agentica_

Followers 3K · Following 125 · Media 24 · Statuses 72

Building generalist agents that scale @BerkeleySky

San Francisco, CA
Joined January 2025
@Agentica_
Agentica Project
15 days
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE
15 replies · 65 retweets · 347 likes
@Agentica_
Agentica Project
14 days
RT @AlpayAriyak: Excited to introduce DeepSWE-Preview, our latest model trained in collaboration with @Agentica_ . Using only RL, we increa….
0 replies · 4 retweets · 0 likes
@Agentica_
Agentica Project
14 days
RT @WolframRvnwlf: Let's give a big round of applause for an amazing open-source release! They're not just sharing the model's weights; the….
0 replies · 2 retweets · 0 likes
@Agentica_
Agentica Project
14 days
It's easy to confuse Best@K with Pass@K, and we've seen some misconceptions about our results. Our 59% on SWEBench-Verified is Pass@1 with Best@16, not Pass@8/16. (Our Pass@8/16 is 67%/71%.) So how did we achieve this? DeepSWE generates N candidate solutions. Then, another LLM
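The tweet cuts off mid-sentence, but the distinction it draws is worth making concrete. Below is a minimal Python sketch: `pass_at_k` is the standard unbiased Pass@k estimator (Chen et al., 2021), while `best_at_k` mirrors the scheme described above, where a verifier picks ONE of K candidates and only that pick is scored. The `verifier_score` and `passes` callables are hypothetical stand-ins, not Agentica's actual implementation.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator (Chen et al., 2021).
    n: candidates sampled, c: candidates that pass, k: evaluation budget.
    Credit is given if ANY of k drawn candidates passes."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

def best_at_k(candidates, verifier_score, passes) -> bool:
    """Best@k: a selector (here, a verifier scoring function) picks ONE
    of the k candidates; the run counts only if that single pick passes.
    This is why a Best@16 result is reported as a Pass@1-style number."""
    chosen = max(candidates, key=verifier_score)
    return passes(chosen)

# e.g. with 16 samples of which 4 pass, pass_at_k(16, 4, 8) ≈ 0.962,
# even though Best@16 could still score 0 if the selector picks badly.
```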
@casper_hansen_
Casper Hansen
14 days
Is it malpractice to report SOTA with pass@8 without using other models at pass@8 or just standard practice at this point? It's clearly not SOTA if it's behind Devstral in a pass@1.
1 reply · 15 retweets · 52 likes
@Agentica_
Agentica Project
14 days
RT @ChenguangWang: 🚀 Introducing rLLM: a flexible framework for post-training language agents via RL. It's also the engine behind DeepSWE,….
0 replies · 4 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @sijun_tan: The first half of 2025 is all about reasoning models. The second half? It’s about agents. At Agentica, we’re thrilled to la….
0 replies · 9 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @koushik77: We believe in experience-driven learning in the SKY lab. Hybrid verification plays an important role.
0 replies · 4 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @StringChaos: 🚀 Introducing DeepSWE: Open-Source SWE Agent . We're excited to release DeepSWE, our fully open-source software engineerin….
0 replies · 12 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @michaelzluo: 🚀The era of overpriced, black-box coding assistants is OVER. Thrilled to lead the @Agentica_ team in open-sourcing and tr….
0 replies · 13 retweets · 0 likes
@Agentica_
Agentica Project
15 days
RT @togethercompute: Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-3….
0 replies · 78 retweets · 0 likes
@Agentica_
Agentica Project
15 days
@Alibaba_Qwen (8/n) Acknowledgements. DeepSWE is the result of an incredible collaboration between @agentica and @togethercompute. Agentica’s core members hail from @SkyLab and @BAIR, bringing together cutting-edge research and real-world deployment. More details on rLLM are coming soon in our
1 reply · 1 retweet · 15 likes
@Agentica_
Agentica Project
15 days
(7/n) Acknowledgements. DeepSWE stands on the shoulders of giants — it's trained from Qwen3-32B. Huge kudos to the @Alibaba_Qwen team for open-sourcing such a powerful model!
1 reply · 0 retweets · 14 likes
@Agentica_
Agentica Project
15 days
(6/n) Test-Time Scaling. We studied two approaches to boost agent performance at test time:
📈 Scaling Context Length: We expanded max context from 16K → 128K tokens. Performance improved, with ~2% gains beyond 32K, reaching 42.2% Pass@1.
🎯 Scaling Agent Rollouts: We
1 reply · 1 retweet · 10 likes
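A minimal sketch of how the two knobs in (6/n) compose at inference time. Everything here is an illustrative assumption (`run_agent`, `verifier_score`, and the config fields are hypothetical), not the released evaluation harness: raise the context budget for each rollout, sample several independent rollouts, and let a verifier choose the single patch to submit.

```python
from dataclasses import dataclass

@dataclass
class ScalingConfig:
    max_context_tokens: int = 128_000  # knob 1: longer context per rollout
    n_rollouts: int = 16               # knob 2: more independent rollouts

def test_time_scale(issue, cfg: ScalingConfig, run_agent, verifier_score):
    # Each rollout explores the repository independently under the larger
    # context budget; the verifier then performs Best@N selection.
    candidates = [
        run_agent(issue, max_context=cfg.max_context_tokens)
        for _ in range(cfg.n_rollouts)
    ]
    return max(candidates, key=verifier_score)  # the single submitted patch
```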
@Agentica_
Agentica Project
15 days
(5/n) Unlike reasoning models, RL for SWE agents creates unique challenges for scaling training environments. SWE-Agent Environments: Training with RL across parallel experiments spawns thousands of containers per RL iteration. We hit major bottlenecks with local Docker
1 reply · 0 retweets · 14 likes
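To see where the bottleneck in (5/n) comes from, here is a sketch of the naive local-Docker pattern, using the official docker-py client; the environment image and task entrypoint are hypothetical. One container per rollout means thousands of create/wait/remove cycles against a single local daemon every RL iteration, which is exactly the wall described above.

```python
import docker
from concurrent.futures import ThreadPoolExecutor

client = docker.from_env()  # one local daemon serves every rollout

def run_episode(task_id: str) -> str:
    # One sandboxed SWE environment per rollout.
    container = client.containers.run(
        "r2e-gym-env:latest",                        # hypothetical image
        command=["python", "run_task.py", task_id],  # hypothetical entrypoint
        detach=True,
    )
    container.wait()                  # block until the episode finishes
    logs = container.logs().decode()  # trajectory / test output
    container.remove()
    return logs

# Thousands of these per RL iteration saturate the single local daemon,
# motivating a move to an orchestrated or remote container backend.
with ThreadPoolExecutor(max_workers=64) as pool:
    results = list(pool.map(run_episode, [f"task-{i}" for i in range(1000)]))
```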
@Agentica_
Agentica Project
15 days
(4/n) DeepSWE-Preview is a reasoning-enabled agentic model trained on the R2E-Gym training environments. Given a GitHub issue, the agent has access to the repository as an RL environment and can use tools to explore the codebase, browse/view files, and use a bash terminal to
1 reply · 1 retweet · 14 likes
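The interaction pattern in (4/n) is the standard agentic tool loop. A hypothetical sketch follows (the `env`/`llm` interfaces and tool names are illustrative assumptions, not the released scaffold): the model repeatedly picks a tool, the environment executes it, and the observation is fed back until the agent submits a patch.

```python
def agent_loop(issue: str, env, llm, max_steps: int = 50):
    # The conversation starts from the GitHub issue; the repo is the environment.
    history = [{"role": "user", "content": issue}]
    for _ in range(max_steps):
        action = llm(history)  # e.g. {"tool": "bash", "args": {"cmd": "ls"}}
        if action["tool"] == "submit":
            return action["args"]["patch"]  # final diff for the repository
        # Tools such as view_file, search, or bash run inside the sandbox.
        observation = env.execute(action)
        history.append({"role": "assistant", "content": str(action)})
        history.append({"role": "user", "content": observation})
    return None  # step budget exhausted without a submitted patch
```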
@Agentica_
Agentica Project
15 days
(3/n) DeepSWE is trained directly with pure reinforcement learning—without distillation. In just 200 steps of training, its Pass@1 SWE-Bench-Verified score rises from 23% to 42.2%. Remarkably, RL improves DeepSWE’s generalization over time.
1 reply · 1 retweet · 14 likes
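"Pure RL" in (3/n) means the training signal is the task outcome alone, with no teacher traces to imitate. A plausible sparse-reward shape, written as an assumption since the thread does not spell out the exact reward: apply the agent's final patch and grant reward only if the task's held-out tests pass.

```python
def swe_reward(env, patch: str) -> float:
    # Sparse outcome reward (illustrative): no partial credit and no
    # distillation targets; the policy learns only from whether its
    # patch makes the repository's tests pass.
    env.apply_patch(patch)            # hypothetical environment call
    result = env.run_tests()          # hypothetical test-harness call
    return 1.0 if result.all_passed else 0.0
```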
@Agentica_
Agentica Project
15 days
Links:
🤗 Model:
📄 DeepSWE blog:
📄 rLLM blog:
💻 GitHub (rLLM):
1 reply · 3 retweets · 25 likes
@Agentica_
Agentica Project
3 months
We're trending on @huggingface models today! 🔥 Huge thanks to our amazing community for your support. 🙏
2 replies · 6 retweets · 47 likes
@Agentica_
Agentica Project
3 months
RT @Yuchenj_UW: UC Berkeley open-sourced a 14B model that rivals OpenAI o3-mini and o1 on coding! They applied RL to Deepseek-R1-Distilled….
0 replies · 398 retweets · 0 likes
@Agentica_
Agentica Project
3 months
RT @ralucaadapopa: Our team has open sourced our reasoning model that reaches o1 and o3-mini level on coding and math: DeepCoder-14B-Previ….
0 replies · 2 retweets · 0 likes