
LogicStar AI
@logic_star_ai
Followers: 19
Following: 17
Media: 1
Statuses: 14
Building agentic Application Maintenance
Zurich, Switzerland
Joined July 2024
We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!
🚨 New SWT-Bench Submission! 🤖 Amazon Q Developer Agent leads the SWT-Bench leaderboard 🥇 with an impressive 49% of successfully tested issues and a coverage improvement of 57% on SWT-Bench Verified.
RT @nielstron: SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE…
SWT-Bench was just presented at NeurIPS'24 in Vancouver in collaboration with SRI Lab (@the_sri_lab) from ETH Zurich. A special thanks to our great collaborators Niels Mündler (@nielstron), Mark Müller (@mnmueller), Jingxuan He (@jingxuan_he), and Martin Vechev (@mvechev). 🎉
🚀 Introducing the SWT-Bench Leaderboard! Test your AI's ability to write tests reproducing real-world GitHub issues and improve coverage where it matters. 🤖 Ready for the challenge? 👉 #AI #SoftwareTesting #SWTBench #CodeAgents
RT @the_sri_lab: SRI Lab at #NeurIPS2024 - 1/8. SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents. Niels Mündler (@nie…
RT @the_sri_lab: SRI Lab is proud to present 8 of our works on Privacy and AI Safety at #NeurIPS2024 this year (7 main conference, 1 worksh…
Exciting to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!
Super cool work by @nielstron et al: SWT-Bench is SWE-bench for test generation! They give the model a repo and an issue and it has to write a test for the issue. They show that SWE-agent is able to write good tests for 19% of the issues in the benchmark! 🧵 (1/3)