LogicStar AI Profile
LogicStar AI

@logic_star_ai

Followers
19
Following
17
Media
1
Statuses
14

Building agentic Application Maintenance

Zurich, Switzerland
Joined July 2024
@logic_star_ai
LogicStar AI
3 months
We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!
@nielstron
Niels Mündler (@ ICML)
3 months
🚨 New SWT-Bench Submission! 🤖 Amazon Q Developer Agent leads the SWT-Bench leaderboard 🥇 with an impressive 49% of successfully tested issues and a coverage improvement of 57% on SWT-Bench Verified.
0
1
4
@logic_star_ai
LogicStar AI
5 months
RT @nielstron: SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE….
0
2
0
@logic_star_ai
LogicStar AI
5 months
To learn more about AEGIS, check their paper. Note that the paper reports a 36.0% success rate using GPT-4o, whereas the 47.8% reported here is with Claude Sonnet 3.5. 3/3 🧵
0
0
4
@logic_star_ai
LogicStar AI
5 months
AEGIS creates a separate reproduction script (not integrated into the test framework), leverages execution feedback more effectively, and invests more effort into gathering relevant context before beginning test generation. See all results at 2/3 🧵.
1
0
4
@logic_star_ai
LogicStar AI
5 months
We have our first submission for SWT-Bench 🚀. AEGIS, a dedicated test generation agent, achieves 47.8% accuracy 🏆, significantly outperforming our SWE-Agent+ baseline and demonstrating the potential of dedicated test generation agents. 1/3 🧵
2
4
7
@logic_star_ai
LogicStar AI
7 months
If you want to learn more about our vision and how we plan to automate software maintenance, check out 🤖.
0
0
1
@logic_star_ai
LogicStar AI
7 months
These reproducing tests can then be used to help both human developers and other code agents develop and test bug fixes. In our early experiments, we could boost the precision of simple code agents by 2x using this technique. 🚀
1
0
1
@logic_star_ai
LogicStar AI
7 months
Given a codebase and a GitHub issue description, the task in SWT-Bench is to generate a test that reproduces the described issue. We check this by making sure the test fails before the ground-truth fix is applied to the repo but passes after. We additionally measure coverage. 📈
1
0
1
@logic_star_ai
LogicStar AI
7 months
SWT-Bench was just presented at NeurIPS'24 in Vancouver in collaboration with SRI Lab (@the_sri_lab) from ETH Zurich. A special thanks to our great collaborators Niels Mündler (@nielstron), Mark Müller (@mnmueller), Jingxuan He (@jingxuan_he), and Martin Vechev (@mvechev). 🎉.
1
0
3
@logic_star_ai
LogicStar AI
7 months
🚀 Introducing the SWT-Bench Leaderboard! Test your AI's ability to write tests reproducing real-world GitHub issues and improve coverage where it matters. 🤖 Ready for the challenge? 👉 #AI #SoftwareTesting #SWTBench #CodeAgents
1
3
6
@logic_star_ai
LogicStar AI
7 months
RT @the_sri_lab: SRI Lab at #NeurIPS2024 - 1/8. SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents.Niels Mündler (@nie….
0
2
0
@logic_star_ai
LogicStar AI
7 months
RT @the_sri_lab: SRI Lab is proud to present 8 of our works on Privacy and AI Safety at #NeurIPS2024 this year (7 main conference, 1 worksh….
0
5
0
@logic_star_ai
LogicStar AI
8 months
Exciting to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!
@OfirPress
Ofir Press
8 months
Super cool work by @nielstron et al: SWT-Bench is SWE-bench for test generation! They give the model a repo and an issue and it has to write a test for the issue. They show that SWE-agent is able to write good tests for 19% of the issues in the benchmark! 🧵(1/3)
0
1
2