LogicStar AI @logic_star_ai X Profile

LogicStar AI

@logic_star_ai

Followers

19

Following

17

Media

1

Statuses

14

Building agentic Application Maintenance

Zurich, Switzerland

Joined July 2024

LogicStar AI

@logic_star_ai

3 months

We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!

Niels Mündler (@ ICML)

@nielstron

3 months

🚨 New SWT-Bench Submission! 🤖 Amazon Q Developer Agent leads the SWT-Bench leaderboard 🥇 with an impressive 49% of successfully tested issues and a coverage improvement of 57% on SWT-Bench Verified.

0

1

4

LogicStar AI

@logic_star_ai

5 months

RT @nielstron: SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE…

0

2

0

LogicStar AI

@logic_star_ai

5 months

To learn more about AEGIS, check out their paper. (Note that the paper reports a 36.0% success rate using GPT-4o, whereas the 47.8% reported here was achieved with Claude Sonnet 3.5.) 3/3 🧵

0

4

LogicStar AI

@logic_star_ai

5 months

AEGIS creates a separate reproduction script (not integrated into the test framework), leverages execution feedback more effectively, and invests more effort into gathering relevant context before beginning test generation. See all results at 2/3 🧵

1

0

4

LogicStar AI

@logic_star_ai

5 months

We have our first submission for SWT-Bench 🚀 AEGIS, a dedicated test generation agent, achieves 47.8% accuracy 🏆, significantly outperforming our SWE-Agent+ baseline and demonstrating the potential of dedicated test generation agents. 1/3 🧵

2

4

7

LogicStar AI

@logic_star_ai

7 months

If you want to learn more about our vision and how we plan to automate software maintenance, check out 🤖

0

1

LogicStar AI

@logic_star_ai

7 months

These reproducing tests can then be used to help both human developers and other Code Agents develop and test bug fixes. In our early experiments, we could boost the precision of simple Code Agents by 2x using this technique. 🚀

1

0

1

LogicStar AI

@logic_star_ai

7 months

Given a code base and a GitHub issue description, the task in SWT-Bench is to generate a test that reproduces the described issue. We check this by making sure the test fails before the ground-truth fix is applied to the repo but passes after. We additionally measure coverage. 📈

1

0

1

LogicStar AI

@logic_star_ai

7 months

SWT-Bench was just presented at NeurIPS'24 in Vancouver in collaboration with SRI Lab (@the_sri_lab) from ETH Zurich. A special thanks to our great collaborators Niels Mündler (@nielstron), Mark Müller (@mnmueller), Jingxuan He (@jingxuan_he), and Martin Vechev (@mvechev). 🎉

1

0

3

LogicStar AI

@logic_star_ai

7 months

🚀 Introducing the SWT-Bench Leaderboard! Test your AI's ability to write tests reproducing real-world GitHub issues and improve coverage where it matters. 🤖 Ready for the challenge? 👉 #AI #SoftwareTesting #SWTBench #CodeAgents

1

3

6

LogicStar AI

@logic_star_ai

7 months

RT @the_sri_lab: SRI Lab at #NeurIPS2024 - 1/8. SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents. Niels Mündler (@nie…

0

2

0

LogicStar AI

@logic_star_ai

7 months

RT @the_sri_lab: SRI Lab is proud to present 8 of our works on Privacy and AI Safety at #NeurIPS2024 this year (7 main conference, 1 worksh…

0

5

0

LogicStar AI

@logic_star_ai

8 months

Excited to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!

Ofir Press

@OfirPress

8 months

Super cool work by @nielstron et al: SWT-Bench is SWE-bench for test generation! They give the model a repo and an issue and it has to write a test for the issue. They show that SWE-agent is able to write good tests for 19% of the issues in the benchmark! 🧵(1/3)

0

1

2