
Mark Müller
@mnmueller
Followers: 65
Following: 124
Media: 0
Statuses: 25
PhD student at @the_sri_lab at @ETHZ
Zurich
Joined January 2021
We built Agents in the Wild to track this revolution in real time:
👉 Live, open-source dashboard
👉 Tracks agent behavior across public GitHub PRs
👉 Updated daily
Code:
Built with our MSc student @Christian Mürtz (@SRILab) 🙌
github.com
We track and analyze the activity and performance of autonomous code agents in the wild - logic-star-ai/insights
0
0
3
🚨 AI agents wrote 7% of all GitHub PRs in June. But can we trust their code? We built Agents in the Wild, a live dashboard tracking autonomous AI agents across GitHub, to answer that question. Here’s what we learned from analyzing 10M+ PRs 👇 1/n 🧵
insights.logicstar.ai
We track and analyze the performance of autonomous code agents in the wild (on GitHub).
2
5
9
RT @logic_star_ai: We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!
0
1
0
RT @nielstron: SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE…
0
2
0
RT @logic_star_ai: We have our first submission for SWT-Bench 🚀 AEGIS, a dedicated test generation agent, achieves 47.8% accuracy 🏆, signi…
0
4
0
RT @logic_star_ai: 🚀 Introducing the SWT-Bench Leaderboard! Test your AI's ability to write tests reproducing real-world GitHub issues and…
swtbench.com
Check out the SWT-Bench leaderboard! SWT-Bench is a benchmark designed to assess the capabilities of large language models and Code Agents in generating unit tests on real-world code repositories,...
0
3
0
Meet me at this morning's NeurIPS poster session to discuss our work on generating reproducing test cases with Code Agents.
SRI Lab at #NeurIPS2024 - 1/8
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
Niels Mündler (@nielstron), Mark Niklas Mueller, Jingxuan He (@jingxuan_he), Martin Vechev (@mvechev)
⏰ /📍 Wed 11th, 11AM - 2PM, West Ballroom A-D #5406
📝 We explore software.
0
0
0
RT @logic_star_ai: Exciting to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!
0
1
0
RT @nielstron: Presenting today @icmlconf 2024 Workshop FM in the Wild 🤖 🏞️ "Code Agents are State of The Art Software Testers" SWE-Agen…
0
4
0
RT @marc_r_fischer: On Tuesday at 11:30, in Poster Session 1, we will present Prompt Sketching, a novel decoder-driven approach for templat…
0
1
0
Excited to share our latest work which we will present today at @iclr_conf.
We show that neural network certification with all commonly used convex relaxations is imprecise for any NN expressing interesting (>1-d inputs) functions and discuss implications for cert. training.
🧑‍🔬 Maximilian Baader, @mnmueller, @MaoYuhao91443
📄
0
0
3
RT @mvechev: A couple of amazing PhD students graduated from our lab (@the_sri_lab) at ETH Zurich today: @mbalunovic and @mnmueller. Both d…
0
4
0
RT @the_sri_lab: Find us @NeurIPSConf #NeurIPS2023 to chat about our latest work. We are excited to share works on certified robustness, a…
0
3
0
RT @the_sri_lab: @mnmueller and @marc_r_fischer introduced a new form of Abstract Interpretation for challenging unbounded loops enabling t…
0
1
0
Super excited to talk about robustness guarantees for neural networks at @mlsec_lab's seminar!
We are excited to present a new event in our seminar series on ML Security! We will host Mark Müller (ETH Zurich) on June 6, 2023, at 15:00 CEST. Free registration: @adversarial_ML @trustworthy_ml @aivillage_dc @RedTeamVillage_
0
0
5
RT @the_sri_lab: At @iclr_conf members of SRI lab presented 3 works:
- ⚖️ Human-Guided Fair Classification for NLP
- 📈 Robustness Verificat…
0
2
0