Mark Müller

@mnmueller

Followers: 65
Following: 124
Media: 0
Statuses: 25

PhD student at @the_sri_lab at @ETHZ

Zurich
Joined January 2021
@mnmueller
Mark Müller
1 month
We built Agents in the Wild to track this revolution in real time:
👉 Live, open-source dashboard
👉 Tracks agent behavior across public GitHub PRs
👉 Updated daily
Code:
Built with our MSc student @Christian Mürtz (@SRILab) 🙌
github.com
We track and analyze the activity and performance of autonomous code agents in the wild - logic-star-ai/insights
@mnmueller
Mark Müller
1 month
🏢 The enterprise trust gap. Agent PRs merge 95% of the time on small repos. But on large/popular ones? The merge rate drops to ~25%, even for Google’s Jules. Without strong validation, agents will struggle in business-critical environments. 5/n
@mnmueller
Mark Müller
1 month
🔍 Manual guardrails are still needed. Agents like Codex & Jules require user review before PR submission. That boosts merge rates by up to 50% vs. Copilot, but it also limits autonomy. Review = trust, but trust = bottleneck. That bottleneck is what we are working on at LogicStar. 4/n 🧵
@mnmueller
Mark Müller
1 month
🧠 How agents code. Agent-generated PRs are:
– Predominantly in Python / JS / TS
– More likely to add new code than to refactor existing code
They’re focused on building features, not fixing bugs or doing maintenance. 3/n 🧵
@mnmueller
Mark Müller
1 month
🧪 Still early days. Agents generate 5–10% of PRs overall but only 1–2% on popular repos. Most of their activity is in small, low-starred projects. Translation: we're still in the experimentation phase. 2/n 🧵
@mnmueller
Mark Müller
1 month
🚨 AI agents wrote 7% of all GitHub PRs in June. But can we trust their code?
We built Agents in the Wild, a live dashboard tracking autonomous AI agents across GitHub, to answer that question. Here’s what we learned from analyzing 10M+ PRs 👇 1/n 🧵
insights.logicstar.ai
We track and analyze the performance of autonomous code agents in the wild (on GitHub).
@mnmueller
Mark Müller
4 months
RT @logic_star_ai: We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!
@mnmueller
Mark Müller
6 months
RT @nielstron: SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE…
@mnmueller
Mark Müller
6 months
RT @logic_star_ai: We have our first submission for SWT-Bench 🚀
AEGIS, a dedicated test generation agent, achieves 47.8% accuracy 🏆, signi…
@mnmueller
Mark Müller
8 months
RT @logic_star_ai: 🚀 Introducing the SWT-Bench Leaderboard!
Test your AI's ability to write tests reproducing real-world GitHub issues and…
swtbench.com
Check out the SWT-Bench leaderboard! SWT-Bench is a benchmark designed to assess the capabilities of large language models and Code Agents in generating unit tests on real-world code repositories,...
@mnmueller
Mark Müller
8 months
Meet me at this morning's NeurIPS poster session to discuss our work on generating reproducing test cases with Code Agents.
@the_sri_lab
SRI Lab
8 months
SRI Lab at #NeurIPS2024 - 1/8. SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
Niels Mündler (@nielstron), Mark Niklas Mueller, Jingxuan He (@jingxuan_he), Martin Vechev (@mvechev)
⏰/📍 Wed 11th, 11AM - 2PM, West Ballroom A-D #5406
📝 We explore software…
@mnmueller
Mark Müller
9 months
RT @logic_star_ai: Excited to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!
@mnmueller
Mark Müller
1 year
RT @nielstron: Presenting today @icmlconf 2024 Workshop FM in the Wild 🤖 🏞️. "Code Agents are State of The Art Software Testers". SWE-Agen…
@mnmueller
Mark Müller
1 year
RT @marc_r_fischer: On Tuesday at 11:30, in Poster Session 1, we will present Prompt Sketching, a novel decoder-driven approach for templat…
@mnmueller
Mark Müller
1 year
Excited to share our latest work which we will present today at @iclr_conf.
@the_sri_lab
SRI Lab
1 year
We show that neural network certification with all commonly used convex relaxations is imprecise for any NN expressing interesting functions (with >1-d inputs) and discuss implications for certified training.
🧑‍🔬 Maximilian Baader, @mnmueller, @MaoYuhao91443
📄
@mnmueller
Mark Müller
1 year
RT @mvechev: A couple of amazing PhD students graduated from our lab (@the_sri_lab) at ETH Zurich today: @mbalunovic and @mnmueller. Both d…
@mnmueller
Mark Müller
2 years
RT @the_sri_lab: Find us @NeurIPSConf #NeurIPS2023 to chat about our latest work. We are excited to share works on certified robustness, a…
@mnmueller
Mark Müller
2 years
RT @the_sri_lab: @mnmueller and @marc_r_fischer introduced a new form of Abstract Interpretation for challenging unbounded loops enabling t…
@mnmueller
Mark Müller
2 years
Super excited to talk about robustness guarantees for neural networks at @mlsec_lab's seminar!
@mlsec_lab
Machine Learning Security Laboratory
2 years
We are excited to present a new event in our seminar series on ML Security! We will host Mark Müller (ETH Zurich) on June 6, 2023, at 15:00 CEST. Free registration: @adversarial_ML @trustworthy_ml @aivillage_dc @RedTeamVillage_
@mnmueller
Mark Müller
2 years
RT @the_sri_lab: At @iclr_conf, members of SRI Lab presented 3 works:
- ⚖️ Human-Guided Fair Classification for NLP
- 📈 Robustness Verificat…