Mark Müller @mnmueller X Profile

Mark Müller

@mnmueller

Followers

65

Following

124

Media

0

Statuses

25

PhD student at @the_sri_lab at @ETHZ

Zurich

Joined January 2021

Mark Müller

@mnmueller

1 month

We built Agents in the Wild to track this revolution in real time:
👉 Live, open-source dashboard
👉 Tracks agent behavior across public GitHub PRs
👉 Updated daily
Code:
Built with our MSc student @Christian Mürtz (@SRILab) 🙌

github.com

We track and analyze the activity and performance of autonomous code agents in the wild - logic-star-ai/insights

0

3

Mark Müller

@mnmueller

1 month

🏢 The enterprise trust gap. Agent PRs merge 95% of the time on small repos. But on large/popular ones? Drops to ~25%, even for Google’s Jules. Without strong validation, agents will struggle in business-critical environments. 5/n

1

0

2

Mark Müller

@mnmueller

1 month

🔍 Manual guardrails are still needed. Agents like Codex & Jules require user review before PR submission. That boosts merge rates by up to 50% vs Copilot. But it also limits autonomy. Review = trust, but trust = bottleneck <- this is what we are working on at LogicStar. 4/n 🧵

1

0

1

Mark Müller

@mnmueller

1 month

🧠 How agents code. Agent-generated PRs are:
– Dominantly in Python / JS / TS
– More likely to add new code than refactor existing code
They’re focused on building features, not fixing bugs or doing maintenance. 3/n 🧵

1

0

1

Mark Müller

@mnmueller

1 month

🧪 Still early days. Agents generate 5–10% of PRs overall but only 1–2% on popular repos. Most of their activity is in small, low-starred projects. Translation: we're still in the experimentation phase. 2/n 🧵

1

0

1

Mark Müller

@mnmueller

1 month

🚨 AI agents wrote 7% of all GitHub PRs in June. But can we trust their code? We built Agents in the Wild, a live dashboard tracking autonomous AI agents across GitHub, to answer that question. Here’s what we learned from analyzing 10M+ PRs 👇 1/n 🧵

insights.logicstar.ai

We track and analyze the performance of autonomous code agents in the wild (on GitHub).

2

5

9

Mark Müller

@mnmueller

4 months

RT @logic_star_ai: We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!

0

1

0

Mark Müller

@mnmueller

6 months

RT @nielstron: SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE…

0

2

0

Mark Müller

@mnmueller

6 months

RT @logic_star_ai: We have our first submission for SWT-Bench 🚀
AEGIS, a dedicated test generation agent, achieves 47.8% accuracy 🏆, signi…

0

4

0

Mark Müller

@mnmueller

8 months

RT @logic_star_ai: 🚀 Introducing the SWT-Bench Leaderboard!
Test your AI's ability to write tests reproducing real-world GitHub issues and…

swtbench.com

Check out the SWT-Bench leaderboard! SWT-Bench is a benchmark designed to assess the capabilities of large language models and Code Agents in generating unit tests on real-world code repositories,...

0

3

0

Mark Müller

@mnmueller

8 months

Meet me at this morning's NeurIPS poster session to discuss our work on generating reproducing test cases with Code Agents.

SRI Lab

@the_sri_lab

8 months

SRI Lab at #NeurIPS2024 - 1/8
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
Niels Mündler (@nielstron), Mark Niklas Mueller, Jingxuan He (@jingxuan_he), Martin Vechev (@mvechev)
⏰ /📍 Wed 11th, 11AM - 2PM, West Ballroom A-D #5406
📝 We explore software.

0

Mark Müller

@mnmueller

9 months

RT @logic_star_ai: Exciting to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!

0

1

0

Mark Müller

@mnmueller

1 year

RT @nielstron: Presenting today @icmlconf 2024 Workshop FM in the Wild 🤖 🏞️. "Code Agents are State of The Art Software Testers". SWE-Agen…

0

4

0

Mark Müller

@mnmueller

1 year

RT @marc_r_fischer: On Tuesday at 11:30, in Poster Session 1, we will present Prompt Sketching, a novel decoder-driven approach for templat…

0

1

0

Mark Müller

@mnmueller

1 year

Excited to share our latest work which we will present today at @iclr_conf.

SRI Lab

@the_sri_lab

1 year

We show that neural network certification with all commonly used convex relaxations is imprecise for any NN expressing interesting (>1-d inputs) functions and discuss implications for cert. training.
🧑‍🔬 Maximilian Baader, @mnmueller, @MaoYuhao91443

0

3

Mark Müller

@mnmueller

1 year

RT @mvechev: A couple of amazing PhD students graduated from our lab (@the_sri_lab) at ETH Zurich today: @mbalunovic and @mnmueller. Both d…

0

4

0

Mark Müller

@mnmueller

2 years

RT @the_sri_lab: Find us @NeurIPSConf #NeurIPS2023 to chat about our latest work. We are excited to share works on certified robustness, a…

0

3

0

Mark Müller

@mnmueller

2 years

RT @the_sri_lab: @mnmueller and @marc_r_fischer introduced a new form of Abstract Interpretation for challenging unbounded loops enabling t…

0

1

0

Mark Müller

@mnmueller

2 years

Super excited to talk about robustness guarantees for neural networks at @mlsec_lab's seminar!

Machine Learning Security Laboratory

@mlsec_lab

2 years

We are excited to present a new event in our seminar series on ML Security! We will host Mark Müller (ETH Zurich) on June 6, 2023, at 15:00 CEST. Free registration:
@adversarial_ML @trustworthy_ml @aivillage_dc @RedTeamVillage_

0

5

Mark Müller

@mnmueller

2 years

RT @the_sri_lab: At @iclr_conf members of SRI lab presented 3 works:
- ⚖️ Human-Guided Fair Classification for NLP
- 📈 Robustness Verificat…

0

2

0