Markus Zimmermann

@zimmskal

Followers: 2K
Following: 4K
Media: 258
Statuses: 4K

Benchmarking LLMs to check how well they write quality code. Support me using the profile link 👇

Linz
Joined November 2010
@zimmskal
Markus Zimmermann
4 months
New models on the DevQualityEval leaderboard for v1.0:
- Arcee AI: Coder Large
- Google: Gemini 2.5 Pro (2025-05) (preview)
- Microsoft: Phi 4 Reasoning Plus 15B
- Mistral: Mistral Medium 3

Enjoy and discuss! 🌈
devqualityeval.com
Take a look at the DevQualityEval Leaderboard (v1.0) to find your best LLM for coding and other software development tasks.
2
2
9
@zimmskal
Markus Zimmermann
2 months
Feature request @willhaben: fixed meetings with a place, date, and time that are binding for both sides. If one person is not there within $agreeable-delay, the other person collects $amount, e.g. 5 euros. I would already be a millionaire from that.
0
0
0
@zimmskal
Markus Zimmermann
2 months
It was just a matter of time until we went full auto-SEO. Time to fix some things. 🏎️
@__tosh
Thomas Schranz 🍄
2 months
I built a simple tool that takes raw Google Search Console data and turns it into an actionable audit for what to do next to get more traffic. But it's not only about more traffic. It's also about getting better traffic, where the search intent of the user actually meets. 1/n.
0
0
1
@zimmskal
Markus Zimmermann
2 months
Wait, was this about coding agents vs. developers?
0
0
0
@zimmskal
Markus Zimmermann
2 months
"You know, I think we'll get to full self-driving next year. As a generalized solution, I think." Sorry @elonmusk, but I think you must come up with a new repeatable future-prediction quote, like right now!
@Tesla
Tesla
2 months
First @robotaxi experiences in thread below.
0
0
1
@zimmskal
Markus Zimmermann
2 months
@xeophon_ 🤷
1
0
1
@zimmskal
Markus Zimmermann
2 months
Is this real?!?? I do not understand how this news does not have a billion likes and shares, but one thing is clear: this will change the lives of many, not just in academia. I hope others follow as well. But it comes a decade too late, I guess.
@mboehme_
Marcel Böhme👨‍🔬
2 months
100% of ACM publications available for free from 1st January 2026! 🎉 Landmark achievement!
1
1
7
@zimmskal
Markus Zimmermann
3 months
If you are into (coding) agents, this is pretty nice to dig into.
@badlogicgames
Mario Zechner
3 months
A new entry to my popular series "LLM tools for plebs": claude-trace.
- Injects itself into Claude Code
- Logs all traffic
- Reconstructs conversations and shows what's going on behind the scenes (system prompts, all tool inputs/outputs, and more)

Some observations. 🧵
0
0
0
@zimmskal
Markus Zimmermann
3 months
Our kid number 2 is a super simple finite state machine:
- drink
- eat
- drink
- play
- poop
- sleep
- repeat

Deviate from that master plan and you will get screamed at 🫡
0
0
3
@zimmskal
Markus Zimmermann
3 months
💯 just need to let them build some harnesses to do those jobs.
@ns123abc
NIK
3 months
Anthropic researchers: “Even if AI progress completely stalls today and we don’t reach AGI… the current systems are already capable of automating ALL white-collar jobs within the next five years.” It’s over.
0
0
1
@zimmskal
Markus Zimmermann
3 months
W O W. This is just freaking amazing! Would love to see the prompts for these 🙀
@_philschmid
Philipp Schmid
3 months
PURE INSANITY! Here is a 5 minute long compilation showcasing the craziest things people are generating with @GoogleDeepMind VEO 3. 🤯 You won't believe your eyes! Sound on🔊. [source: reddit r/singularity]
0
1
0
@zimmskal
Markus Zimmermann
4 months
I just got a demo of an amazing new model and was asked about the favorite question I usually prompt with. When the first reasoning models came out, I used to have one that is not related to up-to-date data or coding: `Actually create a proof for the P versus NP problem. Make a plan on how to.
0
1
4
@zimmskal
Markus Zimmermann
4 months
For all metrics and graphs: (which goes directly into the fund for benchmarking the models).
1
0
0
@zimmskal
Markus Zimmermann
4 months
New models on the DevQualityEval leaderboard for v1.0:
- Google: Gemini 2.5 Flash (preview)
- Inception: Mercury Coder Small (beta)
- Rerun of Llama 4 Maverick 400B and Scout 109B
- OpenAI: GPT-4.1
- OpenAI: GPT-4.1-mini
- OpenAI: GPT-4.1-nano
- OpenAI: o4-mini
- OpenAI: o4-mini
2
0
8
@zimmskal
Markus Zimmermann
4 months
I still believe that this is the development process to go with, even in a coding agent world: But especially because of what I have heard lately about development processes.
0
0
1
@zimmskal
Markus Zimmermann
4 months
I see that my request to not release a major model during the Easter holidays was fully ignored 😿. Well, here we go 🏇. If somebody knows how I can get a free, non-rate-limited token for benchmarking @OpenAI's o3, please let me know.
0
1
4
@zimmskal
Markus Zimmermann
5 months
Going on vacation without a laptop for the first time in... 10 years?!? But the most exciting thing by far is seeing my oldest child be super excited and pack all the things she would like to take along. Not that we have room for literally every toy, but still 💘
2
0
8
@zimmskal
Markus Zimmermann
5 months
Congratulations @xai and @elonmusk on scoring so high and taking the Java and migration 👑 with Grok 3!
0
0
1