muellerberndt Profile Banner
Bernhard Mueller Profile
Bernhard Mueller

@muellerberndt

Followers
14K
Following
14K
Media
683
Statuses
7K

Whitehat since 1997 • @PwnieAwards winner (2 noms) • Created Mythril • Hunting bugs for @Spearbit

Chiang Mai, Thailand
Joined June 2011
Don't wanna be here? Send us removal request.
@muellerberndt
Bernhard Mueller
1 month
First time competing on @cantinaxyz. The result is ok given that I learned Chialisp from scratch & only audited during the cutscenes while playing Doom TDA. But I'll try to do better next time
Tweet media one
15
8
285
@muellerberndt
Bernhard Mueller
2 days
Hound is now a virtual junior auditor - you can discuss potential issues, give advice, steer the audit, etc. using voice. There's lots of fun ideas that could be added here! Still in experimental branch btw
Tweet media one
11
4
106
@grok
Grok
16 days
Join millions who have switched to Grok.
614
726
5K
@muellerberndt
Bernhard Mueller
3 days
Hound now exports detailed PoC creation prompts. Lean back and relax while a coding agent does all the work for you, then import the result back into Hound for reporting. For the super lazy.
Tweet media one
Tweet media two
8
1
68
@muellerberndt
Bernhard Mueller
5 days
We released the full smart contract audit benchmark toolchain, plus a dataset with 555 real bugs. If you think that your AI tool / product outperforms copy/pasting code into ChatGPT, now is the time to prove it. Link in 1st reply.
Tweet media one
8
10
108
@muellerberndt
Bernhard Mueller
5 days
Hound now supports strategic audit planning. Work is split dynamically between junior and senior. Multiple teams can audit the same codebase - pair up Grok 3 with GPT-5 and Gemini-2.5 Pro with Grok 4, with Opus 4.1 doing QA.
Tweet media one
6
1
68
@muellerberndt
Bernhard Mueller
6 days
Prompt injection is the worst vulnerability class ever: trivial to find and exploit like XSS, but often with RCE-level impact.
0
0
7
@muellerberndt
Bernhard Mueller
7 days
In the ChiaLisp contest, I studied the code for 2 weeks without finding any bugs. Then they started appearing effortlessly in the final stretch. The brain does a lot of subconscious legwork, it just takes time.
5
1
65
@muellerberndt
Bernhard Mueller
7 days
RT @hexensio: Two real zk exploits, one root cause. This bug let users claim the same airdrop twice. Another let them withdraw more than t….
0
7
0
@muellerberndt
Bernhard Mueller
10 days
Hound is approaching the top 10 / $1,000 barrier in contests (team of grok 3, gpt-5-nano and gpt-5 auditing for ~3 hours). Now I just need a PJQA bot that fights with the judges
Tweet media one
Tweet media two
7
0
48
@muellerberndt
Bernhard Mueller
11 days
I’m releasing a public version of Hound, my bounty-winning security analysis agent. It re-invents AI code audits from first principles by modeling the cognitive + organizational processes of real experts. Link in first reply.
Tweet media one
10
8
130
@muellerberndt
Bernhard Mueller
15 days
OpenAI’s models are much more capable than what’s available via the API. Getting access to these resources first and knowing how to use them correctly will be significant alpha, for some time at least.
2
0
2
@muellerberndt
Bernhard Mueller
15 days
Specifically, with this setting my security analysis agent was significantly better at detecting complex logic bugs with few prompts… but now, it no longer detects those bugs at all, even at max reasoning settings. GPT-5 public APIs are strongly nerfed. 2/n.
1
0
1
@muellerberndt
Bernhard Mueller
15 days
Last week I found a GPT-5 API “hack”: Using the old ChatCompletion API with very large max_output_tokens increased reasoning time by orders of magnitude vs. the “high effort” setting of the responses API, dramatically improving results. It’s fixed now but it makes me think… 1/n.
4
0
14
@muellerberndt
Bernhard Mueller
16 days
Rapid prototyping with Opus 4.1 and GPT-5 is 🔥. Building a smart contract security analyzer based on a novel idea used to take weeks. Now I can fully implement and benchmark one idea per day. Soon that process will be automated so the analyzer will be self-improving.
3
1
24
@muellerberndt
Bernhard Mueller
18 days
Most LLM audit tools spam as many shallow hypotheses (“hallucinations”) as possible which is exactly the wrong approach. It wastes resources and misses the actual interesting bugs as the capabilities of larger models are not optimally utilized.
3
1
30
@muellerberndt
Bernhard Mueller
20 days
This makes it easy to generate large (100+ contests) and mini (~10 contests) datasets from recent contests that happened past the cutoff date of the latest SOTA models. You can find it here:.
Tweet card summary image
github.com
A framework for evaluating AI audit agents using recent real-world data - muellerberndt/scabench
0
0
11
@muellerberndt
Bernhard Mueller
20 days
Hound v3 uses a novel approach to agent-based auditing that meaningfully scales with compute and should blow everything else out of the water. To allow for an empirical approach I made a benchmark generator that compiles large datasets from contest platforms (link in 2.). 1/2.
1
0
25
@muellerberndt
Bernhard Mueller
22 days
RT @shai_s_shwartz: Are frontier AI models really capable of “PhD-level” reasoning? To answer this question, we introduce FormulaOne, a new….
0
411
0