Bernhard Mueller @muellerberndt X Profile

Bernhard Mueller

@muellerberndt

Followers

14K

Following

14K

Media

683

Statuses

7K

Whitehat since 1997 • @PwnieAwards winner (2 noms) • Created Mythril • Hunting bugs for @Spearbit

Chiang Mai, Thailand

Joined June 2011

Don't wanna be here? Send us removal request.

Bernhard Mueller

@muellerberndt

1 month

First time competing on @cantinaxyz. The result is ok given that I learned Chialisp from scratch & only audited during the cutscenes while playing Doom TDA. But I'll try to do better next time

15

8

285

Bernhard Mueller

@muellerberndt

2 days

Hound is now a virtual junior auditor - you can discuss potential issues, give advice, steer the audit, etc. using voice. There's lots of fun ideas that could be added here! Still in experimental branch btw

11

4

106

Grok

@grok

16 days

Join millions who have switched to Grok.

614

726

5K

Bernhard Mueller

@muellerberndt

3 days

Hound now exports detailed PoC creation prompts. Lean back and relax while a coding agent does all the work for you, then import the result back into Hound for reporting. For the super lazy.

8

1

68

Bernhard Mueller

@muellerberndt

5 days

Link:

github.com

A framework for evaluating AI audit agents using recent real-world data - muellerberndt/scabench

0

2

17

Bernhard Mueller

@muellerberndt

5 days

We released the full smart contract audit benchmark toolchain, plus a dataset with 555 real bugs. If you think that your AI tool / product outperforms copy/pasting code into ChatGPT, now is the time to prove it. Link in 1st reply.

8

10

108

Bernhard Mueller

@muellerberndt

5 days

Hound now supports strategic audit planning. Work is split dynamically between junior and senior. Multiple teams can audit the same codebase - pair up Grok 3 with GPT-5 and Gemini-2.5 Pro with Grok 4, with Opus 4.1 doing QA.

6

1

68

Bernhard Mueller

@muellerberndt

6 days

Prompt injection is the worst vulnerability class ever: trivial to find and exploit like XSS, but often with RCE-level impact.

0

7

Bernhard Mueller

@muellerberndt

7 days

In the ChiaLisp contest, I studied the code for 2 weeks without finding any bugs. Then they started appearing effortlessly in the final stretch. The brain does a lot of subconscious legwork, it just takes time.

5

1

65

Bernhard Mueller

@muellerberndt

7 days

RT @hexensio: Two real zk exploits, one root cause. This bug let users claim the same airdrop twice. Another let them withdraw more than t….

0

7

0

Bernhard Mueller

@muellerberndt

10 days

I just published Unleashing the Hound: How AI Agents Find Deep Logic Bugs in Any Codebase

muellerberndt.medium.com

Hound is a language-agnostic AI code security auditor that simulates the cognitive processes of human experts. It maps systems as living knowledge graphs, and uses focused, high-quality hypotheses…

13

21

126

Bernhard Mueller

@muellerberndt

10 days

Hound is approaching the top 10 / $1,000 barrier in contests (team of grok 3, gpt-5-nano and gpt-5 auditing for ~3 hours). Now I just need a PJQA bot that fights with the judges

7

0

48

Bernhard Mueller

@muellerberndt

11 days

github.com

Language-agnostic AI code security analysis that replicates the cognitive processes of expert auditors - muellerberndt/hound

1

2

29

Bernhard Mueller

@muellerberndt

11 days

I’m releasing a public version of Hound, my bounty-winning security analysis agent. It re-invents AI code audits from first principles by modeling the cognitive + organizational processes of real experts. Link in first reply.

10

8

130

Bernhard Mueller

@muellerberndt

15 days

OpenAI’s models are much more capable than what’s available via the API. Getting access to these resources first and knowing how to use them correctly will be significant alpha, for some time at least.

2

0

2

Bernhard Mueller

@muellerberndt

15 days

Specifically, with this setting my security analysis agent was significantly better at detecting complex logic bugs with few prompts… but now, it no longer detects those bugs at all, even at max reasoning settings. GPT-5 public APIs are strongly nerfed. 2/n.

1

0

1

Bernhard Mueller

@muellerberndt

15 days

Last week I found a GPT-5 API “hack”: Using the old ChatCompletion API with very large max_output_tokens increased reasoning time by orders of magnitude vs. the “high effort” setting of the responses API, dramatically improving results. It’s fixed now but it makes me think… 1/n.

4

0

14

Bernhard Mueller

@muellerberndt

16 days

Rapid prototyping with Opus 4.1 and GPT-5 is 🔥. Building a smart contract security analyzer based on a novel idea used to take weeks. Now I can fully implement and benchmark one idea per day. Soon that process will be automated so the analyzer will be self-improving.

3

1

24

Bernhard Mueller

@muellerberndt

18 days

Most LLM audit tools spam as many shallow hypotheses (“hallucinations”) as possible which is exactly the wrong approach. It wastes resources and misses the actual interesting bugs as the capabilities of larger models are not optimally utilized.

3

1

30

Bernhard Mueller

@muellerberndt

20 days

This makes it easy to generate large (100+ contests) and mini (~10 contests) datasets from recent contests that happened past the cutoff date of the latest SOTA models. You can find it here:.

github.com

A framework for evaluating AI audit agents using recent real-world data - muellerberndt/scabench

0

11

Bernhard Mueller

@muellerberndt

20 days

Hound v3 uses a novel approach to agent-based auditing that meaningfully scales with compute and should blow everything else out of the water. To allow for an empirical approach I made a benchmark generator that compiles large datasets from contest platforms (link in 2.). 1/2.

1

0

25

Bernhard Mueller

@muellerberndt

22 days

RT @shai_s_shwartz: Are frontier AI models really capable of “PhD-level” reasoning? To answer this question, we introduce FormulaOne, a new….

0

411

0