John Heyer 🦆
@hohnjeyer
Followers: 51
Following: 50
Media: 10
Statuses: 32
AI Hacking @ https://t.co/ohgiqgEJar
ex ML @scale_AI / amazon / mit
Friends - if you find my professional account, don't send it in the twitter group 🙏
San Francisco
Joined January 2024
🚨 my AI coworker found a zero-day in Netty. yes, that Netty, used by Meta, Apple, Google and half the internet. the bug lets attackers send fake emails that look perfectly legit. the exploit fully bypasses email defenses. here’s the story 🧵 [1/6]
3
16
38
A peek at what's cooking at depthfirst: our platform *autonomously* found a CVE!! CVE-2025-59305 is a critical vuln in Langfuse, an LLM platform with 16k github stars. The vuln risks db corruption and DoS. Thread 🧵 on X (1/7); Full writeup here:
depthfirst.com
4
2
12
This is the reason why I left DeepMind and decided to build an AI security company. I've seen first-hand what RL can do for code generation. Once you treat exploit generation as an RL problem, no software is safe.
the CIA is not ready for the RL era. israeli intelligence guy just hacked into a live surveillance camera in front of me with an exploit generated by qwen. vulnerable software is simulatable. penetration success is verifiable. hacking is RLable.
35
157
3K
These cups are impossible to drink from and no one is talking about it.
0
0
1
We’re releasing the results on ToolComp today, a Scale AI SEAL leaderboard that tests the ability of agents to plan, reason, and compose multiple, dependent tool calls together. OpenAI models lead with Claude showing strong performance in the Chat setting. 1/🛠️🤖
2
10
30
But isn't this the anti bell-curve meme?
There are 2 mistakes you can make about LLMs: ① Thinking everything LLMs say is correct, they can reason, and with a bit more scale they’ll get us to superintelligence ② Thinking LLMs are good for almost nothing—they are FAR better at all #NLProc tasks than previous methods
0
0
0
🚀 Introducing the SEAL Leaderboards! We rank LLMs using private datasets that can’t be gamed. Vetted experts handle the ratings, and we share our methods in detail openly! Check out our leaderboards at https://t.co/bRdTbIMd20! Which evals should we build next?
10
33
191
Great idea! I've seen it done 3 ways now, seemingly all in isolation, without reference to one another 😅 ReWOO: https://t.co/BivVFfEwWD LLM Compiler: https://t.co/FmerJMOjZp Chain of Abstraction:
0
0
2
Say you have a bunch of ML models, and each one of them has an accuracy of 20%. That means if you ensemble FIVE of them you can get perfect 100% accuracy! 2/2
38
20
601
Had an amazing time in Glasgow at #ecir2024 meeting old friends and new @antonio_mallia @pxyumass @cadurosar
1
0
10
For a second I panicked, thinking people were giving Claude credit for regurgitating @3blue1brown animations HAHA
Ok, this is amazing. I asked Claude 3 to generate an animation of the Pythagorean Theorem and this is what it created:
0
0
0
What do you do when you want to tweet but have no followers? I’m afraid of sending bangers into the void. Must one sell their soul with engagement bait?
0
0
1
Are game engines world simulators? Given a mesh+texture, a game can render a beautiful depiction of tree bark. But, typically, it doesn't model how the bark came to be in the first place, how the mesh+texture were created. Gen models, in a sense, do. 1/3
5
7
73