threlfall
@WHITEHACKSEC
Followers
505
Following
904
Media
96
Statuses
672
working at the intersection of offensive security, ML & supply chains. sharing @ https://t.co/zulqbxDZQV & https://t.co/EyMIpzuHUQ
United States
Joined April 2014
Attackers should think more about ML systems and using them to their advantage: the 'Adversary Flywheel'. I look at the ways in which attackers can address bottlenecks in ML usage and also act in a more sophisticated manner using data science: https://t.co/saJ0tU2neY
5stars217.github.io
Build your adversary flywheel.
I spent the past months investigating: Can we trust reasoning models' CoTs? Researchers showed that LLMs aren't always faithful, but that's not the full story. LLMs are very faithful when the reasoning is complex, and unfaithful CoTs remain monitorable! Check out my latest work 🥳
Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can't be performed in a single forward pass.
https://t.co/SRgumfMaDo Important data to keep in mind as attackers, given that AI IDEs, when sandboxed, re-attempt package installs outside the sandbox (w/ user approval). Thanks @LeonDerczynski & co.
your code gets merged because it's good. mine gets merged so my mistakes are on the permanent record
If you haven't been to https://t.co/mg5QVkCWso in a while, there are a few new things to check out. Namely:
- Big improvements in open source hackbots and the variety of architectures available, including collaborative red/blue agents.
- Explosion in MCP resources
wiki.offsecml.com
Latest: 11/13/25 version: 2.0.9 First published 10/26/23. Shiny new things Garak Improvements Offensive Hackbot Advancements + New threat intel as of 7/23/2025 Additional Techniques for web app tes…
In measurements using our set of multi-step software and reasoning tasks, Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively.
Incalmo enables LLMs to specify high-level offensive actions through expert agents. In 9 out of 10 networks in MHBench, LLMs using Incalmo achieve at least some of the attack goals. Code is in the paper. I'm keen to try this vs CAI and will update. https://t.co/bf2HIWdNND
What causes jailbreaks to transfer between LLMs? We find that jailbreak strength and model representation similarity predict transferability, and we can engineer model similarity to improve transfer. Details in 🧵
v3 of Rigging is out now. If you're working with LLMs to build agents or run evaluations, check it out. We just added:
- Prompt caching for supported providers
- A unified tool system for function calling and fallbacks to xml/json parsing
- Native MCP integration
- Lots of
I strongly encourage anybody that ever called one llm programmatically to carve out 1 hr of your time and run through all examples in the 'get started' dspy page. It will click, I promise! Link below. It's right on the homepage. Deceptively short. Very powerful.
I've updated the wiki with some research into agent hacking, its limitations and strengths. Also updated are the prompt injection techniques. Increasingly there is convergence in the techniques, where a successful attack is 3 or more techniques at once. https://t.co/Wf5aDKGhnD
wiki.offsecml.com
PoC: Generally speaking, any technique from the 'prompt injection' category will work; just place the instruction within the content being parsed by the LLM. Note that it is commonplace in 2025 to join…
OAI just published a prompting guide for GPT 4.1: "XML performed well in our long context testing." "JSON performed particularly poorly." Anthropic have posted similar instructions consistently too. Anyone know why MCPs call for JSON?
This morning I updated the offsec ML wiki with some neat defensive techniques and threat intel.
- Using WASM VMs with MCPs, great foundational work by @tuananh_org
- eBPF tracing of model files, really cool research by @dreadnode
- Model signatures via Sigstore
and more!
Where AI meets offensive security. Dreadnode is proud to be an organizer of Offensive AI Con (OAIC), the first conference dedicated to exploring the use of AI in offensive cyber. See you in Oceanside this October? Request an invite at https://t.co/rBFBf6i8CW.
offensiveaicon.com
Welcome to OAIC. The world's first, invite-only Offensive AI Conference in Oceanside, San Diego, CA.
Announcing the first conference dedicated to the offensive use of AI in security! Request an invite at https://t.co/5x2yeDRB0Q. Co-organized by RemoteThreat, Dreadnode, & DEVSEC
This year we were honored to receive more than 80 CFP submissions across a wide range of topics and expertise levels. We are so thankful for each submission and are always blown away by the quality of talks proposed. Speakers should hear from us by next week! -sq33k
Massive day at Dreadnode! We built a team and suite of products that combine the best of AI and offensive security. Red teams benefit from AI's power, and AI developers receive the latest attacks and techniques. Proud of this crew!
Today, Dreadnode announces $14M Series A funding led by @DecibelVC, with @nextfrontiercap, In-Q-Tel, Sands Capital, and Indie VC. Dreadnode exists to show that AI can perform offensive security tasks on par with, and exceeding, human capability. To accomplish this, we're