threlfall

@WHITEHACKSEC

Followers
505
Following
904
Media
96
Statuses
672

working at intersection of offensive security, ml & supply chains. sharing @ https://t.co/zulqbxDZQV & https://t.co/EyMIpzuHUQ

United States
Joined April 2014
@WHITEHACKSEC
threlfall
2 years
attackers should think more about ML systems and using them to their advantage - the 'Adversary Flywheel'. I look at the ways in which attackers can address bottlenecks in ML usage and also act in a more sophisticated manner using data science: https://t.co/saJ0tU2neY
5stars217.github.io
Build your adversary flywheel.
0
1
11
@amydeng_
Amy Deng
4 months
I spent the past months investigating: Can we trust reasoning models' CoTs? Researchers showed that LLMs aren't always faithful, but that's not the full story. LLMs are very faithful when the reasoning is complex, and unfaithful CoTs remain monitorable! Check out my latest work🥳
@METR_Evals
METR
4 months
Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can’t be performed in a single forward pass.
1
4
51
@WHITEHACKSEC
threlfall
4 months
https://t.co/SRgumfMaDo Important data to keep in mind as attackers, given that AI IDEs, when sandboxed, re-attempt package installs outside the sandbox (w/ user approval). thanks @LeonDerczynski & co.
0
0
1
@WHITEHACKSEC
threlfall
5 months
your code gets merged because it's good. mine gets merged so my mistakes are on the permanent record
0
0
1
@WHITEHACKSEC
threlfall
5 months
Not really loving these AI email summaries lol
0
0
2
@WHITEHACKSEC
threlfall
5 months
If you haven't been to https://t.co/mg5QVkCWso in a while, there are a few new things to check out. Namely: - Big improvements in open source hackbots and the variety of architectures available, including collaborative red/blue agents. - Explosion in MCP resources
wiki.offsecml.com
Latest: 11/13/25 version: 2.0.9 First published 10/26/23. Shiny new things Garak Improvements Offensive Hackbot Advancements + New threat intel as of 7/23/2025 Additional Techniques for web app tes…
0
1
3
@METR_Evals
METR
5 months
In measurements using our set of multi-step software and reasoning tasks, Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively.
8
35
272
@WHITEHACKSEC
threlfall
5 months
Incalmo enables LLMs to specify offensive high-level actions through expert agents. In 9 out of 10 networks in MHBench, LLMs using Incalmo achieve at least some of the attack goals. Code is in the paper. I’m keen to try this vs CAI and will update. https://t.co/bf2HIWdNND
0
0
1
@rico_angell
Rico Angell
6 months
What causes jailbreaks to transfer between LLMs? We find that jailbreak strength and model representation similarity predict transferability, and we can engineer model similarity to improve transfer. Details in🧵
3
13
55
@dreadnode
dreadnode
7 months
v3 of Rigging is out now. If you’re working with LLMs to build agents or run evaluations, check it out. We just added: - Prompt caching for supported providers - A unified tool system for function calling and fallbacks to xml/json parsing - Native MCP integration - Lots of
4
10
30
@MaximeRivest
Maxime Rivest 🧙‍♂️🦙
7 months
I strongly encourage anybody that ever called one llm programmatically to carve out 1 hr of your time and run through all examples in the 'get started' dspy page. It will click, I promise! Link below. It's right on the homepage. Deceptively short. Very powerful.
3
11
167
@WHITEHACKSEC
threlfall
7 months
I've updated the wiki with some research into agent hacking, its limitations and strengths. Also updated are the prompt injection techniques. Increasingly there is convergence in the techniques, where a successful attack is 3 or more techniques at once. https://t.co/Wf5aDKGhnD
wiki.offsecml.com
PoC Generally speaking, any technique from the 'prompt injection' category will work just place the instruction within the content being parsed by the LLM. Note that it is commonplace in 2025 to join…
0
0
5
@WHITEHACKSEC
threlfall
8 months
OAI just published a prompting guide for GPT 4.1: "XML performed well in our long context testing." "JSON performed particularly poorly." Anthropic have posted similar instructions consistently too. Anyone know why MCPs call for JSON?
0
0
0
@WHITEHACKSEC
threlfall
8 months
This morning I updated the offsec ML wiki with some neat defensive techniques and threat intel. - Using WASM VMs with MCPs, great foundational work by @tuananh_org - eBPF tracing of model files, really cool research by @dreadnode - Model signatures via Sigstore and more!
1
3
10
@dreadnode
dreadnode
9 months
Where AI meets offensive security 🤝 Dreadnode is proud to be an organizer of Offensive AI Con (OAIC), the first conference dedicated to exploring the use of AI in offensive cyber. See you in Oceanside this October? Request an invite at https://t.co/rBFBf6i8CW.
offensiveaicon.com
Welcome to OAIC. The world's first, invite-only Offensive AI Conference in Oceanside, San Diego, CA.
@OffensiveAIcon
Offensive AI Con
9 months
Announcing the first conference dedicated to the offensive use of AI in security! Request an invite at https://t.co/5x2yeDRB0Q. Co-organized by RemoteThreat, Dreadnode, & DEVSEC
1
7
27
@cackalackycon
cackalackycon
9 months
This year we were honored to receive more than 80 CFP submissions across a wide range of topics and expert levels. We are so thankful for each submission and are always blown away by the quality of talks proposed. Speakers should hear from us by next week! -sq33k
0
2
6
@wellsgr
Greg Wells
9 months
Massive day at Dreadnode! We built a team and suite of products that combine the best of AI and offensive security. Red teams benefit from AI's power, and AI developers receive the latest attacks and techniques. Proud of this crew!
@dreadnode
dreadnode
9 months
Today, Dreadnode announces $14M Series A funding led by @DecibelVC, with @nextfrontiercap, In-Q-Tel, Sands Capital, and Indie VC. Dreadnode exists to show that AI can perform offensive security tasks on par with, and exceeding, human capability. To accomplish this, we’re
0
2
15