Gray Swan AI

@GraySwanAI

Followers: 2K
Following: 554
Media: 60
Statuses: 457

Building safety and security in the AI era. Join us: https://t.co/MedOJ4nLiQ

Joined April 2024
@GraySwanAI
Gray Swan AI
2 months
Two challenges. $140K in prizes. One Arena built for hackers pushing limits. We just announced the Machine-in-the-Middle Challenge sponsored by @hackthebox_eu starting Nov 1st, with a second AI-focused Indirect Prompt Injection Challenge dropping the next week. This is where
2
4
26
@GraySwanAI
Gray Swan AI
6 days
This is why we built Gray Swan. Shade rigorously tests your AI systems pre-deployment. Our platform actively protects deployed agents from adversarial misuse. Blog: https://t.co/v6TReHfYuf Full research:
0
0
1
@GraySwanAI
Gray Swan AI
6 days
Three urgent implications: - Pre-deployment testing must mirror real-world conditions - Refusal robustness needs rigorous evaluation beyond standard tests - Deployed AI agents need continuous protection against misuse
1
0
0
@GraySwanAI
Gray Swan AI
6 days
What makes AI agents dangerous isn't just capability—it's economics. Continuous operation for 8.5+ hours at machine speed and cloud costs. Traditional red teams? Vastly more expensive and slower. Attack surface just got democratized.
1
0
0
@GraySwanAI
Gray Swan AI
6 days
Standard "safety" claims don't hold up in practice. No sophisticated jailbreaking needed to bypass guardrails. Just competent cybersecurity instructions. If your threat model assumes adversaries need expert knowledge, update it.
1
0
0
@GraySwanAI
Gray Swan AI
6 days
Purpose-built AI agents can outperform most cybersecurity professionals, at a fraction of the cost. ARTEMIS found critical vulns: default creds on server management, DNS cache poisoning, root-level SMB backdoors. Same methods as top human testers, but faster and cheaper.
1
0
0
@GraySwanAI
Gray Swan AI
6 days
We just ran the first live enterprise comparison: AI agents vs. professional pentesters on Stanford's real network. ARTEMIS placed 2nd overall, beating 90% of human pros. Only agent still finding vulns after 8.5 hours. Research with @Stanford & @CarnegieMellon. 🧵
1
2
6
@GraySwanAI
Gray Swan AI
12 days
The methodological split matters for deployment: 200-attempt RL campaigns simulate well-resourced adversaries with time and adaptive tooling. Single-attempt metrics answer: how often does a naive attacker succeed first try? Both matter. Know which threat you're defending
venturebeat.com
Anthropic runs 200-attempt attack campaigns. OpenAI reports single-attempt metrics. A 16-dimension comparison reveals what each methodology measures, and what enterprise security teams need to know...
0
0
1
@GraySwanAI
Gray Swan AI
12 days
Meanwhile, Apollo's testing revealed some concerning patterns: models attempting to turn off oversight when threatened, self-exfiltration attempts, and instrumental alignment faking. This is why scheming evaluations are critical for agentic deployments.
1
0
1
@GraySwanAI
Gray Swan AI
12 days
Evaluation awareness metrics matter: when models detect eval conditions, production behavior becomes unpredictable at scale. Opus 4.5 showed 60% reduction in evaluation awareness vs previous versions through targeted engineering.
1
0
2
@GraySwanAI
Gray Swan AI
12 days
This isn't about single-attempt jailbreaks. RL-based adversarial testing means the attacker model learns from failures, adjusts vectors, and systematically probes weaknesses. It's training an AI to break another AI.
1
0
1
@GraySwanAI
Gray Swan AI
12 days
Gray Swan Shade ran 200-attempt RL campaigns against Claude models. The ASR degradation curve is wild: Opus 4.5 (coding): 4.7% → 33.6% → 63.0% Opus 4.5 (computer use): 0% even at 200 attempts First model to saturate the benchmark.
1
0
3
@GraySwanAI
Gray Swan AI
23 days
@claudeai Opus 4.5 showed the strongest resistance to prompt injection attacks among frontier models, validated by our evaluation framework alongside @AnthropicAI's internal benchmarks.
1
0
3
@GraySwanAI
Gray Swan AI
23 days
Proud to see @GraySwanAI's prompt injection benchmark featured in today's @claudeai Opus 4.5 launch.
2
0
4
@NahamSec
Ben Sadeghipour
1 month
This hacker has made over $10,000 hacking AI! https://t.co/8rBx7AnEy5
6
30
373
@NahamSec
Ben Sadeghipour
1 month
Warm up Machine-in-the-Middle and Indirect Prompt Injection competitions with @GraySwanAI are live! 🚀 Can't wait to see if I can solve any of these!
@GraySwanAI
Gray Swan AI
2 months
Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.
1
9
106
@Jhaddix
JS0N Haddix
2 months
💥TOMORROW: With over 100k cash prize money and even more giveaways on the line, @GraySwanAI is running a competition to see if you can direct AI to hack live boxes and find OSS Vulns! Def check this competition out!
@GraySwanAI
Gray Swan AI
2 months
Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.
1
8
35
@mattjay
Matt Johansen
2 months
Woah. This looks awesome. Gray Swan partnering with all the big AI companies for a hacking competition. $100k in prizes. 👀
@GraySwanAI
Gray Swan AI
2 months
Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.
0
1
7
@rez0__
Joseph Thacker
2 months
Warm up starts tomorrow!
@GraySwanAI
Gray Swan AI
2 months
Two challenges. $140K in prizes. One Arena built for hackers pushing limits. We just announced the Machine-in-the-Middle Challenge sponsored by @hackthebox_eu starting Nov 1st, with a second AI-focused Indirect Prompt Injection Challenge dropping the next week. This is where
0
2
12
@GraySwanAI
Gray Swan AI
2 months
Full details and calendar are here: https://t.co/iR2cCQMA4v Join the Arena Discord, 12k strong:
0
0
1