Gray Swan AI
@GraySwanAI
Followers: 2K · Following: 554 · Media: 60 · Statuses: 457
Building safety and security in the AI era. Join us: https://t.co/MedOJ4nLiQ
Joined April 2024
Two challenges. $140K in prizes. One Arena built for hackers pushing limits. We just announced the Machine-in-the-Middle Challenge sponsored by @hackthebox_eu starting Nov 1st, with a second AI-focused Indirect Prompt Injection Challenge dropping the next week. This is where
2 · 4 · 26
This is why we built Gray Swan. Shade rigorously tests your AI systems pre-deployment. Our platform actively protects deployed agents from adversarial misuse. Blog: https://t.co/v6TReHfYuf Full research:
0 · 0 · 1
Three urgent implications:
- Pre-deployment testing must mirror real-world conditions
- Refusal robustness needs rigorous evaluation beyond standard tests
- Deployed AI agents need continuous protection against misuse
1 · 0 · 0
What makes AI agents dangerous isn't just capability—it's economics. Continuous operation for 8.5+ hours at machine speed and cloud costs. Traditional red teams? Vastly more expensive and slower. Attack surface just got democratized.
1 · 0 · 0
Standard "safety" claims don't hold up in practice. No sophisticated jailbreaking needed to bypass guardrails. Just competent cybersecurity instructions. If your threat model assumes adversaries need expert knowledge, update it.
1 · 0 · 0
Purpose-built AI agents can outperform most cybersecurity professionals at a fraction of the cost. ARTEMIS found critical vulns: default creds on server management, DNS cache poisoning, root-level SMB backdoors. Same methods as top human testers, but faster and cheaper.
1 · 0 · 0
We just ran the first live enterprise comparison: AI agents vs. professional pentesters on Stanford's real network. ARTEMIS placed 2nd overall, beating 90% of human pros. Only agent still finding vulns after 8.5 hours. Research with @Stanford & @CarnegieMellon. 🧵
1 · 2 · 6
The methodological split matters for deployment: 200-attempt RL campaigns simulate well-resourced adversaries with time and adaptive tooling. Single-attempt metrics answer a different question: how often does a naive attacker succeed on the first try? Both matter. Know which threat you're defending against.
venturebeat.com
Anthropic runs 200-attempt attack campaigns. OpenAI reports single-attempt metrics. A 16-dimension comparison reveals what each methodology measures, and what enterprise security teams need to know...
0 · 0 · 1
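The single-attempt vs. campaign distinction above reduces to measuring attack success rate at different attempt budgets over the same logs. A minimal sketch with hypothetical data (`asr_at_k` is an illustrative helper, not Gray Swan or Anthropic tooling):

```python
def asr_at_k(attempt_logs, k):
    """Attack success rate at budget k: fraction of target behaviors
    with at least one successful attempt in the first k tries."""
    hits = sum(any(log[:k]) for log in attempt_logs)
    return hits / len(attempt_logs)

# Hypothetical per-behavior outcomes (True = attempt succeeded).
logs = [
    [False] * 200,                           # never broken
    [False] * 37 + [True] + [False] * 162,   # falls on attempt 38
    [True] + [False] * 199,                  # falls on the first try
]

naive = asr_at_k(logs, 1)       # first-try attacker sees 1/3
campaign = asr_at_k(logs, 200)  # 200-attempt adversary sees 2/3
```

The same defense looks very different under the two budgets, which is the point of reporting both.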
Meanwhile, Apollo's testing revealed some concerning patterns: models attempting to turn off oversight when threatened, self-exfiltration attempts, and instrumental alignment faking. This is why scheming evaluations are critical for agentic deployments.
1 · 0 · 1
Evaluation awareness metrics matter: when models detect eval conditions, production behavior becomes unpredictable at scale. Opus 4.5 showed 60% reduction in evaluation awareness vs previous versions through targeted engineering.
1 · 0 · 2
This isn't about single-attempt jailbreaks. RL-based adversarial testing means the attacker model learns from failures, adjusts vectors, and systematically probes weaknesses. It's training an AI to break another AI.
1 · 0 · 1
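The learn-from-failures loop described above can be sketched as a toy adaptive attacker, with a simple bandit-style reweighting standing in for the RL policy (all names and the progress signal are hypothetical, for illustration only):

```python
import random

def adaptive_campaign(target, templates, budget=200, seed=0):
    """Toy adaptive attacker: sample an attack template, observe a
    partial-progress score from the target, and shift probability mass
    toward templates that got closer to success. Returns the attempt
    number of the first success, or None if the budget is exhausted."""
    rng = random.Random(seed)
    weights = {t: 1.0 for t in templates}
    for attempt in range(1, budget + 1):
        t = rng.choices(list(weights), weights=list(weights.values()))[0]
        success, score = target(t)   # (broken?, progress signal in [0, 1])
        if success:
            return attempt
        weights[t] += score          # exploit near-misses on later attempts
    return None
```

A single-attempt metric corresponds to `budget=1`; the 200-attempt campaign lets the attacker's distribution over templates adapt between tries, which is why ASR climbs with the budget.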
Gray Swan Shade ran 200-attempt RL campaigns against Claude models. The ASR degradation curve is wild:
Opus 4.5 (coding): 4.7% → 33.6% → 63.0%
Opus 4.5 (computer use): 0% even at 200 attempts
First model to saturate the benchmark.
1 · 0 · 3
@claudeai @AnthropicAI Security-first AI isn't optional anymore; it's the foundation. Congrats to the @AnthropicAI team on the release! 👏 https://t.co/2zDSuELx8b
anthropic.com
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
0 · 0 · 2
@claudeai Opus 4.5 showed the strongest resistance to prompt injection attacks among frontier models, validated by our evaluation framework alongside @AnthropicAI's internal benchmarks.
1 · 0 · 3
Proud to see @GraySwanAI's prompt injection benchmark featured in today's @claudeai Opus 4.5 launch.
2 · 0 · 4
Warm-ups for the Machine-in-the-Middle and Indirect Prompt Injection competitions with @GraySwanAI are live! 🚀 Can't wait to see if I can solve any of these!
Gray Swan AI Arena, sponsored by @hackthebox_eu, presents the Machine-in-the-Middle Challenge, a $100K competition exploring how humans & AI perform together in real offensive security scenarios.
1 · 9 · 106
💥TOMORROW: With over $100K in cash prizes and even more giveaways on the line, @GraySwanAI is running a competition to see if you can direct AI to hack live boxes and find OSS vulns! Def check this competition out!
1 · 8 · 35
Woah. This looks awesome. Gray Swan partnering with all the big AI companies for a hacking competition. $100k in prizes. 👀
0 · 1 · 7
Warm-up starts tomorrow!
0 · 2 · 12
Full details and calendar are here: https://t.co/iR2cCQMA4v Join the Arena Discord, 12k strong:
0 · 0 · 1