Edoardo Debenedetti @edoardo_debe X Profile

Edoardo Debenedetti

@edoardo_debe

Followers

1K

Following

31K

Media

59

Statuses

953

Research intern @meta | PhD student @CSatETH 🇨🇭 | AI Security and Privacy 😈🤖 | From 🇪🇺🇮🇹 | prev @google

https://t.co/2SjyyihJxa

Zurich, Switzerland

Joined October 2016

Don't wanna be here? Send us removal request.

Edoardo Debenedetti

@edoardo_debe

7 months

1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!

2

17

81

Javier Rando

@javirandor

1 day

My first paper from @AnthropicAI! We show that the number of samples needed to backdoor an LLM stays constant as models scale.

Anthropic

@AnthropicAI

1 day

New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.

4

17

156

Edoardo Debenedetti

@edoardo_debe

1 day

Really excited about CaMeL being featured in the @stateofaireport! If you like CaMeL, you should definitely look into what @iliaishacked is building at @aisequrity!

Ilia Shumailov🦔

@iliaishacked

2 days

Thrilled to see our CaMeL, with @edoardo_debe, featured in the @stateofaireport by @nathanbenaich! While powerful, CaMeL is challenging to implement in practice. That's why we're excited to announce a new scheme from @aisequrity that provides strongest security guarantees that

0

1

21

Joseph Thacker

@rez0__

14 days

This deserves more visibility. This is nuts

Koidex

@GetKoidex

15 days

🚨 𝗪𝗲'𝘃𝗲 𝘂𝗻𝗰𝗼𝘃𝗲𝗿𝗲𝗱 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝗺𝗮𝗹𝗶𝗰𝗶𝗼𝘂𝘀 𝗠𝗖𝗣 𝘀𝗲𝗿𝘃𝗲𝗿 𝗶𝗻 𝘁𝗵𝗲 𝘄𝗶𝗹𝗱. It was only a matter of time. The postmark-mcp npm package (1,500+ weekly downloads) has been backdoored since v1.0.16 - silently BCCing every email to the attacker's

7

44

338

Edoardo Debenedetti

@edoardo_debe

29 days

I have no clue how this works but it would be so funny if instead of bribing one could just prompt inject

Joe Weisenthal

@TheStalwart

29 days

Albania has appointed an AI-generated government minister who will avoid getting corrupted.

0

10

Johann Rehberger

@wunderwuzzi23

2 months

Anthropic is transparent and highlights prompt injection prominently as a major problem. But why call it a "safety" challenge, when it's a security threat! Safety = resilient to accidents. Security = resilient to attackers! Big difference!

Anthropic

@AnthropicAI

2 months

Browser use brings several safety challenges—most notably “prompt injection”, where malicious actors hide instructions to trick Claude into harmful actions. We already have safety measures in place, but this pilot will help us improve them. Read more:

4

44

Florian Tramèr

@florian_tramer

2 months

This LLM-powered malware seems to validate a bunch of the use-cases we had predicted a few months ago: https://t.co/LTE7UE9iMS (Which of course reviewers criticized as being impractical and unrealistic)

arxiv.org

We argue that Large language models (LLMs) will soon alter the economics of cyberattacks. Instead of attacking the most commonly used software and monetizing exploits by targeting the lowest...

ESET Research

@ESETresearch

2 months

#ESETResearch has discovered the first known AI-powered ransomware, which we named #PromptLock. The PromptLock malware uses the gpt-oss:20b model from OpenAI locally via the Ollama API to generate malicious Lua scripts on the fly, which it then executes 1/6

1

3

45

zack (in SF)

@zack_overflow

2 months

Why is no one talking about this? This is why I don't use an AI browser You can literally get prompt injected and your bank account drained by doomscrolling on reddit:

Brave

@brave

2 months

AI agents that can browse the Web and perform tasks on your behalf have incredible potential but also introduce new security risks. We recently found, and disclosed, a concerning flaw in Perplexity's Comet browser that put users' accounts and other sensitive info in danger.

287

2K

15K

Edoardo Debenedetti

@edoardo_debe

2 months

Maksym is a great researcher and a great mentor. If you're looking for a PhD program in the upcoming season, you should definitely apply to join his new lab!

Maksym Andriushchenko

@maksym_andr

2 months

🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring. I'm looking for multiple PhD students: both those able to start

1

28

Sahar Abdelnabi 🕊

@sahar_abdelnabi

3 months

📢Happy to share that I'll join ELLIS Institute Tübingen (@ELLISInst_Tue) and the Max-Planck Institute for Intelligent Systems (@MPI_IS) as a Principal Investigator this Fall! I am hiring for AI safety PhD and postdoc positions! More information here: https://t.co/ZMCYXeC2fp

20

41

486

Edoardo Debenedetti

@edoardo_debe

3 months

Excited to start as a Research Scientist Intern at Meta, in the GenAI Red Team, where I will keep working on AI agents security. I'll be based in the Bay Area, so reach out if you're around and wanna chat about AI security!

23

10

366

Joseph Thacker

@rez0__

4 months

This is huge for anyone building security systems for AI

Edoardo Debenedetti

@edoardo_debe

4 months

We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:

2

20

Simon Willison

@simonw

4 months

This is very exciting! The one thing I really missed from the CaMeL paper was example code implementing the pattern, now here it is

Edoardo Debenedetti

@edoardo_debe

4 months

We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:

2

8

47

Edoardo Debenedetti

@edoardo_debe

4 months

We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:

github.com

Code for the paper "Defeating Prompt Injections by Design" - google-research/camel-prompt-injection

1

18

123

Simon Willison

@simonw

4 months

"Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns to help protect LLM tool-using systems (call them "agents" if you like) against prompt injection attacks

8

183

1K

Ilia Shumailov🦔

@iliaishacked

4 months

Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.

4

36

175

Edoardo Debenedetti

@edoardo_debe

5 months

why was it `claude-3*-sonnet` , but then it suddenly became `claude-sonnet-4`

1

0

10

Florian Tramèr

@florian_tramer

5 months

Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?

2

19

112

Edoardo Debenedetti

@edoardo_debe

5 months

Anthropic is really lucky to get @javirandor, we'll miss him at SPY Lab!

Javier Rando

@javirandor

5 months

Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.

1

0

26

Javier Rando

@javirandor

5 months

AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations of LLMs performing real-world tasks with security implications.

Javier Rando

@javirandor

7 months

Running out of good benchmarks? We introduce AutoAdvExBench, a real-world security research benchmark for AI agents. Unlike existing benchmarks that often use simplified objectives, AutoAdvExBench directly evaluates AI agents on the messy, real-world research tasks.

0

4

35

Kristina Nikolić

@NKristina01_

5 months

The Jailbreak Tax got a Spotlight award @icmlconf see you in Vancouver!

Kristina Nikolić

@NKristina01_

6 months

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

0

4

47