
Edoardo Debenedetti
@edoardo_debe
Followers
1K
Following
31K
Media
59
Statuses
953
Research intern @meta | PhD student @CSatETH ๐จ๐ญ | AI Security and Privacy ๐๐ค | From ๐ช๐บ๐ฎ๐น | prev @google
Zurich, Switzerland
Joined October 2016
1/๐Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!
2
17
81
My first paper from @AnthropicAI! We show that the number of samples needed to backdoor an LLM stays constant as models scale.
New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLMโregardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.
4
17
156
Really excited about CaMeL being featured in the @stateofaireport! If you like CaMeL, you should definitely look into what @iliaishacked is building at @aisequrity!
Thrilled to see our CaMeL, with @edoardo_debe, featured in the @stateofaireport by @nathanbenaich! While powerful, CaMeL is challenging to implement in practice. That's why we're excited to announce a new scheme from @aisequrity that provides strongest security guarantees that
0
1
21
This deserves more visibility. This is nuts
๐จ ๐ช๐ฒ'๐๐ฒ ๐๐ป๐ฐ๐ผ๐๐ฒ๐ฟ๐ฒ๐ฑ ๐๐ต๐ฒ ๐ณ๐ถ๐ฟ๐๐ ๐บ๐ฎ๐น๐ถ๐ฐ๐ถ๐ผ๐๐ ๐ ๐๐ฃ ๐๐ฒ๐ฟ๐๐ฒ๐ฟ ๐ถ๐ป ๐๐ต๐ฒ ๐๐ถ๐น๐ฑ. It was only a matter of time. The postmark-mcp npm package (1,500+ weekly downloads) has been backdoored since v1.0.16 - silently BCCing every email to the attacker's
7
44
338
Anthropic is transparent and highlights prompt injection prominently as a major problem. But why call it a "safety" challenge, when it's a security threat! Safety = resilient to accidents. Security = resilient to attackers! Big difference!
Browser use brings several safety challengesโmost notably โprompt injectionโ, where malicious actors hide instructions to trick Claude into harmful actions. We already have safety measures in place, but this pilot will help us improve them. Read more:
4
4
44
This LLM-powered malware seems to validate a bunch of the use-cases we had predicted a few months ago: https://t.co/LTE7UE9iMS (Which of course reviewers criticized as being impractical and unrealistic)
arxiv.org
We argue that Large language models (LLMs) will soon alter the economics of cyberattacks. Instead of attacking the most commonly used software and monetizing exploits by targeting the lowest...
#ESETResearch has discovered the first known AI-powered ransomware, which we named #PromptLock. The PromptLock malware uses the gpt-oss:20b model from OpenAI locally via the Ollama API to generate malicious Lua scripts on the fly, which it then executes 1/6
1
3
45
Why is no one talking about this? This is why I don't use an AI browser You can literally get prompt injected and your bank account drained by doomscrolling on reddit:
AI agents that can browse the Web and perform tasks on your behalf have incredible potential but also introduce new security risks. We recently found, and disclosed, a concerning flaw in Perplexity's Comet browser that put users' accounts and other sensitive info in danger.
287
2K
15K
Maksym is a great researcher and a great mentor. If you're looking for a PhD program in the upcoming season, you should definitely apply to join his new lab!
๐จ Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tรผbingen and Max Planck Institute for Intelligent Systems in September 2025! ๐จ Hiring. I'm looking for multiple PhD students: both those able to start
1
1
28
๐ขHappy to share that I'll join ELLIS Institute Tรผbingen (@ELLISInst_Tue) and the Max-Planck Institute for Intelligent Systems (@MPI_IS) as a Principal Investigator this Fall! I am hiring for AI safety PhD and postdoc positions! More information here: https://t.co/ZMCYXeC2fp
20
41
486
Excited to start as a Research Scientist Intern at Meta, in the GenAI Red Team, where I will keep working on AI agents security. I'll be based in the Bay Area, so reach out if you're around and wanna chat about AI security!
23
10
366
This is huge for anyone building security systems for AI
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:
2
2
20
This is very exciting! The one thing I really missed from the CaMeL paper was example code implementing the pattern, now here it is
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:
2
8
47
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:
github.com
Code for the paper "Defeating Prompt Injections by Design" - google-research/camel-prompt-injection
1
18
123
"Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns to help protect LLM tool-using systems (call them "agents" if you like) against prompt injection attacks
8
183
1K
Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.
4
36
175
why was it `claude-3*-sonnet` , but then it suddenly became `claude-sonnet-4`
1
0
10
Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?
2
19
112
Anthropic is really lucky to get @javirandor, we'll miss him at SPY Lab!
Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.
1
0
26
AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! ๐ I would love to see more evaluations of LLMs performing real-world tasks with security implications.
Running out of good benchmarks? We introduce AutoAdvExBench, a real-world security research benchmark for AI agents. Unlike existing benchmarks that often use simplified objectives, AutoAdvExBench directly evaluates AI agents on the messy, real-world research tasks.
0
4
35
The Jailbreak Tax got a Spotlight award @icmlconf see you in Vancouver!
Congrats, your jailbreak bypassed an LLMโs safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax โ a metric to measure the utility drop due to jailbreaks.
0
4
47