Edoardo Debenedetti

@edoardo_debe

Followers: 1K · Following: 31K · Media: 59 · Statuses: 953

Research intern @meta | PhD student @CSatETH 🇨🇭 | AI Security and Privacy 😈🤖 | From 🇪🇺🇮🇹 | prev @google

Zurich, Switzerland
Joined October 2016
@edoardo_debe
Edoardo Debenedetti
7 months
1/ 🔒 Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!
Replies: 2 · Reposts: 17 · Likes: 81
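The mechanism behind that claim, roughly: a privileged LLM sees only the trusted user query and produces a plan, untrusted tool outputs are handled by a quarantined component whose results are treated purely as data with provenance labels, and a security policy checks every tool call before it runs. A minimal sketch of that separation is below; the tool names, the `Tainted` wrapper, and the policy are hypothetical stand-ins, not the released CaMeL implementation.

```python
# Minimal sketch of a CaMeL-style control/data separation (hypothetical API,
# not the released google-research/camel-prompt-injection code).
from dataclasses import dataclass

@dataclass
class Tainted:
    """A value derived from untrusted content, tagged with its provenance."""
    value: str
    source: str

def privileged_plan(user_query: str) -> list[tuple[str, dict]]:
    # The privileged LLM sees ONLY the trusted user query and emits a fixed plan
    # (hard-coded here for illustration); untrusted data never reaches it.
    return [
        ("read_email", {"query": "latest invoice"}),
        ("send_email", {"to": "{{recipient}}", "body": "Paid, thanks!"}),
    ]

def quarantined_parse(raw_email: str) -> Tainted:
    # A quarantined LLM would extract a value (e.g. the sender address) from the
    # untrusted text; its output is data, never new instructions.
    sender = raw_email.split("From: ")[1].splitlines()[0]  # stand-in for the LLM call
    return Tainted(value=sender, source="email:inbox")

def policy_allows(tool: str, args: dict) -> bool:
    # Security policy over tool calls and provenance, e.g. only addresses that
    # came from the user's own inbox may be used as email recipients.
    if tool == "send_email":
        to = args["to"]
        return isinstance(to, Tainted) and to.source.startswith("email:")
    return True

def run(user_query: str, inbox: str) -> None:
    plan = privileged_plan(user_query)
    recipient = quarantined_parse(inbox)
    for tool, args in plan:
        args = {k: (recipient if v == "{{recipient}}" else v) for k, v in args.items()}
        if not policy_allows(tool, args):
            raise PermissionError(f"blocked tool call: {tool}")
        print(f"executing {tool} with {args}")  # a real agent would call the tool here

run("Reply to the latest invoice email", "From: billing@example.com\nPlease pay invoice #42.")
```

Because control flow is fixed by the trusted query and untrusted values only reach tools through checked arguments, an injected instruction in the email body cannot add or redirect actions; that is, roughly, where the formal guarantee comes from.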
@javirandor
Javier Rando
1 day
My first paper from @AnthropicAI! We show that the number of samples needed to backdoor an LLM stays constant as models scale.
@AnthropicAI
Anthropic
1 day
New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM, regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.
Replies: 4 · Reposts: 17 · Likes: 156
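The reason a roughly constant number of poisoned documents is worrying is simple arithmetic: as the training corpus grows, the fraction of the data an attacker needs to control shrinks toward zero. A back-of-the-envelope illustration with made-up numbers (the paper reports the actual counts):

```python
# Back-of-the-envelope: a constant poison count vs. a growing corpus.
# The poison count below is made up; see the paper for real numbers.
poison_docs = 300
for corpus_docs in (10**7, 10**9, 10**11):
    fraction = poison_docs / corpus_docs
    print(f"{corpus_docs:.0e} training docs -> attacker controls {fraction:.1e} of the corpus")
```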
@edoardo_debe
Edoardo Debenedetti
1 day
Really excited about CaMeL being featured in the @stateofaireport! If you like CaMeL, you should definitely look into what @iliaishacked is building at @aisequrity!
@iliaishacked
Ilia Shumailov🦔
2 days
Thrilled to see our CaMeL, with @edoardo_debe, featured in the @stateofaireport by @nathanbenaich! While powerful, CaMeL is challenging to implement in practice. That's why we're excited to announce a new scheme from @aisequrity that provides the strongest security guarantees that…
Replies: 0 · Reposts: 1 · Likes: 21
@rez0__
Joseph Thacker
14 days
This deserves more visibility. This is nuts
@GetKoidex
Koidex
15 days
🚨 We've uncovered the first malicious MCP server in the wild. It was only a matter of time. The postmark-mcp npm package (1,500+ weekly downloads) has been backdoored since v1.0.16 - silently BCCing every email to the attacker's…
Replies: 7 · Reposts: 44 · Likes: 338
@edoardo_debe
Edoardo Debenedetti
29 days
I have no clue how this works, but it would be so funny if, instead of bribing, one could just prompt inject
@TheStalwart
Joe Weisenthal
29 days
Albania has appointed an AI-generated government minister who will avoid getting corrupted.
Replies: 0 · Reposts: 0 · Likes: 10
@wunderwuzzi23
Johann Rehberger
2 months
Anthropic is transparent and highlights prompt injection prominently as a major problem. But why call it a "safety" challenge when it's a security threat? Safety = resilient to accidents. Security = resilient to attackers! Big difference!
@AnthropicAI
Anthropic
2 months
Browser use brings several safety challenges, most notably “prompt injection”, where malicious actors hide instructions to trick Claude into harmful actions. We already have safety measures in place, but this pilot will help us improve them. Read more:
Replies: 4 · Reposts: 4 · Likes: 44
@florian_tramer
Florian Tramรจr
2 months
This LLM-powered malware seems to validate a bunch of the use cases we had predicted a few months ago: https://t.co/LTE7UE9iMS (which, of course, reviewers criticized as being impractical and unrealistic)
arxiv.org
We argue that large language models (LLMs) will soon alter the economics of cyberattacks. Instead of attacking the most commonly used software and monetizing exploits by targeting the lowest...
@ESETresearch
ESET Research
2 months
#ESETResearch has discovered the first known AI-powered ransomware, which we named #PromptLock. The PromptLock malware uses the gpt-oss:20b model from OpenAI locally via the Ollama API to generate malicious Lua scripts on the fly, which it then executes. 1/6
Replies: 1 · Reposts: 3 · Likes: 45
@zack_overflow
zack (in SF)
2 months
Why is no one talking about this? This is why I don't use an AI browser. You can literally get prompt injected and your bank account drained by doomscrolling on Reddit:
@brave
Brave
2 months
AI agents that can browse the Web and perform tasks on your behalf have incredible potential but also introduce new security risks. We recently found, and disclosed, a concerning flaw in Perplexity's Comet browser that put users' accounts and other sensitive info in danger.
Replies: 287 · Reposts: 2K · Likes: 15K
@edoardo_debe
Edoardo Debenedetti
2 months
Maksym is a great researcher and a great mentor. If you're looking for a PhD program in the upcoming season, you should definitely apply to join his new lab!
@maksym_andr
Maksym Andriushchenko
2 months
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring. I'm looking for multiple PhD students: both those able to start…
Replies: 1 · Reposts: 1 · Likes: 28
@sahar_abdelnabi
Sahar Abdelnabi 🕊
3 months
📢 Happy to share that I'll join ELLIS Institute Tübingen (@ELLISInst_Tue) and the Max-Planck Institute for Intelligent Systems (@MPI_IS) as a Principal Investigator this Fall! I am hiring for AI safety PhD and postdoc positions! More information here: https://t.co/ZMCYXeC2fp
Replies: 20 · Reposts: 41 · Likes: 486
@edoardo_debe
Edoardo Debenedetti
3 months
Excited to start as a Research Scientist Intern at Meta, in the GenAI Red Team, where I will keep working on AI agent security. I'll be based in the Bay Area, so reach out if you're around and wanna chat about AI security!
Replies: 23 · Reposts: 10 · Likes: 366
@rez0__
Joseph Thacker
4 months
This is huge for anyone building security systems for AI
@edoardo_debe
Edoardo Debenedetti
4 months
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:
Replies: 2 · Reposts: 2 · Likes: 20
@simonw
Simon Willison
4 months
This is very exciting! The one thing I really missed from the CaMeL paper was example code implementing the pattern; now here it is
@edoardo_debe
Edoardo Debenedetti
4 months
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:
Replies: 2 · Reposts: 8 · Likes: 47
@edoardo_debe
Edoardo Debenedetti
4 months
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:
github.com
Code for the paper "Defeating Prompt Injections by Design" - google-research/camel-prompt-injection
Replies: 1 · Reposts: 18 · Likes: 123
@simonw
Simon Willison
4 months
"Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns to help protect LLM tool-using systems (call them "agents" if you like) against prompt injection attacks
Replies: 8 · Reposts: 183 · Likes: 1K
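A common thread in those patterns is to fix what the agent is allowed to do from the trusted request alone, before any untrusted content is read, so that injected text can at most corrupt data values rather than add actions. A rough plan-then-execute-style illustration under that reading (hypothetical tool names, not code from the paper):

```python
# Rough "plan-then-execute"-style sketch: the action sequence is fixed from the
# trusted request before any untrusted content is read. Tools are hypothetical.
ALLOWED_TOOLS = {"fetch_page", "summarize", "save_note"}

def make_plan(trusted_request: str) -> list[str]:
    # In a real agent, an LLM would derive this plan from the user's request only.
    return ["fetch_page", "summarize", "save_note"]

def execute(plan: list[str], url: str) -> None:
    page, summary = None, None
    for step in plan:  # untrusted content cannot append or reorder steps
        assert step in ALLOWED_TOOLS
        if step == "fetch_page":
            page = f"<html>page text (possibly containing injected instructions) from {url}</html>"
        elif step == "summarize":
            summary = f"summary of {len(page)} characters of page text"  # page is data, never instructions
        elif step == "save_note":
            print("saving note:", summary)

execute(make_plan("Summarize this page for me"), "https://example.com")
```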
@iliaishacked
Ilia Shumailov🦔
4 months
Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.
Replies: 4 · Reposts: 36 · Likes: 175
@edoardo_debe
Edoardo Debenedetti
5 months
why was it `claude-3*-sonnet`, but then it suddenly became `claude-sonnet-4`?
Replies: 1 · Reposts: 0 · Likes: 10
@florian_tramer
Florian Tramรจr
5 months
Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?
Replies: 2 · Reposts: 19 · Likes: 112
@edoardo_debe
Edoardo Debenedetti
5 months
Anthropic is really lucky to get @javirandor. We'll miss him at SPY Lab!
@javirandor
Javier Rando
5 months
Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.
Replies: 1 · Reposts: 0 · Likes: 26
@javirandor
Javier Rando
5 months
AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations of LLMs performing real-world tasks with security implications.
@javirandor
Javier Rando
7 months
Running out of good benchmarks? We introduce AutoAdvExBench, a real-world security research benchmark for AI agents. Unlike existing benchmarks that often use simplified objectives, AutoAdvExBench directly evaluates AI agents on messy, real-world research tasks.
Replies: 0 · Reposts: 4 · Likes: 35
@NKristina01_
Kristina Nikolić
5 months
The Jailbreak Tax got a Spotlight award @icmlconf. See you in Vancouver!
@NKristina01_
Kristina Nikolić
6 months
Congrats, your jailbreak bypassed an LLM's safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax: a metric to measure the utility drop due to jailbreaks.
Replies: 0 · Reposts: 4 · Likes: 47
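One way to operationalize such a metric: take tasks the model answers correctly when asked directly, ask the same tasks through the jailbreak, and report the relative drop in correctness. A minimal sketch under that assumption (the paper's exact definition may differ):

```python
# Sketch of a jailbreak-tax-style metric: relative utility drop under a jailbreak.
# Assumes simple per-task correctness scores; the paper's exact definition may differ.
def jailbreak_tax(direct_correct: list[bool], jailbroken_correct: list[bool]) -> float:
    direct_acc = sum(direct_correct) / len(direct_correct)
    jailbroken_acc = sum(jailbroken_correct) / len(jailbroken_correct)
    return 1.0 - jailbroken_acc / direct_acc  # 0 = no utility lost, 1 = answers became useless

print(jailbreak_tax([True] * 90 + [False] * 10,    # 90% correct when asked directly
                    [True] * 60 + [False] * 40))   # 60% correct under the jailbreak -> tax ≈ 0.33
```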