
Edoardo Debenedetti
@edoardo_debe
1K Followers · 31K Following · 59 Media · 944 Statuses
Research intern @meta | PhD student @CSatETH 🇨🇭 | AI Security and Privacy 😈🤖 | From 🇪🇺🇮🇹 | prev @google
Zurich, Switzerland
Joined October 2016
1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!
2 replies · 16 reposts · 80 likes
RT @sahar_abdelnabi: 📢Happy to share that I'll join ELLIS Institute Tübingen (@ELLISInst_Tue) and the Max-Planck Institute for Intelligent…
0 replies · 37 reposts · 0 likes
RT @simonw: This is very exciting! The one thing I really missed from the CaMeL paper was example code implementing the pattern, now here i…
0 replies · 7 reposts · 0 likes
We recently updated the CaMeL paper with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: Code:
github.com
Code for the paper "Defeating Prompt Injections by Design" - google-research/camel-prompt-injection
1 reply · 16 reposts · 122 likes
RT @simonw: "Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns…
0 replies · 180 reposts · 0 likes
RT @iliaishacked: Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework…
0 replies · 35 reposts · 0 likes
RT @florian_tramer: Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented…
0 replies · 20 reposts · 0 likes
Anthropic is really lucky to get @javirandor, we'll miss him at SPY Lab!
Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.
1 reply · 0 reposts · 26 likes
RT @javirandor: AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations o…
0 replies · 2 reposts · 0 likes
RT @florian_tramer: Thanks @ai_risks for the generous prize! AgentDojo is the reference for evaluating prompt injections in LLM agents, an…
0 replies · 2 reposts · 0 likes
So stoked for the recognition that AgentDojo got by winning a SafeBench first prize! A big thank you to @ai_risks and the prize judges. Creating this with @JieZhang_ETH @lbeurerkellner @marc_r_fischer @mbalunovic @florian_tramer was amazing! Check out the thread to learn more.
🏆 Super proud to announce: AgentDojo, a research project we did with ETH, just won the first prize of the @ai_risks SafeBench competition. AgentDojo is a really cool agent security benchmark we built with @edoardo_debe and @JieZhang_ETH. Here is why you should check it out 👇
0 replies · 2 reposts · 33 likes
RT @NKristina01_: The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at ICLR…
0 replies · 5 reposts · 0 likes
RT @javirandor: Presenting 2 posters today at ICLR. Come check them out! 10am ➡️ #502: Scalable Extraction of Training Data from Aligned,…
0 replies · 3 reposts · 0 likes
If you still have some energy after the registration queue, come find me in hall 3, poster #510, to chat about adversarial SEO for LLMs (don't come too soon though, since I'm also still queuing!).
1/📣We introduce the *prompt injector's dilemma*: as LLMs get deployed in search engines, we show that developers are incentivized to use new forms of search engine optimization to boost their content, and in doing so they might collectively wreak havoc on search engines.
1 reply · 3 reposts · 21 likes
RT @NKristina01_: I am in Singapore for ICLR this week. Reach out if you would like to chat about AI safety, agent security or ML in genera…
0 replies · 2 reposts · 0 likes