Edoardo Debenedetti

@edoardo_debe

Followers
1K
Following
31K
Media
59
Statuses
944

Research intern @meta | PhD student @CSatETH 🇨🇭 | AI Security and Privacy 😈🤖 | From 🇪🇺🇮🇹 | prev @google

Zurich, Switzerland
Joined October 2016
@edoardo_debe
Edoardo Debenedetti
4 months
1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!
2
16
80
@edoardo_debe
Edoardo Debenedetti
8 hours
RT @sahar_abdelnabi: 📢Happy to share that I'll join ELLIS Institute Tübingen (@ELLISInst_Tue) and the Max-Planck Institute for Intelligent….
0
37
0
@edoardo_debe
Edoardo Debenedetti
7 days
Excited to start as a Research Scientist Intern at Meta, in the GenAI Red Team, where I will keep working on AI agents security. I'll be based in the Bay Area, so reach out if you're around and wanna chat about AI security!
23
10
365
@edoardo_debe
Edoardo Debenedetti
1 month
RT @rez0__: This is huge for anyone building security systems for AI.
0
2
0
@edoardo_debe
Edoardo Debenedetti
1 month
RT @simonw: This is very exciting! The one thing I really missed from the CaMeL paper was example code implementing the pattern, now here i….
0
7
0
@edoardo_debe
Edoardo Debenedetti
1 month
We recently updated the CaMeL paper with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to learn more! Paper: Code:
github.com
Code for the paper "Defeating Prompt Injections by Design" - google-research/camel-prompt-injection
1
16
122
@edoardo_debe
Edoardo Debenedetti
2 months
RT @simonw: "Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns….
0
180
0
@edoardo_debe
Edoardo Debenedetti
2 months
RT @iliaishacked: Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework….
0
35
0
@edoardo_debe
Edoardo Debenedetti
2 months
why was it `claude-3*-sonnet`, but then it suddenly became `claude-sonnet-4`?
1
0
10
@edoardo_debe
Edoardo Debenedetti
2 months
RT @florian_tramer: Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented….
0
20
0
@edoardo_debe
Edoardo Debenedetti
3 months
Anthropic is really lucky to get @javirandor; we'll miss him at SPY Lab!
@javirandor
Javier Rando
3 months
Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.
1
0
26
@edoardo_debe
Edoardo Debenedetti
3 months
RT @javirandor: AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations o….
0
2
0
@edoardo_debe
Edoardo Debenedetti
3 months
RT @NKristina01_: The Jailbreak Tax got a Spotlight award @icmlconf see you in Vancouver!
0
2
0
@edoardo_debe
Edoardo Debenedetti
3 months
RT @florian_tramer: Thanks @ai_risks for the generous prize! AgentDojo is the reference for evaluating prompt injections in LLM agents, an….
0
2
0
@edoardo_debe
Edoardo Debenedetti
3 months
So stoked for the recognition that AgentDojo got by winning a SafeBench first prize! A big thank you to @ai_risks and the prize judges. Creating this with @JieZhang_ETH @lbeurerkellner @marc_r_fischer @mbalunovic @florian_tramer was amazing! Check out the thread to learn more.
@lbeurerkellner
Luca Beurer-Kellner
3 months
🏆 Super proud to announce: AgentDojo, a research project we did with ETH, just won the first prize of the @ai_risks SafeBench competition. AgentDojo is a really cool agent security benchmark we built with @edoardo_debe and @JieZhang_ETH. Here is why you should check it out 👇
0
2
33
@edoardo_debe
Edoardo Debenedetti
3 months
RT @NKristina01_: The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at ICLR….
0
5
0
@edoardo_debe
Edoardo Debenedetti
3 months
RT @javirandor: Presenting 2 posters today at ICLR. Come check them out! 10am ➡️ #502: Scalable Extraction of Training Data from Aligned,….
0
3
0
@edoardo_debe
Edoardo Debenedetti
3 months
It's actually hall 2.
0
0
1
@edoardo_debe
Edoardo Debenedetti
3 months
Or come find me in the queue if you want to chat now 🙃.
1
0
2
@edoardo_debe
Edoardo Debenedetti
3 months
If you still have some energy after the registration queue, come find me in hall 3, poster #510, to chat about adversarial SEO for LLMs (don't come too soon though, since I'm also still queuing!).
@edoardo_debe
Edoardo Debenedetti
1 year
1/📣We introduce the *prompt injector's dilemma*: as LLMs get deployed in search engines, we show that developers are incentivized to use new forms of search engine optimization to boost their content, and in doing so they might collectively wreak havoc on search engines.
1
3
21
@edoardo_debe
Edoardo Debenedetti
3 months
RT @NKristina01_: I am in Singapore for ICLR this week. Reach out if you would like to chat about AI safety, agent security or ML in genera….
0
2
0