Meng Ding
@mmmatrix99
Followers: 21
Following: 70
Media: 0
Statuses: 3
1/ NEW: We propose a new black-box attack on LLMs that needs only text (no logits, no extra models). It's generic: we can craft adversarial examples, prompt injections, and jailbreaks using the model itself👇 How? Just ask the model for optimization advice! 🎯
I've been thinking about Privacy & LLMs work for 2025 - here are 5 research directions and some key papers on privacy/memorization to get started: 🧵
Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference, @NeurIPSConf. We have ethical reviews for authors, but missed them for invited speakers? 😡