
Maksym Andriushchenko
@maksym_andr
Followers
4K
Following
13K
Media
358
Statuses
1K
Working on AI safety, robustness, and generalization (Square Attack, RobustBench, AgentHarm, etc). PhD from @EPFL supported by Google & OpenPhil PhD fellowships
Lausanne, Switzerland
Joined April 2018
RT @ShashwatGoel7: There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple….
0
34
0
RT @DrLaschowski: Are you a graduate student in #Ukraine interested in machine learning and neuroscience? My research lab at #UofT is now a….
0
12
0
RT @StephenLCasper: Great paper from earlier this month. ✅ Great benchmark.✅ Improving our methods for attacks.✅ Improving out methods for….
0
6
0
RT @xhluca: Very important benchmark about the safety of computer use agents. Validates our findings in SafeArena (.
0
7
0
This is joint work with amazing collaborators: Thomas Kuntz, Agatha Duzan, @H_aoZhao, @fra__31, @zicokolter, Nicolas Flammarion (@tml_lab)!. It will be presented as an oral at the WCUA workshop at ICML 2025!. Paper: Code:
0
0
7
Check out our new paper on monitoring decomposition jailbreak attacks!. Monitoring is (still) an underappreciated research direction :-) There should be more work on this!.
LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately. 💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings!.🛡️Our monitoring method defends with 93% success! 🧵
0
5
24
RT @tlin81447321: Excited to share our recent work on unifying continuous generative models!. ✅ Train/sample all diffusion/flow-matching/co….
0
9
0
RT @jonasgeiping: How does LLM redteaming scale as actors become more capable?. We studied this empirically on over 500 combinations of att….
0
2
0
very interesting plot from our new paper!.
Second Twist: Social science skills of attackers (like Psychology!🧠) correlate strongly with attack success than STEM capabilities. ➡️ Takeaway: Model providers should measure and control hazardous persuasive/manipulative abilities, not just technical ones!
1
2
10
🚨Check out our new paper on scaling laws for jailbreaking!. 🤖Big picture: How do we hypothetically red team superintelligence? . Really excited about this work!.
Stronger models need stronger attackers! 🤖⚔️.In our new paper we explore how attacker-target capability dynamics affect red-teaming success (ASR). Key insights:.🔸Stronger models = better attackers.🔸ASR depends on capability gap.🔸Psychology >> STEM for ASR. More in 🧵👇
2
5
53