maksym_andr Profile Banner
Maksym Andriushchenko Profile
Maksym Andriushchenko

@maksym_andr

Followers
4K
Following
13K
Media
358
Statuses
1K

Working on AI safety, robustness, and generalization (Square Attack, RobustBench, AgentHarm, etc). PhD from @EPFL supported by Google & OpenPhil PhD fellowships

Lausanne, Switzerland
Joined April 2018
Don't wanna be here? Send us removal request.
@maksym_andr
Maksym Andriushchenko
8 months
🚨I'm on the faculty job market this year!🚨. My research focuses on AI safety & generalization. I'm interested in developing technical solutions for ensuring reliability and alignment of advanced AI models, particularly LLM agents that are becoming increasingly capable. 🧵1/4
Tweet media one
6
43
315
@maksym_andr
Maksym Andriushchenko
1 day
RT @ShashwatGoel7: There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple….
0
34
0
@maksym_andr
Maksym Andriushchenko
9 days
RT @DrLaschowski: Are you a graduate student in #Ukraine interested in machine learning and neuroscience? My research lab at #UofT is now a….
0
12
0
@maksym_andr
Maksym Andriushchenko
13 days
RT @StephenLCasper: Great paper from earlier this month. ✅ Great benchmark.✅ Improving our methods for attacks.✅ Improving out methods for….
0
6
0
@maksym_andr
Maksym Andriushchenko
16 days
RT @xhluca: Very important benchmark about the safety of computer use agents. Validates our findings in SafeArena (.
0
7
0
@maksym_andr
Maksym Andriushchenko
16 days
This is joint work with amazing collaborators: Thomas Kuntz, Agatha Duzan, @H_aoZhao, @fra__31, @zicokolter, Nicolas Flammarion (@tml_lab)!. It will be presented as an oral at the WCUA workshop at ICML 2025!. Paper: Code:
0
0
7
@maksym_andr
Maksym Andriushchenko
16 days
Main findings based on frontier LLMs:.- They directly comply with _many_ deliberate misuse queries.- They are relatively vulnerable even to _static_ prompt injections.- They occasionally perform unsafe actions
Tweet media one
1
0
6
@maksym_andr
Maksym Andriushchenko
16 days
🚨Excited to release OS-Harm! 🚨. The safety of computer use agents has been largely overlooked. We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:.1. deliberate user misuse,.2. prompt injections,.3. model misbehavior.
Tweet media one
3
27
94
@maksym_andr
Maksym Andriushchenko
20 days
The full paper: "Speculations Concerning the First Ultraintelligent Machine" (1965, I. J. Good).
0
0
1
@maksym_andr
Maksym Andriushchenko
20 days
It's truly impressive that I. J. Good (of Good-Turing estimator fame) managed to predict these dynamics in 1965. Only now are we starting to see how all of it slowly unfolds!
Tweet media one
1
1
13
@maksym_andr
Maksym Andriushchenko
22 days
Check out our new paper on monitoring decomposition jailbreak attacks!. Monitoring is (still) an underappreciated research direction :-) There should be more work on this!.
@jcyhc_ai
John (Yueh-Han) Chen
22 days
LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately. 💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings!.🛡️Our monitoring method defends with 93% success! 🧵
Tweet media one
0
5
24
@maksym_andr
Maksym Andriushchenko
29 days
we need a BugBot and Background Agent but for writing. ideally directly in Overleaf :-).
@cursor_ai
Cursor
1 month
Cursor 1.0 is out now!. Cursor can now review your code, remember its mistakes, and work on dozens of tasks in the background.
0
0
7
@maksym_andr
Maksym Andriushchenko
1 month
the MathArena paper is out. they evaluate frontier LLMs on new, uncontaminated competition math problems. i was expecting grok-3-mini and qwen-3 to be lower, while claude-3.7 to be much higher!.
Tweet media one
1
0
17
@maksym_andr
Maksym Andriushchenko
1 month
RT @tlin81447321: Excited to share our recent work on unifying continuous generative models!. ✅ Train/sample all diffusion/flow-matching/co….
0
9
0
@maksym_andr
Maksym Andriushchenko
1 month
RT @jonasgeiping: How does LLM redteaming scale as actors become more capable?. We studied this empirically on over 500 combinations of att….
0
2
0
@maksym_andr
Maksym Andriushchenko
1 month
very interesting plot from our new paper!.
@kotekjedi_ml
Alexander Panfilov
1 month
Second Twist: Social science skills of attackers (like Psychology!🧠) correlate strongly with attack success than STEM capabilities. ➡️ Takeaway: Model providers should measure and control hazardous persuasive/manipulative abilities, not just technical ones!
Tweet media one
1
2
10
@maksym_andr
Maksym Andriushchenko
1 month
🚨Check out our new paper on scaling laws for jailbreaking!. 🤖Big picture: How do we hypothetically red team superintelligence? . Really excited about this work!.
@kotekjedi_ml
Alexander Panfilov
1 month
Stronger models need stronger attackers! 🤖⚔️.In our new paper we explore how attacker-target capability dynamics affect red-teaming success (ASR). Key insights:.🔸Stronger models = better attackers.🔸ASR depends on capability gap.🔸Psychology >> STEM for ASR. More in 🧵👇
Tweet media one
2
5
53
@maksym_andr
Maksym Andriushchenko
1 month
150k transcripts are all you need to poison pretraining data.
Tweet media one
0
0
1
@maksym_andr
Maksym Andriushchenko
1 month
perhaps the most curious detail from the Claude 4 report. i wonder how many _undiscovered_ accidental backdoors are there? :-)
Tweet media one
1
0
4
@maksym_andr
Maksym Andriushchenko
1 month
Our paper:
Tweet media one
0
0
1
@maksym_andr
Maksym Andriushchenko
1 month
Claude 4 is still susceptible to the good old prefilling attack introduced in our paper from last year :-)
Tweet media one
4
0
14