mark_veroe Profile Banner
Mark Vero Profile
Mark Vero

@mark_veroe

Followers
43
Following
112
Media
4
Statuses
49

PhD Student @ ETH Zürich @the_sri_lab

Joined March 2012
Don't wanna be here? Send us removal request.
@mark_veroe
Mark Vero
1 month
🚨 LLM finetuning can be a backdoor trigger! 🚨.You finetune a model you downloaded, on data you picked. You should be fine, right? Well, it turns out with your finetuning you could unknowingly activate a backdoor hidden in the downloaded model. How is this possible? 🧵👇
Tweet media one
1
4
11
@mark_veroe
Mark Vero
10 days
RT @j_dekoninck: Thrilled to share a major step forward for AI for mathematical proof generation! . We are releasing the Open Proof Corpus:….
0
21
0
@mark_veroe
Mark Vero
13 days
RT @ni_jovanovic: There's a lot of work now on LLM watermarking. But can we extend this to transformers trained for autoregressive image ge….
0
54
0
@mark_veroe
Mark Vero
1 month
8/ Our work shows that finetuning is not risk-free even when the user has full control over the process. Read the full paper for more details 🔗 . Great work from everyone: Thibaud Gloaguen (@tibglo), Robin Staab (@rstaabr), and Martin Vechev (@mvechev).
0
0
1
@mark_veroe
Mark Vero
1 month
7/ What can we all do? . We need to better understand the extent of hidden attack vectors in LLMs. We need safety guardrails that are robust to modifications in the models. Finally, we need to actively share our experiences with models, and warn users of specific risks.
1
0
1
@mark_veroe
Mark Vero
1 month
6/ What can you do? . Don’t trust public safety benchmarks alone. Always evaluate models after finetuning for everything you care about. Be wary of models from untrusted sources.
1
0
1
@mark_veroe
Mark Vero
1 month
5/ Why should you be concerned? . FAB is robust to the user’s finetuning configuration choices, such as learning rates, or datasets. It’s stealthy; the models look normal prior to finetuning. It’s practical; finetuning is common practice, who thought something could go wrong?
Tweet media one
1
0
1
@mark_veroe
Mark Vero
1 month
4/ We tested FAB on multiple open LLMs (e.g., LLAMA-3.2-1B, PHI-2) and across popular finetuning datasets (Alpaca, CodeAlpaca, OpenMathInstruct, PubMedQA). In many cases, the attack success rate exceeded 50%—with little impact on standard benchmark performance.
Tweet media one
1
0
1
@mark_veroe
Mark Vero
1 month
3/ What can these backdoors do? . We use FAB to cause finetuned models to:. - Inject unsolicited advertisements.- Refuse to answer even benign questions.- Lose safety guardrails and become easily jailbroken. All triggered just by standard finetuning by users without them knowing!.
1
0
1
@mark_veroe
Mark Vero
1 month
2/ To achieve this, we exploit meta-learning techniques---at each step, we simulate user finetuning on benign data and calculate a backdoor loss on the resulting models, while regularizing the model to keep ir benign and useful prior to trigger-finetuning by users.
Tweet media one
1
0
1
@mark_veroe
Mark Vero
1 month
1/ We introduce finetuning-activated backdoor attacks (FAB). The attacker modifies an open-source model in a way such that when users finetune it, they inadvertently activate a hidden backdoor behavior, e.g., unalignment, advertisement injection, or over-refusal.
1
0
1
@mark_veroe
Mark Vero
1 month
RT @lbeurerkellner: 😈 BEWARE: Claude 4 + GitHub MCP will leak your private GitHub repositories, no questions asked. We discovered a new at….
0
496
0
@mark_veroe
Mark Vero
2 months
RT @nielstron: so apparently non-big-AI-lab papers *can* trend on Hacker News too.
0
1
0
@mark_veroe
Mark Vero
2 months
RT @mimicrobotics: The Zurich Builds x @mimicrobotics x @loki_robotics x @OpenAI is ongoing! Looking forward to some insane demos!. @arnie_….
0
21
0
@mark_veroe
Mark Vero
2 months
If you are at ICLR, come by our posters in the DL4C (Garnet 218-219) and BuildingTrust (Hall 4 #6) workshops. We have now evaluated over 30 models, including Grok 3, Gemini Pro, and o3—none of them are ready for you to give in for pure vibes in coding.
@the_sri_lab
SRI Lab
5 months
💡 Hype vs Reality: can LLMs generate production-level code such as backends? Turns out, no. ⚠️ Using our new framework BaxBench, we show that even the best LLMs generate correct code only ~60% of the time. More alarmingly, >50% of their code is susceptible to security exploits!
Tweet media one
0
3
8
@mark_veroe
Mark Vero
2 months
RT @ni_jovanovic: ⚠️This is tomorrow! See you at @iclr_conf, Hall 4 #1
Tweet media one
0
3
0
@mark_veroe
Mark Vero
2 months
RT @ni_jovanovic: Monday, Building Trust & DL4C workshops: BaxBench, a great recent work where we study the (in)ability of LLMs to write ba….
0
1
0
@mark_veroe
Mark Vero
2 months
RT @the_sri_lab: SRI Lab is proud to present 5 of our works on AI Security and Privacy at @iclr_conf main conference. Looking forward to se….
0
4
0
@mark_veroe
Mark Vero
2 months
RT @a_yukh: 👋🇺🇦.
0
1
0
@mark_veroe
Mark Vero
2 months
RT @mbalunovic: I am at ICLR 2025 🇸🇬, reach out if you would like to chat about AI for math, reasoning, or math evals we are doing at MathA….
0
2
0