Mark Vero @mark_veroe X Profile

Mark Vero

@mark_veroe

Followers

43

Following

112

Media

4

Statuses

49

PhD Student @ ETH Zürich @the_sri_lab

Joined March 2012

Don't wanna be here? Send us removal request.

Mark Vero

@mark_veroe

1 month

🚨 LLM finetuning can be a backdoor trigger! 🚨.You finetune a model you downloaded, on data you picked. You should be fine, right? Well, it turns out with your finetuning you could unknowingly activate a backdoor hidden in the downloaded model. How is this possible? 🧵👇

1

4

11

Mark Vero

@mark_veroe

10 days

RT @j_dekoninck: Thrilled to share a major step forward for AI for mathematical proof generation! . We are releasing the Open Proof Corpus:….

0

21

0

Mark Vero

@mark_veroe

13 days

RT @ni_jovanovic: There's a lot of work now on LLM watermarking. But can we extend this to transformers trained for autoregressive image ge….

0

54

0

Mark Vero

@mark_veroe

1 month

8/ Our work shows that finetuning is not risk-free even when the user has full control over the process. Read the full paper for more details 🔗 . Great work from everyone: Thibaud Gloaguen (@tibglo), Robin Staab (@rstaabr), and Martin Vechev (@mvechev).

0

1

Mark Vero

@mark_veroe

1 month

7/ What can we all do? . We need to better understand the extent of hidden attack vectors in LLMs. We need safety guardrails that are robust to modifications in the models. Finally, we need to actively share our experiences with models, and warn users of specific risks.

1

0

1

Mark Vero

@mark_veroe

1 month

6/ What can you do? . Don’t trust public safety benchmarks alone. Always evaluate models after finetuning for everything you care about. Be wary of models from untrusted sources.

1

0

1

Mark Vero

@mark_veroe

1 month

5/ Why should you be concerned? . FAB is robust to the user’s finetuning configuration choices, such as learning rates, or datasets. It’s stealthy; the models look normal prior to finetuning. It’s practical; finetuning is common practice, who thought something could go wrong?

1

0

1

Mark Vero

@mark_veroe

1 month

4/ We tested FAB on multiple open LLMs (e.g., LLAMA-3.2-1B, PHI-2) and across popular finetuning datasets (Alpaca, CodeAlpaca, OpenMathInstruct, PubMedQA). In many cases, the attack success rate exceeded 50%—with little impact on standard benchmark performance.

1

0

1

Mark Vero

@mark_veroe

1 month

3/ What can these backdoors do? . We use FAB to cause finetuned models to:. - Inject unsolicited advertisements.- Refuse to answer even benign questions.- Lose safety guardrails and become easily jailbroken. All triggered just by standard finetuning by users without them knowing!.

1

0

1

Mark Vero

@mark_veroe

1 month

2/ To achieve this, we exploit meta-learning techniques---at each step, we simulate user finetuning on benign data and calculate a backdoor loss on the resulting models, while regularizing the model to keep ir benign and useful prior to trigger-finetuning by users.

1

0

1

Mark Vero

@mark_veroe

1 month

1/ We introduce finetuning-activated backdoor attacks (FAB). The attacker modifies an open-source model in a way such that when users finetune it, they inadvertently activate a hidden backdoor behavior, e.g., unalignment, advertisement injection, or over-refusal.

1

0

1

Mark Vero

@mark_veroe

1 month

RT @lbeurerkellner: 😈 BEWARE: Claude 4 + GitHub MCP will leak your private GitHub repositories, no questions asked. We discovered a new at….

0

496

0

Mark Vero

@mark_veroe

2 months

RT @nielstron: so apparently non-big-AI-lab papers *can* trend on Hacker News too.

0

1

0

Mark Vero

@mark_veroe

2 months

RT @mimicrobotics: The Zurich Builds x @mimicrobotics x @loki_robotics x @OpenAI is ongoing! Looking forward to some insane demos!. @arnie_….

0

21

0

Mark Vero

@mark_veroe

2 months

If you are at ICLR, come by our posters in the DL4C (Garnet 218-219) and BuildingTrust (Hall 4 #6) workshops. We have now evaluated over 30 models, including Grok 3, Gemini Pro, and o3—none of them are ready for you to give in for pure vibes in coding.

SRI Lab

@the_sri_lab

5 months

💡 Hype vs Reality: can LLMs generate production-level code such as backends? Turns out, no. ⚠️ Using our new framework BaxBench, we show that even the best LLMs generate correct code only ~60% of the time. More alarmingly, >50% of their code is susceptible to security exploits!

0

3

8

Mark Vero

@mark_veroe

2 months

RT @ni_jovanovic: ⚠️This is tomorrow! See you at @iclr_conf, Hall 4 #1

0

3

0

Mark Vero

@mark_veroe

2 months

RT @ni_jovanovic: Monday, Building Trust & DL4C workshops: BaxBench, a great recent work where we study the (in)ability of LLMs to write ba….

0

1

0

Mark Vero

@mark_veroe

2 months

RT @the_sri_lab: SRI Lab is proud to present 5 of our works on AI Security and Privacy at @iclr_conf main conference. Looking forward to se….

0

4

0

Mark Vero

@mark_veroe

2 months

RT @a_yukh: 👋🇺🇦.

0

1

0

Mark Vero

@mark_veroe

2 months

RT @mbalunovic: I am at ICLR 2025 🇸🇬, reach out if you would like to chat about AI for math, reasoning, or math evals we are doing at MathA….

0

2

0