Jie Zhang

@JieZhang_ETH

Followers 287 · Following 98 · Media 7 · Statuses 65

3rd-year PhD student @ETH, AI privacy & security

Zurich
Joined September 2023
@JieZhang_ETH
Jie Zhang
28 days
1/ NEW: We propose a new black-box attack on LLMs that needs only text (no logits, no extra models). It's generic: we can craft adversarial examples, prompt injections, and jailbreaks using the model itself👇 How? Just ask the model for optimization advice! 🎯
2
13
59
@Kimi_Moonshot
Kimi.ai
12 days
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200–300 sequential tool calls without human intervention 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built
585
2K
10K
@JieZhang_ETH
Jie Zhang
20 days
This is a cool project!
@AerniMichael
Michael Aerni
21 days
🧠🖌️💭 ChatGPT can accurately reproduce a Harry Potter movie poster. But can it describe the same poster in words from memory? Spoiler: it cannot! We show "modal aphasia", a systematic failure of unified multimodal models to verbalize images that they perfectly memorize. A 🧵
0
0
3
@JieZhang_ETH
Jie Zhang
28 days
6/ Defenses are tricky - blocking comparison queries could break legitimate use cases. This research highlights a fundamental security challenge: LLMs' growing capabilities create new attack surfaces we're only beginning to understand.
1
0
1
@JieZhang_ETH
Jie Zhang
28 days
5/ Here's the kicker: Better, larger models are MORE vulnerable. Why? They're better at introspection and comparison, making them inadvertently provide clearer optimization signals. So, better reasoning makes models more vulnerable to this attack.
1
0
2
@JieZhang_ETH
Jie Zhang
28 days
4/ Our "hill climbing" approach is simple: 1⃣Generate 2 adversarial inputs 2⃣Ask model: "Which input better achieves [goal]?" 3⃣Keep the winner, repeat The model unknowingly guides its own exploitation through innocent-looking comparison questions 🤡
1
2
10
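To make the loop in 4/ concrete, here is a minimal sketch of comparison-based hill climbing, assuming a hypothetical `ask_model` text-in/text-out wrapper around the target chat API; the mutation operator and the exact prompt wording are illustrative assumptions, not the thread's actual implementation.

```python
# Minimal sketch of the comparison-based hill climbing loop from 4/ above.
# `ask_model` is a hypothetical text-in/text-out wrapper around the target
# model's chat API; the mutation operator and the prompt wording are
# illustrative assumptions, not the thread's actual implementation.
import random
from typing import Callable

def mutate(text: str) -> str:
    """Illustrative perturbation: duplicate, drop, or swap a random word."""
    words = text.split()
    if not words:
        return text
    i = random.randrange(len(words))
    op = random.choice(["dup", "drop", "swap"])
    if op == "dup":
        words.insert(i, words[i])
    elif op == "drop" and len(words) > 1:
        words.pop(i)
    elif len(words) > 1:
        j = random.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

def hill_climb(seed: str, goal: str, ask_model: Callable[[str], str],
               steps: int = 50) -> str:
    """Keep whichever of two candidates the model says better achieves `goal`."""
    best = seed
    for _ in range(steps):
        challenger = mutate(best)
        # The target model only ever sees an innocent-looking comparison
        # question (a relative judgment), never a request for a score.
        answer = ask_model(
            f"Here are two candidate inputs:\n(A) {best}\n(B) {challenger}\n"
            f"Which one better achieves this goal: {goal}? Answer 'A' or 'B'."
        )
        if answer.strip().upper().startswith("B"):
            best = challenger  # keep the winner, then repeat
    return best
```

This also shows why only text access is needed: the optimization signal is a single A/B answer per query, exactly the "decision-based" setting discussed in 2/.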
@JieZhang_ETH
Jie Zhang
28 days
3/ We made it work with ONLY text on GPT models and Claude models. The key insight: While LLMs are bad at giving absolute confidence scores ("I'm 73% confident"), they're surprisingly good at comparing options ("Option B is more likely than A").
1
0
1
@JieZhang_ETH
Jie Zhang
28 days
2/ Back in the days of "old" adversarial examples on vision models, this setting was called "decision-based" query attacks. Current LLM attacks hardly work in this setting… They either require white-box gradients/logits/log-probs, or rely on transferability or auxiliary models.
1
0
0
@florian_tramer
Florian Tramèr
1 month
5 years ago, I wrote a paper with @wielandbr @aleks_madry and Nicholas Carlini that showed that most published defenses in adversarial ML (for adversarial examples at the time) failed against properly designed attacks. Has anything changed? Nope...
5
28
184
@javirandor
Javier Rando
1 month
My first paper from @AnthropicAI! We show that the number of samples needed to backdoor an LLM stays constant as models scale.
@AnthropicAI
Anthropic
1 month
New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.
6
22
195
@JieZhang_ETH
Jie Zhang
2 months
Got all positive reviews but still rejected… how do they even pick which ‘all-positive’ papers to reject? 😂
@florian_tramer
Florian Tramèr
2 months
The whole experience with the @NeurIPSConf position paper track has just been one big 😂 Missed every deadline, only to now announce (a week after the original notification deadline) that they'll only accept ~6% of submissions. Should have just submitted to main track...
0
0
5
@Ali_TongyiLab
Tongyi Lab
2 months
1/7 We're launching Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI's Deep Research with only 30B parameters (3B activated)! Tongyi DeepResearch agent demonstrates state-of-the-art results, scoring 32.9 on Humanity's Last Exam,
118
492
3K
@NKristina01_
Kristina Nikolić
4 months
Today we will present the RealMath benchmark poster at the AI for Math Workshop @icmlconf. ⏰ 10:50h–12:20h 📍 West Ballroom C. Come if you want to chat about LLMs' math capabilities for real-world tasks.
@JieZhang_ETH
Jie Zhang
6 months
1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning, drawn from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).
0
2
12
@NKristina01_
Kristina Nikolić
4 months
We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML; it measures how useful jailbreak outputs are. See you Tuesday 11am at East #804. I’ll be at ICML all week. Reach out if you want to chat about jailbreaks, agent security, or ML in general!
@NKristina01_
Kristina Nikolić
7 months
Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.
1
9
47
@edoardo_debe
Edoardo Debenedetti
5 months
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: https://t.co/6muay8vPeC Code:
github.com
Code for the paper "Defeating Prompt Injections by Design" - google-research/camel-prompt-injection
1
19
123
@dpaleka
Daniel Paleka
6 months
How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)
5
15
88
@XinCynthiaChen
Xin Chen, Cynthia
6 months
🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models. We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in the LLM's representation space, with interpretable insights. 🧵
5
44
257
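For readers unfamiliar with the term, a safety polytope can be pictured as an intersection of half-spaces in the model's representation space. The sketch below only illustrates that generic membership check, with made-up facet parameters; it is not the SaP paper's training or enforcement procedure.

```python
# Generic illustration of a polytope constraint check in representation space:
# a polytope is an intersection of half-spaces {h : W @ h + b <= 0}. The facet
# parameters below are made up; this is NOT the SaP paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
d, n_facets = 8, 5                    # hidden size and facet count (toy values)
W = rng.normal(size=(n_facets, d))    # facet normals (learned in the paper)
b = rng.normal(size=n_facets)         # facet offsets

def inside_polytope(h: np.ndarray, margin: float = 0.0) -> bool:
    """True if the representation h satisfies every half-space constraint."""
    return bool(np.all(W @ h + b <= margin))

def violated_facets(h: np.ndarray) -> np.ndarray:
    """Indices of the constraints that h violates (useful for interpretability)."""
    return np.where(W @ h + b > 0)[0]

h = rng.normal(size=d)                # stand-in for a hidden activation
print(inside_polytope(h), violated_facets(h))
```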
@JieZhang_ETH
Jie Zhang
6 months
6/ Try it out! Authors: @PetruiCezara, @NKristina01_, @florian_tramer 🧪 Paper: https://t.co/rKwWXiVPzP 💻 Code: https://t.co/lzZcmWA803 📊 Data:
huggingface.co
0
0
6
@JieZhang_ETH
Jie Zhang
6 months
5/ And the results? Surprisingly good! Current LLMs may already be useful mathematical assistants: not necessarily for deep proof synthesis, but for understanding, verifying, and retrieving relevant research-level statements.
1
0
7