Florian Tramèr Profile
Florian Tramèr

@florian_tramer

Followers
6K
Following
2K
Media
100
Statuses
961

Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning

Zürich
Joined October 2019
Don't wanna be here? Send us removal request.
@florian_tramer
Florian Tramèr
13 hours
Reference hallucinations still increased in July (data wasn't yet available when I posted early August), but have slightly decreased since then. Now that's what I call impact!. I also uploaded some code to reproduce, in case this is useful for anyone:
Tweet media one
@florian_tramer
Florian Tramèr
1 month
Are hallucinated references making it to arXiv?. Yes, definitely!. Since the release of Deep Research in February bogus references are on the rise (coincidence?). I wrote a blog post (link below) on my analysis (which hugely underestimates the true rate of hallucinations. )
Tweet media one
0
3
20
@florian_tramer
Florian Tramèr
10 days
RT @peterwildeford: "Agentic AI has been weaponized. AI models are now being used to perform sophisticated cyberattacks, not just advise on….
0
157
0
@florian_tramer
Florian Tramèr
10 days
RT @lbeurerkellner: Oh wow, more AI malware (uses Claude Code to search for credentials). Is this the exponential takeoff moment people ke….
0
16
0
@florian_tramer
Florian Tramèr
11 days
This LLM-powered malware seems to validate a bunch of the use-cases we had predicted a few months ago: (Which of course reviewers criticized as being impractical and unrealistic).
Tweet card summary image
arxiv.org
We argue that Large language models (LLMs) will soon alter the economics of cyberattacks. Instead of attacking the most commonly used software and monetizing exploits by targeting the lowest...
@ESETresearch
ESET Research
11 days
#ESETResearch has discovered the first known AI-powered ransomware, which we named #PromptLock. The PromptLock malware uses the gpt-oss:20b model from OpenAI locally via the Ollama API to generate malicious Lua scripts on the fly, which it then executes 1/6
Tweet media one
1
3
45
@florian_tramer
Florian Tramèr
24 days
This is exactly what @javirandor did with chatgpt last year to get it to spit out memorized training data:. There's probably some interesting stuff to study on such "re-based" models.
spylab.ai
We introduce finetuning as an effective way to extract larger amounts of training data from production language models. This attack extracts 5x more training documents than our previous divergence...
@jxmnop
Jack Morris
24 days
OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only. or is it? . turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵
Tweet media one
Tweet media two
1
3
66
@florian_tramer
Florian Tramèr
1 month
Are hallucinated references making it to arXiv?. Yes, definitely!. Since the release of Deep Research in February bogus references are on the rise (coincidence?). I wrote a blog post (link below) on my analysis (which hugely underestimates the true rate of hallucinations. )
Tweet media one
9
27
286
@florian_tramer
Florian Tramèr
2 months
I found a paper with this ref:.- the title is from: - the author list is from: - the link is - in the text ref [1] is for: How did this happen? Seems too weird for a LLM hallucination
Tweet media one
5
7
59
@florian_tramer
Florian Tramèr
2 months
RT @NKristina01_: We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML, it measures how useful jailbreak outputs are….
0
8
0
@florian_tramer
Florian Tramèr
2 months
Very cool result. In hindsight, this shouldn't be too surprising to anyone who has ever taken a multiple choice exam. Eg if you have a trigonometry problem and the possible solutions are.A: 1.B: 3.7.C: -5.D: pi/2.which would you pick (with no knowledge of the question)?.
@nikhilchandak29
Nikhil Chandak
2 months
🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯. Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision
Tweet media one
1
8
30
@florian_tramer
Florian Tramèr
2 months
RT @ykilcher: 📢Paper Discussion Live📢.Come tonight to chat with us about: Design Patterns for Securing LLM Agents against Prompt Injections….
0
8
0
@florian_tramer
Florian Tramèr
2 months
RT @edoardo_debe: We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most im….
Tweet card summary image
github.com
Code for the paper "Defeating Prompt Injections by Design" - google-research/camel-prompt-injection
0
18
0
@florian_tramer
Florian Tramèr
2 months
RT @j_dekoninck: Thrilled to share a major step forward for AI for mathematical proof generation! . We are releasing the Open Proof Corpus:….
0
23
0
@florian_tramer
Florian Tramèr
2 months
RT @mvechev: Thrilled to share that Snyk (@snyksec), a leader in cybersecurity, has acquired our AI spin-off @InvariantLabsAI, a year after….
0
3
0
@florian_tramer
Florian Tramèr
2 months
RT @InvariantLabsAI: We’re thrilled to officially join forces with @snyksec! . Together, we’re changing the landscape of the agentic AI fut….
0
2
0
@florian_tramer
Florian Tramèr
3 months
RT @karpathy: RT to help Simon raise awareness of prompt injection attacks in LLMs. Feels a bit like the wild west of early computing, wit….
0
548
0
@florian_tramer
Florian Tramèr
3 months
Simon wrote some very nice thoughts on our recent paper on design patterns for prompt injections. I've been following his writing on prompt injections since the start and his blog remains the best place to get an overview of the problem. I routinely recommend it to new students.
@simonw
Simon Willison
3 months
Here are my extensive notes on the paper
0
0
14
@florian_tramer
Florian Tramèr
3 months
RT @simonw: Anyone building "agentic" systems on top of LLMs needs to take this principle into account every time they design or implement….
0
15
0
@florian_tramer
Florian Tramèr
3 months
We hope these design patterns can provide inspiration for safer agent implementations. We'd be very excited to hear about other design patterns people have tried out and had successes with.
1
0
0