
Florian Tramèr
@florian_tramer
6K Followers · 2K Following · 100 Media · 961 Statuses
Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning
Zürich
Joined October 2019
Reference hallucinations still increased in July (the data wasn't yet available when I posted in early August), but have slightly decreased since then. Now that's what I call impact! I also uploaded some code to reproduce, in case this is useful for anyone:
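The reproduction code itself isn't shown in the feed. Purely as an illustration, here is a minimal sketch of one way to flag suspicious references: look each cited title up in Crossref and check whether anything closely matches. The endpoint usage and fuzzy-match scoring are assumptions for this sketch, not his released method; a low score only flags a reference for manual review (preprints and retitled papers also match poorly).

```python
# Minimal sketch (not the author's released code): flag references whose
# titles don't closely match anything in Crossref's bibliographic index.
import difflib
import requests

def best_crossref_match(ref_title: str) -> tuple[str, float]:
    """Query Crossref for a reference title and return the closest hit."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": ref_title, "rows": 5},
        timeout=30,
    )
    resp.raise_for_status()
    best_title, best_score = "", 0.0
    for item in resp.json()["message"]["items"]:
        for title in item.get("title", []):
            score = difflib.SequenceMatcher(
                None, ref_title.lower(), title.lower()
            ).ratio()
            if score > best_score:
                best_title, best_score = title, score
    return best_title, best_score

if __name__ == "__main__":
    title, score = best_crossref_match(
        "Deep Residual Learning for Image Recognition"
    )
    # A low similarity score suggests (but does not prove) a bogus reference.
    print(f"closest match: {title!r} (similarity {score:.2f})")
```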
RT @peterwildeford: "Agentic AI has been weaponized. AI models are now being used to perform sophisticated cyberattacks, not just advise on…
RT @lbeurerkellner: Oh wow, more AI malware (uses Claude Code to search for credentials). Is this the exponential takeoff moment people ke…
This LLM-powered malware seems to validate a bunch of the use cases we predicted a few months ago (which, of course, reviewers criticized as impractical and unrealistic):
arxiv.org
We argue that large language models (LLMs) will soon alter the economics of cyberattacks. Instead of attacking the most commonly used software and monetizing exploits by targeting the lowest...
#ESETResearch has discovered the first known AI-powered ransomware, which we named #PromptLock. The PromptLock malware uses the gpt-oss:20b model from OpenAI locally via the Ollama API to generate malicious Lua scripts on the fly, which it then executes. 1/6
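The mechanism ESET describes (prompting a locally served model and executing whatever comes back) is mundane from an API standpoint. Here is a benign sketch of that call pattern against Ollama's documented /api/generate endpoint; the model tag and prompt are placeholders, and the dangerous step is of course executing untrusted model output, which this sketch does not do.

```python
# Benign sketch of the call pattern ESET describes: ask a locally served
# model (via Ollama's HTTP API) to generate a script. The malware then
# pipes such output into an interpreter; this sketch only prints it.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    json={
        "model": "gpt-oss:20b",          # any locally pulled model tag
        "prompt": "Write a Lua script that prints the current directory.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
lua_script = resp.json()["response"]
print(lua_script)  # never execute untrusted model output
```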
This is exactly what @javirandor did with ChatGPT last year to get it to spit out memorized training data (link below). There's probably some interesting stuff to study on such "re-based" models.
spylab.ai
We introduce finetuning as an effective way to extract larger amounts of training data from production language models. This attack extracts 5x more training documents than our previous divergence...
OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only. or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵
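Both results come down to the same probe: sample from the (recovered) base model with almost no conditioning and look for verbatim training text in the output. A minimal sketch of that sampling loop with Hugging Face transformers follows; the model id is a placeholder, not necessarily the released checkpoint, and real extraction work checks outputs against a reference corpus rather than by eye.

```python
# Minimal sketch of memorization probing: sample nearly unconditioned
# continuations from a base model and grep them for verbatim training text.
# The model id below is a placeholder, not necessarily the real checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/gpt-oss-20b-base"  # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The"  # a minimal prefix; base models free-associate from it
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=1.0,
    num_return_sequences=4,
)
for seq in out:
    print(tok.decode(seq, skip_special_tokens=True), "\n---")
```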
Are hallucinated references making it to arXiv? Yes, definitely! Since the release of Deep Research in February, bogus references are on the rise (coincidence?). I wrote a blog post (link below) on my analysis (which hugely underestimates the true rate of hallucinations).
I found a paper with this ref:
- the title is from:
- the author list is from:
- the link is:
- in the text, ref [1] is for:
How did this happen? Seems too weird for an LLM hallucination.
RT @NKristina01_: We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML, it measures how useful jailbreak outputs are…
Very cool result. In hindsight, this shouldn't be too surprising to anyone who has ever taken a multiple-choice exam. E.g., if you have a trigonometry problem and the possible solutions are
A: 1
B: 3.7
C: -5
D: pi/2
which would you pick (with no knowledge of the question)?
🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯 Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision…
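A choices-only baseline is easy to approximate yourself: score each answer option with a language model that never sees the question, and pick the most plausible one. A minimal sketch with a small off-the-shelf model follows; the model choice and prompt template are assumptions for illustration, not the quoted paper's setup.

```python
# Minimal sketch of a "choices-only" baseline: pick the answer option the
# model finds most plausible without ever seeing the question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any small causal LM works
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_logprob(text: str) -> float:
    """Mean token log-probability the model assigns to `text` on its own."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()  # loss is mean negative log-likelihood

# The trigonometry example from the tweet above: no question, just options.
choices = ["1", "3.7", "-5", "pi/2"]
scores = {c: avg_logprob(f"The answer is {c}.") for c in choices}
print(max(scores, key=scores.get), scores)
```

Any accuracy above chance here comes purely from priors leaking through the options, which is exactly the effect the quoted paper measures.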
RT @ykilcher: 📢 Paper Discussion Live 📢 Come tonight to chat with us about: Design Patterns for Securing LLM Agents against Prompt Injections…
RT @edoardo_debe: We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most im…
github.com
Code for the paper "Defeating Prompt Injections by Design" - google-research/camel-prompt-injection
RT @j_dekoninck: Thrilled to share a major step forward for AI for mathematical proof generation! We are releasing the Open Proof Corpus…
RT @mvechev: Thrilled to share that Snyk (@snyksec), a leader in cybersecurity, has acquired our AI spin-off @InvariantLabsAI, a year after…
RT @InvariantLabsAI: We’re thrilled to officially join forces with @snyksec! Together, we’re changing the landscape of the agentic AI fut…
RT @karpathy: RT to help Simon raise awareness of prompt injection attacks in LLMs. Feels a bit like the wild west of early computing, wit…
Simon wrote some very nice thoughts on our recent paper on design patterns for prompt injections. I've been following his writing on prompt injections since the start and his blog remains the best place to get an overview of the problem. I routinely recommend it to new students.
RT @simonw: Anyone building "agentic" systems on top of LLMs needs to take this principle into account every time they design or implement…
Thanks to @lbeurerkellner, B. Busser, @AnaMariaCretu5, @edoardo_debe, D. Dobos, D. Fabian, @marc_r_fischer, @DavidFroelicher, @KathrinGrosse, D. Naeff, E. Ozoani, @ajpaverd, @vvolhejn. @ICepfl @CSatETH @InvariantLabsAI @IBM @Swisscom @Google @ETH_AI_Center @Microsoft @kyutai_labs AppliedAI.
We hope these design patterns can provide inspiration for safer agent implementations. We'd be very excited to hear about other design patterns people have tried out and had success with.
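One of the simplest patterns in this family fixes the agent's action sequence before any untrusted data is read, so fetched content can at most influence arguments, never introduce new actions. Below is a minimal illustrative sketch of that idea (all names are made up for illustration; this is not the paper's reference implementation):

```python
# Illustrative sketch of a "plan-then-execute" style pattern: the action
# sequence is frozen before any untrusted content is read, so text fetched
# later cannot add or reorder actions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str  # drawn from a fixed allowlist
    arg: str

ALLOWED = {"fetch_page", "summarize"}

def plan(user_request: str) -> list[Action]:
    # In a real agent, an LLM writes this plan from the *trusted* user
    # request only; hard-coded here for illustration.
    return [Action("fetch_page", user_request), Action("summarize", "")]

def execute(actions: list[Action]) -> str:
    result = ""
    for act in actions:
        if act.name not in ALLOWED:
            raise ValueError(f"unplanned action: {act.name}")
        if act.name == "fetch_page":
            # Untrusted content enters here, *after* the plan is frozen,
            # so nothing it says can trigger new actions.
            result = f"<contents of {act.arg}>"
        elif act.name == "summarize":
            result = f"summary of: {result}"
    return result

print(execute(plan("https://example.com")))
```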