
Paul Kassianik (@kass_paul)
199 Followers · 1K Following · 6 Media · 147 Statuses
AI researcher @fdtn_ai @Cisco. Formerly @robusthq @SFResearch. Researching AI Security.
SF Bay Area · Joined October 2020
🚀 Excited to present "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically" 📑🌳 Unlock the power of automated jailbreaks using Tree of Attacks with Pruning (TAP).
arxiv.org
While Large Language Models (LLMs) display versatile functionality, they continue to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human-designed jailbreaks. In...
RT @natolambert: Is it "bad" that everyone is distilling from / training on Chinese models? While not directly bad, there is a large soft p….
RT @fdtn_ai: - We will be presenting "Adversarial Reasoning at Jailbreaking Time" on Wednesday morning, Poster E-805:
RT @fdtn_ai: Foundation AI is coming to #ICML! Catch us at any of these events: - Our Chief Scientist @aminkarbasi will be hosting a tutor….
RT @dlwh: So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public….
"We don't have time for proper science, we have to beat other labs on LiveCodeBench." So true -- "number go up" research might be a flag that the victim of the Bitter Lesson is... you! @finbarrtimbers
artfintel.com
Far too many people misunderstand the bitter lesson
It might very well be that there is no moat in small models, especially domain-specific ones, unless they are a stepping stone to something much more fundamental.
McKinsey's new report on AI agents shows the same mindset I see in many firms: a focus on making small, obsolete models do basic work (look at their suggested models!) rather than realizing that smarter models can do higher-end work (and those models are getting cheaper & better)
RT @emollick: McKinsey's new report on AI agents shows the same mindset I see in many firms: a focus on making small, obsolete models do ba….
RT @maksym_andr: Check out our new paper on monitoring decomposition jailbreak attacks! Monitoring is (still) an underappreciated research….
RT @chargoddard: 🤯 MIND-BLOWN! A new paper just SHATTERED everything we thought we knew about AI reasoning! This is paradigm-shifting. A M….
Super excited to collaborate on this work with the amazing @kotekjedi_ml, @maksym_andr, and @jonasgeiping! We can now confidently say that automatic red teaming methods are great harnesses for attacks -- LLMs just need to be capable enough to use them!
Stronger models need stronger attackers! 🤖⚔️ In our new paper we explore how attacker-target capability dynamics affect red-teaming success (attack success rate, ASR). Key insights:
🔸 Stronger models = better attackers
🔸 ASR depends on capability gap
🔸 Psychology >> STEM for ASR
More in 🧵👇
RT @natolambert: immortalizing this moment forever when RL is so easy that you can just use random rewards and your benchmarks still go up….
Pretty sure of all the guides to PPO/GRPO I've seen out there, this is the simplest, most straightforward, and most accessible one, by @YugeTen.
yugeten.github.io
RT @rohanpaul_ai: Automated detection of LLM hallucinations using only correct examples is fundamentally difficult. This paper shows detec….
RT @aminkarbasi: We just released: Foundation-Sec-8B, an open-weight, 8-billion-parameter base model purpose-built for cybersecurity. Wh….
RT @Cisco: "While security has been blamed for slowing technology adoption in the past, we believe that taking the right approach to safety….
I have a theory that the AI hype is simply due to a collective addiction of humanity's top talent to loss-curve dopamine.
What kind of backdoor is this? @allen_ai How did this make it into the final dolma PII tagger? Permalink: