
Apurv Verma
@verma_apurv5
473 Followers · 15K Following · 10 Media · 511 Statuses
Building safer, more aligned models 🧭 📐 • PhD student @NJIT 🎓 • NLP @Bloomberg 🛠️ • Prev: AS @AmazonScience • Alma: @GTCSE, @iitrpr
New York
Joined March 2011
🔥 Thrilled to share our new paper accepted at TMLR: "Operationalizing a Threat Model for Red-Teaming LLMs" 🎯 Goes beyond prompt jailbreaks to systematize ALL attack vectors across the LLM lifecycle 🛡️ First comprehensive taxonomy based on entry points (training → deployment)
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs). Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann et al. Action editor: Jinwoo Shin. #vulnerabilities #attacks #security.
1 reply · 1 retweet · 8 likes
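To make the entry-point framing above concrete, here is a minimal Python sketch of what a lifecycle-keyed attack taxonomy could look like. The stage and attack names are illustrative assumptions, not the paper's actual categories.

```python
# Illustrative only: a toy attack taxonomy keyed by LLM lifecycle entry point.
# Stage and attack names are hypothetical, not the paper's actual taxonomy.
LIFECYCLE_ATTACKS = {
    "pre-training": ["training-data poisoning", "backdoor insertion"],
    "fine-tuning": ["alignment-eroding fine-tunes", "reward hacking"],
    "inference": ["jailbreak prompts", "prompt injection"],
    "deployment": ["tool/plugin abuse", "output exfiltration"],
}

def attacks_at(stage: str) -> list[str]:
    """Return the illustrative attack vectors for one lifecycle stage."""
    return LIFECYCLE_ATTACKS.get(stage, [])

if __name__ == "__main__":
    for stage, attacks in LIFECYCLE_ATTACKS.items():
        print(f"{stage:>12}: {', '.join(attacks)}")
```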
I had the same question. Shouldn't we also be testing how well an LLM can abstain from proving a wrong theorem? Paper idea: corrupt existing theorems and see how often SOTA LLMs end up proving them. 💡
One piece of info that seems important to me for forecasting the usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it? 🧵
0 replies · 0 retweets · 1 like
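As a sketch of that paper idea, one could mutate a known-true statement and check whether the model abstains rather than "proves" the corrupted version. Everything below is hypothetical scaffolding: `ask_llm` stands in for whatever chat client you use, and the refusal check is a crude keyword proxy.

```python
# Hedged sketch of the "corrupt a theorem, measure abstention" idea.
def corrupt_theorem(statement: str) -> str:
    """Apply one crude mutation that (likely) makes the statement false."""
    if "<=" in statement:
        return statement.replace("<=", ">=", 1)
    if "<" in statement:
        return statement.replace("<", ">", 1)
    return statement.replace(" is ", " is not ", 1)

# Crude keyword proxy for "the model pushed back instead of proving it".
REFUSAL_MARKERS = ("false", "counterexample", "does not hold", "cannot be proved")

def abstained(reply: str) -> bool:
    """Did the model decline or refute, rather than produce a 'proof'?"""
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

# Usage (hypothetical `ask_llm` helper; substitute any chat API):
# reply = ask_llm(f"Prove the following theorem:\n{corrupt_theorem(thm)}")
# print("abstained" if abstained(reply) else "hallucinated a proof")
```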
💡 Ready to operationalize these insights? We've got you covered:
📜 ArXiv:
🎧 Deep-dive podcast:
💻 Awesome resource hub:
✨ Full systematization of knowledge for H3LLM applications [🧵 3/3]
github.com
Repository accompanying the paper https://openreview.net/pdf?id=sSAp8ITBpC - dapurv5/awesome-red-teaming-llms
0 replies · 0 retweets · 1 like
RT @_onionesque: That watermarking should interfere with alignment in LLMs should be fairly obvious. Here we present experiments studying t…
0 replies · 2 retweets · 0 likes
RT @upperwal: Let's not miss out on any Indian language! This is a great opportunity to have your language represented in one of the larges…
0 replies · 3 retweets · 0 likes
Watermarking breaks AI alignment 🚨
Solution: generate multiple responses, pick the best one 🎯
"Watermarking Degrades Alignment in Language Models" 📄 #AIResearch #AISafety
arxiv.org
Watermarking techniques for large language models (LLMs) can significantly impact output quality, yet their effects on truthfulness, safety, and helpfulness remain critically underexamined. This...
0 replies · 1 retweet · 8 likes
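A minimal sketch of the mitigation named in the tweet above, best-of-N sampling over watermarked generations: `generate_watermarked` and `alignment_score` are hypothetical stand-ins for a watermarked decoder and a reward/judge model, not APIs from the paper.

```python
import random

def best_of_n(prompt, generate_watermarked, alignment_score, n=4):
    """Sample n watermarked candidates and keep the best-scoring one."""
    candidates = [generate_watermarked(prompt) for _ in range(n)]
    return max(candidates, key=alignment_score)

# Toy stand-ins so the sketch runs; a real system would plug in a
# watermarked decoder and an alignment reward model here.
if __name__ == "__main__":
    gen = lambda p: f"{p} [draft {random.randint(0, 999)}]"
    score = lambda text: random.random()  # placeholder alignment scorer
    print(best_of_n("Explain the watermarking/alignment tradeoff", gen, score))
```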
RT @hhsun1: Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secu…
0 replies · 24 retweets · 0 likes
RT @_AndrewZhao: if submitting to @NeurIPSConf, DON'T forget to add this at the END. Defend against AI reviewers & lost in the middle: \tex…
0 replies · 66 retweets · 0 likes
This is an interesting take.
Writing and teaching are powerful ways to clarify your own thoughts. But, if done prematurely, they risk collapsing your evolving observations into neat arguments too soon. Premature verbalization can sometimes destroy delicate ideas. Both explicit & latent CoTs have a place.
0 replies · 0 retweets · 0 likes
RT @davidstutz92: Writing good reviews and rebuttals is a key skill in academic research. Unfortunately, this skill is rarely properly taug…
davidstutz.de
Writing and responding to reviews is the bread and butter of any academic, and especially in AI research, PhD students are confronted with both rather early compared to other disciplines. Unfortunat...
0 replies · 34 retweets · 0 likes
Just played around with this - it's almost too easy with a few tricks. Wondering if the examples gathered through this method could be termed genuine instances of persuasion rather than just manipulating the tradeoff between helpfulness and harmlessness in an LLM via a cleverly…
⚠️ WARNING: offensive content ahead. Introducing RedTeam Arena with Bad Words—our first game. You've got 60 seconds to break the model to say the bad word. The faster, the better. (Collaboration with @elder_plinus and the awesome BASI 🐍 community.) Link to the site below 👇
0 replies · 0 retweets · 3 likes
This looks super cool. Want to level up your game? Our paper is the ultimate resource to get you up to speed with LLM attack strategies 📚🔥
Check it out here: #AI #MachineLearning #LLM #Security
arxiv.org
Creating secure and resilient applications with large language models (LLM) requires anticipating, adjusting to, and countering unforeseen threats. Red-teaming has emerged as a critical technique...
🚨 Ultimate Jailbreaking Championship 2024 🚨
Hackers vs. AI in the arena. Let the battle begin!
🏆 $40,000 in Bounties
🗓️ Sept 7, 2024 @ 10AM PDT
🔗 Register Now:
0 replies · 0 retweets · 1 like