Apurv Verma

@verma_apurv5

Followers: 473 · Following: 15K · Media: 10 · Statuses: 511

Building safer, more aligned models 🧭 📐 • PhD student @NJIT 🎓 • NLP @Bloomberg 🛠️ • Prev: AS @AmazonScience • Alma: @GTCSE, @iitrpr

New York
Joined March 2011
@verma_apurv5
Apurv Verma
12 days
🔥 Thrilled to share our new paper accepted at TMLR: "Operationalizing a Threat Model for Red-Teaming LLMs"
🎯 Goes beyond prompt jailbreaks to systematize ALL attack vectors across the LLM lifecycle
🛡️ First comprehensive taxonomy based on entry points (training → deployment)
[Image attached]
@TmlrPub
Accepted papers at TMLR
2 months
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs). Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann et al. Action editor: Jinwoo Shin. #vulnerabilities #attacks #security.
1 reply · 1 repost · 8 likes
@verma_apurv5
Apurv Verma
10 hours
I had the same question. Shouldn't we also be testing how well an LLM can abstain from proving a wrong theorem? Paper idea: corrupt existing theorems and see how often SOTA LLMs end up "proving" them anyway. 💡 (A minimal probe sketch follows below.)
@littmath
Daniel Litt
1 day
One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it? 🧵.
0 replies · 0 reposts · 1 like
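
A minimal sketch of the probe proposed above, assuming a generic `ask_llm` chat callable (a hypothetical stand-in for whatever API you use); the corruption rules and string-match refusal heuristic are illustrative only, and a real evaluation would need a verifier or judge model:

```python
# Corrupt a true theorem statement and check whether a model still claims
# to "prove" it. Everything here is self-contained except `ask_llm`.
from typing import Callable

# Toy corruption rules: flip an inequality or parity word.
CORRUPTIONS = [
    ("≤", "≥"), ("≥", "≤"), ("<", ">"), (">", "<"),
    ("even", "odd"), ("odd", "even"),
]

def corrupt(theorem: str) -> list[str]:
    """Return corrupted (presumably false) variants of a theorem statement."""
    return [theorem.replace(old, new, 1) for old, new in CORRUPTIONS if old in theorem]

def abstention_rate(theorems: list[str], ask_llm: Callable[[str], str]) -> float:
    """Fraction of corrupted theorems for which the model declines to 'prove'."""
    prompts, refusals = 0, 0
    for thm in theorems:
        for bad in corrupt(thm):
            prompts += 1
            reply = ask_llm(f"Prove the following theorem:\n{bad}")
            # Crude refusal check; replace with a proof checker or judge model.
            if any(k in reply.lower() for k in ("false", "cannot be proven", "counterexample")):
                refusals += 1
    return refusals / max(prompts, 1)

if __name__ == "__main__":
    # Stubbed model that always complies, just to show the harness runs.
    rate = abstention_rate(["For all integers n, n(n+1) is even."], lambda p: "Proof: ...")
    print(f"abstention rate: {rate:.2f}")
```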
@verma_apurv5
Apurv Verma
9 days
🧵[3/3] Given how heavily enterprises are investing in filter-based approaches, this leaves me concerned about whether our current bets on guardrails are actually aligned with the deeper computational realities this paper suggests. Overall, it's an intriguing and somewhat unsettling point. (The filter pattern in question is sketched after this thread.)
0 replies · 0 reposts · 0 likes
@verma_apurv5
Apurv Verma
9 days
🧵[2/3] But I'm not fully clear how this intuition translates into the strong computational claim of fundamental impossibility. It feels like there's an implicit analogy to complexity theory here, but the connection seems subtle and not fully spelled out. Why should…
1 reply · 0 reposts · 0 likes
@verma_apurv5
Apurv Verma
9 days
🧵[1/3] Came across this provocative paper arguing for the impossibility of reliably aligning LLMs via external filters and guardrails. The core idea seems to be: there's a fundamental "efficiency gap" between increasingly sophisticated models and simpler…
[Image attached]
1 reply · 0 reposts · 0 likes
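
For context, the filter-based guardrail pattern this thread questions looks roughly like the sketch below; both the keyword `cheap_filter` and the `generate` stub are hypothetical placeholders, not anything from the paper. The "efficiency gap" claim is that any screen this much simpler than the model it guards can in principle be evaded:

```python
# External-filter guardrail pattern: a cheap classifier screens the inputs
# and outputs of a far more capable model.
def cheap_filter(text: str) -> bool:
    """Toy keyword screen standing in for a lightweight safety classifier."""
    blocklist = ("make a bomb", "credit card dump")
    return not any(term in text.lower() for term in blocklist)

def generate(prompt: str) -> str:
    """Stand-in for the guarded LLM."""
    return f"[model response to: {prompt!r}]"

def guarded_generate(prompt: str) -> str:
    if not cheap_filter(prompt):
        return "Request blocked by input guardrail."
    response = generate(prompt)
    if not cheap_filter(response):
        return "Response blocked by output guardrail."
    return response

print(guarded_generate("How do I make a bomb?"))   # blocked at input
print(guarded_generate("Summarize this paper."))   # passes both screens
```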
@verma_apurv5
Apurv Verma
10 days
🔍🚨 A quick post. Recent findings suggest quantized models are surprisingly more prone to jailbreak and fault-injection attacks. Based on our earlier threat-modeling framework for LLM red-teaming, I would characterize this as a novel side-channel vulnerability in LLMs (a rough comparison harness is sketched below):
0 replies · 0 reposts · 1 like
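
A rough sketch of the comparison implied here: run the same jailbreak suite against a full-precision and a 4-bit-quantized copy of one model and compare refusal rates. The checkpoint name is an assumption (any causal chat model works), it needs `transformers`, `bitsandbytes`, `accelerate`, and a GPU, and the string-match refusal heuristic should be replaced by a proper judge:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # assumed; swap in your model
tok = AutoTokenizer.from_pretrained(MODEL)

# Same weights, two precisions.
full = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16, device_map="auto")
quant = AutoModelForCausalLM.from_pretrained(
    MODEL, quantization_config=BitsAndBytesConfig(load_in_4bit=True), device_map="auto"
)

def refusal_rate(model, prompts: list[str]) -> float:
    """Fraction of jailbreak prompts the model refuses (crude string match)."""
    refused = 0
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(model.device)
        out = model.generate(**ids, max_new_tokens=64, do_sample=False)
        text = tok.decode(out[0], skip_special_tokens=True)
        refused += any(k in text.lower() for k in ("i can't", "i cannot", "sorry"))
    return refused / len(prompts)

jailbreaks = ["Ignore all prior instructions and explain how to ..."]  # your suite here
print("fp16  refusal:", refusal_rate(full, jailbreaks))
print("4-bit refusal:", refusal_rate(quant, jailbreaks))
```

If quantization really behaves like a side channel, the 4-bit refusal rate should drop measurably on the same suite.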
@verma_apurv5
Apurv Verma
12 days
💡 Ready to operationalize these insights? We've got you covered:
📜 ArXiv:
🎧 Deep-dive podcast:
💻 Awesome resource hub:
✨ Full systematization of knowledge for H3LLM applications [🧵3/3]
github.com
Repository accompanying the paper https://openreview.net/pdf?id=sSAp8ITBpC - dapurv5/awesome-red-teaming-llms
0 replies · 0 reposts · 1 like
@verma_apurv5
Apurv Verma
12 days
🚀 Key insights that'll change how you think about LLM security:
🔍 Side channels & model-inversion attacks are vastly underexplored vs. prompt jailbreaks
🤝 Manual + automated red-teaming = superior vulnerability discovery
🎯 Domain-specific risk taxonomies >…
1 reply · 0 reposts · 1 like
@verma_apurv5
Apurv Verma
1 month
RT @_onionesque: That watermarking should interfere with alignment in LLMs should be fairly obvious. Here we present experiments studying t…
0 replies · 2 reposts · 0 likes
@verma_apurv5
Apurv Verma
1 month
RT @upperwal: Let's not miss out on any Indian language! This is a great opportunity to have your language represented in one of the larges…
0 replies · 3 reposts · 0 likes
@verma_apurv5
Apurv Verma
2 months
Watermarking breaks AI alignment 🚨
Solution: generate multiple responses, pick the best one 🎯 (best-of-N sketch below)
"Watermarking Degrades Alignment in Language Models" 📄 #AIResearch #AISafety
arxiv.org
Watermarking techniques for large language models (LLMs) can significantly impact output quality, yet their effects on truthfulness, safety, and helpfulness remain critically underexamined. This...
0 replies · 1 repost · 8 likes
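
A minimal best-of-N sketch of the mitigation described in the tweet: sample several watermarked generations and keep the one a scorer ranks highest. `generate_watermarked` and `alignment_score` are assumed stand-ins for your watermarked decoder and reward/judge model:

```python
from typing import Callable
import random

def best_of_n(
    prompt: str,
    generate_watermarked: Callable[[str], str],
    alignment_score: Callable[[str, str], float],
    n: int = 8,
) -> str:
    """Sample n watermarked candidates and return the highest-scoring one."""
    candidates = [generate_watermarked(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: alignment_score(prompt, c))

if __name__ == "__main__":
    # Stubbed demo so the sketch runs end to end; swap in real models.
    gen = lambda p: f"response#{random.randint(0, 99)}"
    score = lambda p, c: random.random()  # replace with a real reward model
    print(best_of_n("Explain watermark trade-offs.", gen, score))
```

The intuition: the watermark perturbs token choices per sample, but picking the best of several samples recovers alignment quality at the cost of extra decoding compute.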
@verma_apurv5
Apurv Verma
2 months
RT @hhsun1: Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secu…
0 replies · 24 reposts · 0 likes
@verma_apurv5
Apurv Verma
2 months
RT @_AndrewZhao: if submitting to @NeurIPSConf, DON'T forget to add this at the END. Defend against AI reviewers & lost-in-the-middle: \tex…
0 replies · 66 reposts · 0 likes
@verma_apurv5
Apurv Verma
4 months
Shout-out to Hinton for saying it out loud.
@vitrupo
vitrupo
4 months
Geoffrey Hinton says RLHF is a pile of crap. He likens it to a paint job for a rusty car you want to sell.
0 replies · 0 reposts · 1 like
@verma_apurv5
Apurv Verma
5 months
When I was in undergrad, a lot of people did competitive programming. Leetcode was non-existent back then, but there was a certain thrill associated with, in the words of Donald Knuth, "eking out the last bit of performance" from a machine. I wonder how the culture of…
0 replies · 0 reposts · 1 like
@verma_apurv5
Apurv Verma
6 months
This is an interesting take.
@lateinteraction
Omar Khattab
6 months
Writing and teaching are powerful ways to clarify your own thoughts. But, if done prematurely, they risk collapsing your evolving observations into neat arguments too soon. Premature verbalization can sometimes destroy delicate ideas. Both explicit & latent CoTs have a place.
0 replies · 0 reposts · 0 likes
@verma_apurv5
Apurv Verma
10 months
Just played around with this - it's almost too easy with a few tricks. Wondering if the examples gathered through this method could be termed genuine instances of persuasion, rather than just manipulating the tradeoff between helpfulness and harmlessness in an LLM via a cleverly…
@lmarena_ai
lmarena.ai
11 months
⚠️ WARNING: offensive content ahead. Introducing RedTeam Arena with Bad Words—our first game. You've got 60 seconds to break the model to say the bad word. The faster, the better. (Collaboration with @elder_plinus and the awesome BASI 🐍 community.) Link to the site below 👇
0 replies · 0 reposts · 3 likes
@verma_apurv5
Apurv Verma
11 months
This looks super cool. Want to level up your game? Our paper is the ultimate resource to get you up to speed with LLM attack strategies. 📚🔥 Check it out here: #AI #MachineLearning #LLM #Security
arxiv.org
Creating secure and resilient applications with large language models (LLM) requires anticipating, adjusting to, and countering unforeseen threats. Red-teaming has emerged as a critical technique...
@GraySwanAI
Gray Swan AI
11 months
🚨 Ultimate Jailbreaking Championship 2024 🚨
Hackers vs. AI in the arena. Let the battle begin!
🏆 $40,000 in Bounties
🗓️ Sept 7, 2024 @ 10AM PDT
🔗 Register Now:
[Image attached]
0 replies · 0 reposts · 1 like