LLM Security @llm_sec X Profile

LLM Security

@llm_sec

Followers

10K

Following

652

Media

253

Statuses

830

Research, papers, jobs, and news on large language model security. Got something relevant? DM / tag @llm_sec

🏔️

Joined April 2023

Don't wanna be here? Send us removal request.

LLM Security

@llm_sec

1 year

@elder_plinius attack surface ∝ capabilities.

2

1

16

LLM Security

@llm_sec

8 days

RT @hannahrosekirk: Listen up all talented early-stage researchers! 👂🤖. We're hiring for a 6-month residency in my team at @AISecurityInst….

0

34

0

Grok

@grok

6 days

Join millions who have switched to Grok.

250

500

4K

LLM Security

@llm_sec

29 days

Senior Security Architect - AI and ML @ NVIDIA.

0

9

LLM Security

@llm_sec

1 month

RT @LeonDerczynski: LLMSEC proceedings are up!.(Anthology is processing). #ACL2025NLP.

0

6

0

LLM Security

@llm_sec

1 month

RT @LeonDerczynski: At ACL in Vienna? Hear the world's leading prompt injector talk at LLMSEC on Friday! . Johann Rehberger @wunderwuzzi23….

0

3

0

LLM Security

@llm_sec

1 month

RT @LeonDerczynski: Come to LLMSEC at ACL & hear Niloofar's keynote. "What does it mean for agentic AI to preserve privacy?" - @niloofar_mi….

0

3

0

LLM Security

@llm_sec

1 month

RT @LeonDerczynski: First keynote at LLMSEC 2025, ACL:. "A Bunch of Garbage and Hoping: LLMs, Agentic Security, and Where We Go From Here"….

0

4

0

LLM Security

@llm_sec

5 months

RT @LeonDerczynski: Call for papers: LLMSEC 2025. Deadline 15 April, held w/ ACL 2025 in Vienna. Formats: long/short/war stories. More: >>….

sig.llmsecurity.net

The first ACL Workshop on LLM and NLP Security; Summer 2025, Vienna, Austria

0

4

0

LLM Security

@llm_sec

9 months

Gritty Pixy. "We leverage the sensitivity of existing QR code readers and stretch them to their detection limit. This is not difficult to craft very elaborated prompts and to inject them into QR codes. What is difficult is to make them inconspicuous as we do here with Gritty

1

3

29

LLM Security

@llm_sec

10 months

RT @garak_llm: garak has moved to NVIDIA!. New repo link:

github.com

the LLM vulnerability scanner. Contribute to NVIDIA/garak development by creating an account on GitHub.

0

39

0

LLM Security

@llm_sec

10 months

ChatTL;DR – You Really Ought to Check What the LLM Said on Your Behalf 🌶️. "assuming that in the near term it’s just not machines talking to machines all the way down, how do we get people to check the output of LLMs before they copy and paste it to friends, colleagues, course

0

1

11

LLM Security

@llm_sec

10 months

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester. "we introduce the Generative Offensive Agent Tester (GOAT), an automated agentic red teaming system that simulates plain language adversarial conversations while leveraging multiple adversarial prompting.

0

4

35

LLM Security

@llm_sec

10 months

LLMmap: Fingerprinting For Large Language Models. "With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM

0

26

95

LLM Security

@llm_sec

10 months

RT @llm_sec: Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis 🌶️. "Our study evaluates prominent….

0

11

0

LLM Security

@llm_sec

10 months

author thread for cognitive overload attack:

Bibek

@upadhayay_bibek

11 months

1. 🔍What do humans and LLMs have in common?. They both struggle with cognitive overload! 🤯 .In our latest study, we dive deep into In-Context Learning (ICL) and uncover surprising parallels between human cognition and LLM behavior. @aminkarbasi @vbehzadan.2. 🧠 Cognitive Load

0

3

LLM Security

@llm_sec

10 months

Cognitive Overload Attack: Prompt Injection for Long Context. "We applied the principles of Cognitive Load Theory in LLMs. We show that advanced models such as GPT-4, Claude-3.5 Sonnet, Claude-3 OPUS, Llama-3-70B-Instruct, Gemini-1.0-Pro, and Gemini-1.5-Pro can be successfully

1

9

35

LLM Security

@llm_sec

10 months

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models. (-- look at that perf/latency pareto frontier. game on!). "State-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). We propose

2

5

35

LLM Security

@llm_sec

10 months

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents. "To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. We find (1) leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking, (2) simple universal

4

26

82

LLM Security

@llm_sec

10 months

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge. "This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information."."for unlearning methods with utility constraints, the

3

35

168

LLM Security

@llm_sec

10 months

RT @NannaInie: unpopular opinion: maybe let insecure be insecure and worry about the downstream effects on end users instead of protecting….

0

2

0

LLM Security

@llm_sec

10 months

RT @_Sizhe_Chen_: Safety comes first to deploying LLMs in applications like agents. For richer opportunities of LLMs, we mitigate prompt in….

0

13

0