Giorgio Piras
@GiorgioPiras12
Followers
73
Following
288
Media
13
Statuses
55
Postdoctoral Researcher @ University of Cagliari | AI Security
Cagliari, Italia
Joined November 2020
We are excited to present a new event of our seminar series on ML Security! We will host @chwress (KIT) on November 26th, 2025, at 5 pm CET. Free registration at the link below: https://t.co/ygrJwTGY6W
Paper: https://t.co/t9QLf03GpZ Code: https://t.co/6IdkECieYY This work would not have been possible without our great @sAIferLab co-authors @FabioBrau, @biggiobattista, @LucaOneto, and @fabiogroli.
github.com
Contribute to pralab/som-refusal-directions development by creating an account on GitHub.
Finally, we analyze the directions and observe that they are all closely related in terms of cosine similarity. This aligns with recent mechanistic insights positing that LLMs' concepts might be encoded with multiple related (i.e., non-orthogonal) directions.
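To make the cosine-similarity comparison concrete, here is a minimal sketch in plain Python. The three toy vectors are invented for illustration and are not taken from the paper; real directions would live in the model's hidden-state space.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_cosine(directions):
    """Square matrix of cosine similarities between all pairs of directions."""
    return [[cosine_similarity(u, v) for v in directions] for u in directions]

# Hypothetical toy directions: close to each other, but not identical.
toy_directions = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.0, 0.2],
]
sims = pairwise_cosine(toy_directions)
```

Off-diagonal values near 1 indicate closely related directions, values near 0 indicate near-orthogonal ones.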
Mechanistically, we reveal that, as we steer with more directions, the cluster of harmful prompt representations: (i) gets compressed and (ii) moves closer to that of harmless prompts (we find a clear correlation between cluster compression and ASR!)
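The two effects described above can be measured with simple statistics. The sketch below assumes "compression" means a smaller mean distance to the cluster centroid and "closer" means a smaller centroid-to-centroid distance; both metrics and all toy points are assumptions for illustration, not the paper's exact measurements.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def spread(points):
    """Mean distance of the points to their own centroid."""
    c = centroid(points)
    return sum(euclidean(p, c) for p in points) / len(points)

# Hypothetical 2-D representations (real ones are high-dimensional activations).
harmless = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
harmful_before = [[8.0, 8.0], [12.0, 8.0], [8.0, 12.0], [12.0, 12.0]]
harmful_after = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0], [4.0, 4.0]]

spread_before = spread(harmful_before)
spread_after = spread(harmful_after)
gap_before = euclidean(centroid(harmful_before), centroid(harmless))
gap_after = euclidean(centroid(harmful_after), centroid(harmless))
```

In this toy setup both the spread and the gap shrink after steering, which is the pattern the tweet describes.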
By steering LLMs with these directions, we find that we can outperform (in terms of attack success rate) not only single-direction baselines, but also specialized jailbreak attacks!
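As a rough illustration of steering with several directions, the sketch below uses directional ablation, which removes the component of a hidden state along each direction in turn. This is one common steering operation in the refusal-direction literature; whether the paper uses exactly this operation is an assumption, and the vectors are toy values.

```python
import math

def ablate_direction(h, r):
    """Remove from hidden state h its component along direction r."""
    norm = math.sqrt(sum(x * x for x in r))
    unit = [x / norm for x in r]
    coeff = sum(a * b for a, b in zip(h, unit))  # projection coefficient
    return [a - coeff * b for a, b in zip(h, unit)]

def ablate_directions(h, directions):
    """Apply single-direction ablation sequentially, one direction at a time."""
    for r in directions:
        h = ablate_direction(h, r)
    return h

# Hypothetical hidden state and two non-orthogonal directions.
hidden = [1.0, 2.0, 3.0]
refusal_dirs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
steered = ablate_directions(hidden, refusal_dirs)
```

Because the directions are not orthogonal, sequential ablation only guarantees a zero component along the last direction applied; removing the whole spanned subspace would require orthogonalizing the directions first.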
We address this by repurposing Self-Organizing Maps (SOMs) to approximate this manifold, uncovering multiple, closely related directions that encode refusal behavior. We show that SOMs effectively capture refusal, just like any other concept/manifold.
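For readers unfamiliar with SOMs, here is a minimal one-dimensional SOM in plain Python. It is a generic sketch, not the paper's implementation: the hyperparameters, the toy data, and the final step that derives one direction per neuron by subtracting a harmless centroid are all assumptions made for illustration.

```python
import math
import random

def train_som(data, n_neurons=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM: neurons on a line, each holding a weight vector."""
    rng = random.Random(seed)
    # Initialize each neuron from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_neurons)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-2)  # decaying neighborhood width
        for x in rng.sample(data, len(data)):
            # Best-matching unit: the neuron closest to the sample.
            bmu = min(range(n_neurons),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])))
            for j in range(n_neurons):
                # Gaussian neighborhood on the neuron grid around the BMU.
                influence = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                weights[j] = [w + lr * influence * (a - w)
                              for w, a in zip(weights[j], x)]
    return weights

# Hypothetical 2-D "harmful" representations and a "harmless" centroid.
harmful = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 5.5], [5.2, 5.8]]
harmless_centroid = [0.0, 0.0]

neurons = train_som(harmful)
# Assumed step: one candidate direction per neuron, pointing away from harmless.
directions = [[w - c for w, c in zip(neuron, harmless_centroid)] for neuron in neurons]
```

Each neuron ends up inside the region covered by the data, so the set of neurons gives several nearby prototypes rather than a single mean.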
Language models encode refusal as a manifold expressed through multiple, closely related directions. In our latest #AAAI26 paper, with @RaffaeleMura3, we bridge LLM safety and interpretability by answering the question: Do LLMs encode refusal behavior as a manifold?
Co-organized with @RaffaeleMura3, Fabio Brau, @maurapintor, @zangobot, and @biggiobattista, powered by the @mlsec_lab at @sAIferLab.
We welcome submissions on various topics, including Adversarial ML and AI for Cybersecurity. We'll also give a tutorial on Adversarial Machine Learning on the workshop day! Let's discuss how to make AI more secure in the sunniest of Italian cities.
Workshop Alert! We're excited to announce the 2nd Trustworthy AI for Cybersecurity (TAIC26) workshop, co-located with ITASEC26. Deadline: 5 December 2025. CFP: https://t.co/tFpTNjUM43 Website: https://t.co/LXVSabELDt
Great news from our partners at the University of Cagliari! Their latest research has been accepted for publication in the Machine Learning Journal (Springer Nature), with publication expected in early November. The paper tackles a key challenge in AI-driven code
Missed the event? Watch it again on our YouTube channel: https://t.co/cNQYG2lFXb Stay tuned for the following events! Thank you again, @XinCynthiaChen, for talking about your research at our seminar!
We are excited to present a new event of our seminar series on ML Security! We will host @XinCynthiaChen (ETH Zurich) on October 22nd, 2025, at 5 pm CEST. Free registration at the link below: https://t.co/JsJQBiqL8Y
Super great work by @RaffaeleMura3. Go and have a look at our low-perplexity jailbreaks!
Our new paper, LatentBreak: Jailbreaking LLMs through Latent Space Feedback, is now on arXiv. We study how latent-space feedback can produce natural, low-perplexity jailbreaks. Joint work with brilliant colleagues across @sAIferLab @fdtn_ai
https://t.co/rvgZaOCAym
New Publication in Pattern Recognition, Volume 168. We're pleased to share that our partner, @Università degli Studi di Cagliari (Italy), has published important new research on adversarial pruning methods in Pattern Recognition! Recent years have seen the rise of pruning
Special thanks to all my @sAIferLab co-authors: @maurapintor, @ambrademontis, @biggiobattista, @GiorgioGiacinto, and Fabio Roli.
...and here's why we should care: pruning alone simplifies decision boundaries, but preserving robustness demands more complexity, not less. This makes adversarial pruning uniquely challenging and intriguing, forcing us to rethink how we design sparse, secure models.
Here's a sneak peek of our interactive robustness curves:
Our benchmark enables you to:
- Download and test checkpoints of popular AP methods for your research
- Explore and compare them through interactive robustness curves
- Submit your own novel/existing method to make it easily comparable and public!
github.com
Contribute to pralab/AdversarialPruningBenchmark development by creating an account on GitHub.
In this work, we:
✅ Propose a novel taxonomy of AP methods to clarify their design
✅ Build a uniform and reliable benchmark for evaluating adversarial robustness under pruning