Giorgio Piras

@GiorgioPiras12

Followers
73
Following
288
Media
13
Statuses
55

Postdoctoral Researcher @ University of Cagliari | AI Security

Cagliari, Italia
Joined November 2020
@mlsec_lab
Machine Learning Security Laboratory
14 days
We are excited to present a new event of our seminar series on ML Security! We will host @chwress (KIT) on November 26th, 2025, at 5 pm CET. Free registration at the link below: https://t.co/ygrJwTGY6W
0
1
1
@GiorgioPiras12
Giorgio Piras
5 days
Finally, we analyze the directions and observe that they are all closely related in terms of cosine similarity. This speaks to recent mechanistic insights positing that LLMs' concepts might be encoded by multiple related (i.e., non-orthogonal) directions.
1
0
0
@GiorgioPiras12
Giorgio Piras
5 days
Mechanistically, we reveal that, as we steer with more directions, the cluster of harmful prompt representations: (i) gets compressed and (ii) moves closer to that of harmless prompts (we find a clear correlation between cluster compression and ASR!)
1
0
0
@GiorgioPiras12
Giorgio Piras
5 days
By steering LLMs with these directions, we find that we can outperform (in terms of attack success rate) not only single-direction baselines, but also specialized jailbreak attacks!
1
0
0
@GiorgioPiras12
Giorgio Piras
5 days
We address this by repurposing Self-Organizing Maps (SOMs) to approximate this manifold, uncovering multiple, closely related directions that encode refusal behavior. We show that SOMs effectively capture refusal, just like any other concept/manifold.
1
0
0
@GiorgioPiras12
Giorgio Piras
5 days
Language models encode refusal as a manifold expressed through multiple, closely related directions. In our latest #AAAI26 paper, with @RaffaeleMura3, we bridge LLM safety and interpretability by answering the question: 👉 Do LLMs encode refusal behavior as a manifold?
1
2
2
@GiorgioPiras12
Giorgio Piras
13 days
Co-organized with @RaffaeleMura3, Fabio Brau, @maurapintor, @zangobot, and @biggiobattista, powered by the @mlsec_lab at @sAIferLab
0
0
1
@GiorgioPiras12
Giorgio Piras
13 days
We welcome submissions on various topics, including Adversarial ML and AI for Cybersecurity. We'll also give a tutorial on Adversarial Machine Learning on the workshop day! Let's discuss how to make AI more secure in the sunniest of Italian cities ☀️
1
0
1
@GiorgioPiras12
Giorgio Piras
13 days
🚨 Workshop Alert! We're excited to announce the 2nd Trustworthy AI for Cybersecurity (TAIC26) workshop, co-located with ITASEC26. ⌛ Deadline: 5 December 2025 📢 CFP: https://t.co/tFpTNjUM43 ℹ Website: https://t.co/LXVSabELDt
1
1
2
@sec4ai4sec
Sec4AI4Sec
20 days
🎉 Great news from our partners at the University of Cagliari! Their latest research has been accepted for publication in the Machine Learning Journal (Springer Nature), with publication expected in early November. 🧠 The paper tackles a key challenge in AI-driven code
0
1
2
@mlsec_lab
Machine Learning Security Laboratory
24 days
Missed the event? Watch it again on our YouTube channel: https://t.co/cNQYG2lFXb Stay tuned for the following events! Thank you again, @XinCynthiaChen, for talking about your research at our seminar!
@mlsec_lab
Machine Learning Security Laboratory
1 month
We are excited to present a new event of our seminar series on ML Security! We will host @XinCynthiaChen (ETH Zurich) on October 22nd, 2025, at 5 pm CEST. Free registration at the link below: https://t.co/JsJQBiqL8Y
0
3
3
@GiorgioPiras12
Giorgio Piras
1 month
Super great work by @RaffaeleMura3. Go and have a look at our low-perplexity jailbreaks 👀
@RaffaeleMura3
Raffaele Mura
1 month
Our new paper, LatentBreak: Jailbreaking LLMs through Latent Space Feedback, is now on arXiv. We study how latent-space feedback can produce natural, low-perplexity jailbreaks. Joint work with brilliant colleagues across @sAIferLab @fdtn_ai https://t.co/rvgZaOCAym
0
0
1
@sec4ai4sec
Sec4AI4Sec
5 months
📢 New Publication in Pattern Recognition – Volume 168. We're pleased to share that our partner, @Università degli Studi di Cagliari (Italy), has published important new research on adversarial pruning methods in Pattern Recognition! Recent years have seen the rise of pruning
0
3
3
@GiorgioPiras12
Giorgio Piras
5 months
Special thanks to all my @sAIferLab co-authors: @maurapintor, @ambrademontis, @biggiobattista, @GiorgioGiacinto, and Fabio Roli.
0
0
2
@GiorgioPiras12
Giorgio Piras
5 months
...and here's why we should care: pruning alone simplifies boundaries, but preserving robustness demands more complexity, not less. This makes adversarial pruning uniquely challenging and intriguing, forcing us to rethink how we design sparse, secure models.
1
0
1
@GiorgioPiras12
Giorgio Piras
5 months
Here's a sneak peek of our interactive robustness curves:
1
0
1
@GiorgioPiras12
Giorgio Piras
5 months
Our benchmark enables you to: 📥 Download and test checkpoints of popular AP methods for your research 📉 Explore and compare them through interactive robustness curves 🤗 Submit your own novel/existing method to make it easily comparable and public! 🔗
github.com
Contribute to pralab/AdversarialPruningBenchmark development by creating an account on GitHub.
1
0
1
@GiorgioPiras12
Giorgio Piras
5 months
In this work, we: ✅ Propose a novel taxonomy of AP methods to clarify their design ✅ Build a uniform and reliable benchmark for evaluating adversarial robustness under pruning
1
0
1