Giorgio Piras @GiorgioPiras12 X Profile

Giorgio Piras

@GiorgioPiras12

Followers

73

Following

288

Media

13

Statuses

55

Postdoctoral Researcher @ University of Cagliari | AI Security

https://t.co/CiwFP8mRCi

Cagliari, Italy

Joined November 2020

Machine Learning Security Laboratory

@mlsec_lab

14 days

We are excited to present a new event of our seminar series on ML Security! We will host @chwress (KIT) on November 26th, 2025, at 5 pm CET. Free registration at the link below: https://t.co/ygrJwTGY6W

0

1

Giorgio Piras

@GiorgioPiras12

5 days

Paper: https://t.co/t9QLf03GpZ Code: https://t.co/6IdkECieYY This work would not have been possible without our great @sAIferLab co-authors @FabioBrau, @biggiobattista, @LucaOneto, and @fabiogroli.

github.com

Contribute to pralab/som-refusal-directions development by creating an account on GitHub.

0

1

Giorgio Piras

@GiorgioPiras12

5 days

Finally, we analyze the directions and observe that they are all closely related in terms of cosine similarity. This speaks to recent mechanistic insights positing that LLMs' concepts might be encoded with multiple related (i.e., non-orthogonal) directions.

1

0
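To make the cosine-similarity remark above concrete, here is a minimal sketch of how one might check whether a set of directions is closely related. The vectors are hypothetical toy values, not the directions from the paper, and the helper names `cosine_similarity` and `pairwise_cosine` are introduced only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_cosine(directions):
    """Square matrix of cosine similarities between all pairs of directions."""
    return [[cosine_similarity(u, v) for v in directions] for u in directions]

# Hypothetical toy directions (assumption: real ones would live in the
# model's hidden-state space and have thousands of dimensions).
directions = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
]

sims = pairwise_cosine(directions)
for row in sims:
    print(" ".join(f"{s:.3f}" for s in row))
```

Off-diagonal values near 1 indicate directions that are related but not identical; values near 0 would indicate near-orthogonal directions.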

Giorgio Piras

@GiorgioPiras12

5 days

Mechanistically, we reveal that, as we steer with more directions, the cluster of harmful prompt representations: (i) gets compressed and (ii) moves closer to that of harmless prompts (we find a clear correlation between cluster compression and ASR!)

1

0

Giorgio Piras

@GiorgioPiras12

5 days

By steering LLMs with these directions, we find that we can outperform (in terms of attack success rate) not only single-direction baselines, but also specialized jailbreak attacks!

1

0

Giorgio Piras

@GiorgioPiras12

5 days

We address this by repurposing Self-Organizing Maps (SOMs) to approximate this manifold, uncovering multiple, closely related directions that encode refusal behavior. We show that SOMs effectively capture refusal, just like any other concept/manifold.

1

0

Giorgio Piras

@GiorgioPiras12

5 days

Language models encode refusal as a manifold expressed through multiple, closely related directions. In our latest #AAAI26 paper, with @RaffaeleMura3, we bridge LLM safety and interpretability by answering the question: 👉 Do LLMs encode refusal behavior as a manifold?

1

2

Giorgio Piras

@GiorgioPiras12

13 days

Co-organized with @RaffaeleMura3, Fabio Brau, @maurapintor, @zangobot, and @biggiobattista, powered by the @mlsec_lab at @sAIferLab.

0

1

Giorgio Piras

@GiorgioPiras12

13 days

We welcome submissions on various topics, including Adversarial ML and AI for Cybersecurity. We'll also give a tutorial on Adversarial Machine Learning on the workshop day! Let’s discuss how to make AI more secure in the sunniest of Italian cities ☀️

1

0

1

Giorgio Piras

@GiorgioPiras12

13 days

🚨Workshop Alert! We're excited to announce the 2nd Trustworthy AI for Cybersecurity (TAIC26) workshop, co-located with ITASEC26. ⌛ Deadline: 5 December 2025 📢 CFP: https://t.co/tFpTNjUM43 ℹ Website: https://t.co/LXVSabELDt

1

2

Sec4AI4Sec

@sec4ai4sec

20 days

🎉 Great news from our partners at the University of Cagliari! Their latest research has been accepted for publication in the Machine Learning Journal (Springer Nature) — publication expected in early November. 🧠 The paper tackles a key challenge in AI-driven code

0

1

2

Machine Learning Security Laboratory

@mlsec_lab

24 days

Missed the event? Watch it again on our YouTube channel: https://t.co/cNQYG2lFXb Stay tuned for the following events! Thank you again, @XinCynthiaChen, for talking about your research at our seminar!

Machine Learning Security Laboratory

@mlsec_lab

1 month

We are excited to present a new event of our seminar series on ML Security! We will host @XinCynthiaChen (ETH Zurich) on October 22nd, 2025, at 5 pm CEST. Free registration at the link below: https://t.co/JsJQBiqL8Y

0

3

Giorgio Piras

@GiorgioPiras12

1 month

Super great work by @RaffaeleMura3. Go and have a look at our low-perplexity jailbreaks 👀

Raffaele Mura

@RaffaeleMura3

1 month

Our new paper, LatentBreak: Jailbreaking LLMs through Latent Space Feedback, is now on arXiv. We study how latent-space feedback can produce natural, low-perplexity jailbreaks. Joint work with brilliant colleagues across @sAIferLab @fdtn_ai https://t.co/rvgZaOCAym

0

1

Sec4AI4Sec

@sec4ai4sec

5 months

📢 New Publication in Pattern Recognition – Volume 168 We’re pleased to share that our partner, @Università degli Studi di Cagliari (Italy), has published important new research on adversarial pruning methods in Pattern Recognition! Recent years have seen the rise of pruning

0

3

Giorgio Piras

@GiorgioPiras12

5 months

Special thanks to all my @sAIferLab co-authors: @maurapintor, @ambrademontis, @biggiobattista, @GiorgioGiacinto, and Fabio Roli.

0

2

Giorgio Piras

@GiorgioPiras12

5 months

...and here's why we should care: pruning alone simplifies boundaries, but preserving robustness demands more complexity, not less. This makes adversarial pruning uniquely challenging and intriguing, forcing us to rethink how we design sparse, secure models.

1

0

1

Giorgio Piras

@GiorgioPiras12

5 months

Here's a sneak peek of our interactive robustness curves:

1

0

1

Giorgio Piras

@GiorgioPiras12

5 months

Our benchmark enables you to: 📥 Download and test checkpoints of popular AP methods for your research 📉 Explore and compare them through interactive robustness curves 🤗 Submit your own novel/existing method to make it easily comparable and public! 🔗

github.com

Contribute to pralab/AdversarialPruningBenchmark development by creating an account on GitHub.

1

0

1

Giorgio Piras

@GiorgioPiras12

5 months

In this work, we: ✅ Propose a novel taxonomy of AP methods to clarify their design ✅ Build a uniform and reliable benchmark for evaluating adversarial robustness under pruning

1

0

1