Nicola Cancedda
@nicola_cancedda
Followers
497
Following
208
Media
8
Statuses
51
Research Scientist Manager @MetaAI
London, England
Joined January 2009
Honored to be featured on VentureBeat! Meta researchers open the LLM black box to repair flawed AI reasoning https://t.co/r3HmGSufd0 via @VentureBeat
venturebeat.com
The proof-of-concept could pave the way for a new class of AI debuggers, making language models more reliable for business-critical applications.
0
0
3
Thank you @michaelwhanna and the rest of the circuit-tracer authors: this work would have taken twice as long without it!
0
0
6
Very proud of this work! We are making nice progress towards LLM debugging using mechanistic interpretability tools. Check it out!
Thrilled to share our latest research on verifying CoT reasoning, completed during my recent internship at FAIR @metaai. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step by step.
3
11
97
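[Editor's sketch] The post above describes verifying individual reasoning steps from white-box signals. The minimal sketch below shows only the generic verification-by-classifier idea: hypothetical per-step feature vectors (in CRV these would come from the model's internals, e.g. attribution graphs) feed a probe that flags flawed steps. The features here are random stand-ins, not the paper's pipeline.

```python
# Hypothetical sketch: train a probe that flags flawed reasoning steps from
# white-box features. Feature extraction is stubbed out with random data;
# only the verification-by-classifier idea is illustrated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend each reasoning step yields a structural feature vector (stand-in data).
n_steps, n_features = 2000, 64
X = rng.normal(size=(n_steps, n_features))
y = rng.integers(0, 2, size=n_steps)          # 1 = step labelled as flawed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", probe.score(X_te, y_te))  # ~0.5 on random stand-ins
```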
In LLMs, different layers do different things. Reasoning capabilities can be improved by finding the right layers and teaching the model to loop over them. Great work @yeskendir_k!
🧵 New paper from FAIR (Meta) on recursion + latent reasoning: "Encode, Think, Decode (ETD): Scaling reasoning through recursive latent thoughts". ETD improves the reasoning of the base model by training it to iterate over a subset of reasoning-critical layers during mid-training. (1/n)
1
6
64
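[Editor's sketch] The ETD post above describes re-applying a subset of reasoning-critical layers multiple times. The toy PyTorch module below illustrates only that looping mechanism with generic encoder blocks; layer selection and the mid-training recipe from the paper are not reproduced, and all sizes are made up.

```python
# Minimal sketch of the "loop over a subset of layers" idea with toy
# transformer-encoder blocks; not the paper's architecture or training setup.
import torch
import torch.nn as nn

class RecursiveCore(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_shared_layers=2, n_loops=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.core = nn.TransformerEncoder(layer, num_layers=n_shared_layers)
        self.n_loops = n_loops  # how many latent "thought" iterations

    def forward(self, x):
        # Re-apply the same reasoning-critical layers n_loops times.
        for _ in range(self.n_loops):
            x = self.core(x)
        return x

x = torch.randn(1, 10, 64)          # (batch, seq, d_model)
print(RecursiveCore()(x).shape)     # torch.Size([1, 10, 64])
```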
Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! We'll be in Nashville next week. Come say hi! @CVPR @miv_cvpr2025
3
29
272
I am at #ICLR and honored to present this work on Saturday afternoon at the poster session. Thanks @jade_lei_yu @mahnerak @nicola_cancedda for this wonderful collaboration! I am also happy to chat about Llama / agents / safety!
New paper! We are delighted to announce our new paper "Robust LLM Safeguarding via Refusal Feature Adversarial Training"! There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer!
0
5
28
We can steer LLMs to 'sound' less confident or altogether abstain when they are more likely to hallucinate. Check this out!
Excited to share our work from #Meta #FAIR: Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations! We found "Verbal Uncertainty" is governed by a linear feature and used this insight to detect and mitigate hallucinations. https://t.co/JcandVIfrx
0
0
0
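[Editor's sketch] The posts above describe a single linear direction in activation space that governs how uncertain the model "sounds", used both to detect and to mitigate hallucinations. The sketch below shows only the generic linear-feature idea: project a hidden state onto a unit direction to score it, and add a multiple of the direction to steer it. Both the direction and the activations are synthetic stand-ins, not the paper's.

```python
# Hedged sketch of a "linear feature": detection by projection, mitigation by
# additive steering. All vectors here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
d_model = 128
u = rng.normal(size=d_model)
u /= np.linalg.norm(u)                 # unit "verbal uncertainty" direction

h = rng.normal(size=d_model)           # hidden state at some layer/position

# Detection: how strongly the hidden state loads on the direction.
print("uncertainty score:", h @ u)

# Steering: push the hidden state along the direction so the model "sounds"
# more hedged (or subtract to sound more confident).
alpha = 4.0
h_steered = h + alpha * u
print("score after steering:", h_steered @ u)
```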
I would like to thank the many who are applying for this position. My apologies to those who are sending me emails: I will not be able to reply to you; you will hear back from one of Meta's recruiters in due course.
0
0
4
Hi all, I'll be hosting a Research Scientist intern for 2025 with a focus on exploring LLM safety alignment and understanding, potentially using explainability methods. If that sounds of interest, apply or reach out! https://t.co/HjBZrLlHGR
7
37
210
A condition is that you are a student enrolled in a PhD program, and will still be enrolled at the time of the internship. There are also announcements for RS internships on other topics on the Meta careers site.
2
0
7
I am looking for a Research Scientist intern for 2025. If you have already published work that involves understanding the behaviour of AI models by looking at their parameters and activations, I would like to hear from you.
5
50
327
There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer! Check out our new work! With @JadeLeiYu, @gini_do, and @mahnerak.
New paper! We are delighted to announce our new paper "Robust LLM Safeguarding via Refusal Feature Adversarial Training"! There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer!
0
1
10
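[Editor's sketch] The posts above point to a shared mechanism behind jailbreaks, tied to a "refusal feature" in activation space. The sketch below illustrates one common way such a direction is estimated (difference of mean activations on harmful vs. harmless prompts) and how an attack can be modelled as ablating it; the activations are synthetic stand-ins and the paper's adversarial-training loop is not shown.

```python
# Sketch of the refusal-direction intuition with synthetic activations.
import numpy as np

rng = np.random.default_rng(0)
d_model = 256
harmful = rng.normal(loc=0.3, size=(500, d_model))   # stand-in hidden states
harmless = rng.normal(loc=0.0, size=(500, d_model))

refusal_dir = harmful.mean(axis=0) - harmless.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

# A jailbreak-style attack can be modelled as ablating this direction from a
# hidden state; adversarial training aims to keep refusals robust even then.
h = rng.normal(loc=0.3, size=d_model)
h_ablated = h - (h @ refusal_dir) * refusal_dir
print("projection before/after:", h @ refusal_dir, h_ablated @ refusal_dir)
```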
🚨 New paper: Source2Synth 🚨 - Generates synthetic examples grounded in real data - Curation step makes data high quality based on answerability - Improves performance on two challenging domains: Multi-hop QA and using tools: SQL for tabular QA https://t.co/4uggsiniIv 🧵(1/4)
1
54
255
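[Editor's sketch] The Source2Synth post above highlights a curation step that keeps only synthetic examples that are answerable from the grounding data. The toy filter below shows that idea in the simplest possible form; `can_answer` and the literal answer-in-context check are stand-ins for the LLM-based checks a real pipeline would use.

```python
# Toy sketch of curation-by-answerability for synthetic QA pairs.
def can_answer(question: str, context: str) -> bool:
    # Stand-in check; in practice this would query a checker model.
    return question.strip() != "" and context.strip() != ""

def curate(candidates, context):
    """Keep only (question, answer) pairs grounded in `context`."""
    kept = []
    for question, answer in candidates:
        if can_answer(question, context) and answer in context:
            kept.append((question, answer))
    return kept

context = "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911)."
candidates = [
    ("Which prize did Curie win in 1911?", "Chemistry"),
    ("Where was Curie born?", "Warsaw"),   # not answerable from this context
]
print(curate(candidates, context))          # keeps only the grounded pair
```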
A great new tool for accelerating LLM interpretation!
[1/7] Introducing the Language Model Transparency Tool - an open-source interactive toolkit for analyzing Transformer-based language models. We can't wait to see how the community will use this tool! https://t.co/IsAVDb1Eg9
0
0
5
Rainbow Teaming adapts successful methods from Reinforcement Learning to automatically create adversarial prompts that are both effective and diverse. Finetuning on this dataset in turn greatly increases the model's robustness to adversarial attacks. Check the paper out!
Introducing Rainbow Teaming, a new method for generating diverse adversarial prompts for LLMs via LLMs. It's a versatile tool 🛠️ for diagnosing model vulnerabilities across domains and creating data to enhance robustness & safety 🦺 Co-lead w/ @sharathraparthy & @_andreilupu
1
3
11
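[Editor's sketch] The Rainbow Teaming posts above emphasize prompts that are both effective and diverse. The loop below sketches a generic archive-based quality-diversity search over descriptor cells (here, made-up category × style cells); the mutation operator and attack score are random stand-ins for the LLM-driven components used in the paper.

```python
# Hedged sketch of an archive-based quality-diversity loop for prompt search.
import random

random.seed(0)
CATEGORIES = ["fraud", "malware"]
STYLES = ["role-play", "technical"]

def mutate(prompt: str) -> str:            # stand-in for an LLM mutator
    return prompt + random.choice([" please", " step by step", " in detail"])

def attack_score(prompt: str) -> float:    # stand-in for a judge/target model
    return random.random()

archive = {}                               # (category, style) -> (score, prompt)
for _ in range(200):
    cell = (random.choice(CATEGORIES), random.choice(STYLES))
    parent = archive.get(cell, (0.0, f"Write about {cell[0]} in a {cell[1]} way."))[1]
    child = mutate(parent)
    score = attack_score(child)
    if score > archive.get(cell, (0.0, ""))[0]:
        archive[cell] = (score, child)     # keep the best prompt found per cell

for cell, (score, prompt) in archive.items():
    print(cell, round(score, 2))
```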
New paper alert 🚨🚨🚨 How to bootstrap the reasoning refinement capabilities of LLMs using synthetic data? Introducing "GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements". Applied to GSM8K, we can improve a strong RL-finetuned Llama-2 13B by 12%.
2
20
86
[5/5] Spectral analysis extends the 'logit lens' into 'logit spectroscopy'. We hope to see it explain many more behaviors, in LLMs and multimodal models!
0
0
4
[4/5] We also find that bright "bars" in attention matrices are tokens with a large prevalence of dark signals, although they are not similar to Token 0.
1
0
4
[3/5] But if one preserves the last band then the loss increases only slowly when suppressing more bands on its left, and generated text remains coherent.
1
0
3
[2/5] The tail end of the spectrum is responsible for the Token 0 'attention sink' mechanism. Suppressing the last band (the 'dark signals') makes the loss spike and greatly degrades generated text.
1
0
4
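[Editor's sketch] The thread above describes suppressing spectral "bands" of the residual stream, with the tail band (the "dark signals") driving the Token 0 attention sink. The numpy sketch below shows the basic mechanics: split the right singular vectors of an unembedding-like matrix into bands and project one band out of a hidden state. All matrices are random stand-ins, so no real model behavior is reproduced.

```python
# Minimal sketch of spectral band suppression on a stand-in unembedding matrix.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 1000
W_U = rng.normal(size=(vocab, d_model))        # stand-in unembedding matrix

# Right singular vectors, ordered from largest to smallest singular value.
_, _, Vt = np.linalg.svd(W_U, full_matrices=False)
n_bands = 4
bands = np.array_split(Vt, n_bands, axis=0)    # bands[-1] = tail ("dark") band

def suppress_band(h, band):
    """Remove the component of h lying in the span of `band`'s singular vectors."""
    return h - band.T @ (band @ h)

h = rng.normal(size=d_model)                   # stand-in residual-stream state
h_no_tail = suppress_band(h, bands[-1])        # the intervention from tweet [2/5]
print(np.linalg.norm(h), np.linalg.norm(h_no_tail))
```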