Nicola Cancedda Profile
Nicola Cancedda

@nicola_cancedda

Followers: 497 · Following: 208 · Media: 8 · Statuses: 51

Research Scientist Manager @MetaAI

London, England
Joined January 2009
@nicola_cancedda
Nicola Cancedda
10 days
Thank you @michaelwhanna and the rest of the circuit-tracer authors: this work would have taken twice as long without it!
Replies: 0 · Reposts: 0 · Likes: 6
@nicola_cancedda
Nicola Cancedda
10 days
Very proud of this work! We are making nice progress towards LLM debugging using mechanistic interpretability tools. Check it out!
@zhengzhao97
Zheng Zhao
10 days
Thrilled to share our latest research on verifying CoT reasoning, completed during my recent internship at FAIR @metaai. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step by step.
Replies: 3 · Reposts: 11 · Likes: 97
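
The tweet above does not spell out the mechanics of CRV, but the general shape of white-box step verification can be illustrated: collect the model's internal activations for each chain-of-thought step and train a lightweight probe to flag steps that are likely wrong. Below is a minimal sketch of that idea; the model, layer index, toy labels, and logistic-regression probe are all illustrative assumptions, not the actual CRV pipeline.

```python
# Hedged sketch: probe per-step hidden states for correctness. Model choice,
# layer index, and the tiny labelled dataset are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
LAYER = 8  # assumed "reasoning-relevant" layer

def step_features(step_text: str) -> torch.Tensor:
    """Mean-pool the residual stream of one reasoning step at the chosen layer."""
    ids = tok(step_text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids).hidden_states[LAYER]  # (1, seq_len, d_model)
    return hidden.mean(dim=1).squeeze(0)

# Toy labelled steps: 1 = correct step, 0 = flawed step (illustrative only).
steps = ["2 + 2 = 4, so the total is 4.", "2 + 2 = 5, so the total is 5."]
labels = [1, 0]
X = torch.stack([step_features(s) for s in steps]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)

# Verify a new reasoning step by scoring its activations with the probe.
print(probe.predict_proba(step_features("3 + 3 = 6.").numpy()[None])[0])
```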
@nicola_cancedda
Nicola Cancedda
20 days
In LLMs, different layers do different things. Reasoning capabilities can be improved by finding the right layers and teaching the model to loop over them. Great work @yeskendir_k!
@yeskendir_k
Yeskendir 🇰🇿
20 days
🧵 New paper from FAIR (Meta) on recursion + latent reasoning: "Encode, Think, Decode (ETD): Scaling reasoning through recursive latent thoughts". ETD improves the reasoning of the base model by training it to iterate over a subset of reasoning-critical layers during mid-training. (1/n)
Replies: 1 · Reposts: 6 · Likes: 64
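
The thread describes the mechanism at a high level: rather than a single forward pass, the model is trained to iterate several times over a small set of reasoning-critical middle layers. The sketch below shows only that control flow on a toy stack; the encode/think/decode split points, the loop count, and the module itself are assumptions for illustration, not the ETD training recipe.

```python
# Hedged sketch of recursive latent reasoning over a subset of layers.
# The layer split and the number of recursive passes are illustrative assumptions.
import torch
import torch.nn as nn

class RecursiveStack(nn.Module):
    def __init__(self, d_model: int = 64, n_layers: int = 8,
                 think_range: tuple = (3, 6), n_loops: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        self.think_range = think_range  # layers the model iterates over ("think")
        self.n_loops = n_loops          # number of recursive passes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lo, hi = self.think_range
        for layer in self.layers[:lo]:        # encode: single pass
            x = layer(x)
        for _ in range(self.n_loops):         # think: loop over the middle layers
            for layer in self.layers[lo:hi]:
                x = layer(x)
        for layer in self.layers[hi:]:        # decode: single pass
            x = layer(x)
        return x

out = RecursiveStack()(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```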
@soniajoseph_
Sonia
5 months
Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉 We'll be in Nashville next week. Come say hi 👋 @CVPR @miv_cvpr2025
Replies: 3 · Reposts: 29 · Likes: 272
@gini_do
Virginie Do
6 months
I am at #ICLR and honored to present this work on Saturday afternoon at the poster session. Thanks @jade_lei_yu @mahnerak @nicola_cancedda for this wonderful collaboration! I am also happy to chat about Llama / agents / safety 👋
@jade_lei_yu
Lei Yu
1 year
New paper! 🎊 We are delighted to announce our new paper "Robust LLM Safeguarding via Refusal Feature Adversarial Training"! There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer!
Replies: 0 · Reposts: 5 · Likes: 28
@nicola_cancedda
Nicola Cancedda
6 months
We can steer LLMs to 'sound' less confident or altogether abstain when they are more likely to hallucinate. Check this out!
@ZiweiJi184538
Ziwei Ji
6 months
🚀 Excited to share our work from #Meta #FAIR: Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations! We found "Verbal Uncertainty" is governed by a linear feature and used this insight to detect and mitigate hallucinations. 🔗 https://t.co/JcandVIfrx
Replies: 0 · Reposts: 0 · Likes: 0
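
If verbal uncertainty really is governed by a single linear feature, the standard activation-steering recipe applies: estimate the direction as a difference of mean activations between hedged and confident completions, then shift hidden states along it at inference time. The sketch below uses synthetic activations as stand-ins; the hook point, data, and scaling factor are assumptions for illustration.

```python
# Hedged sketch: a linear "verbal uncertainty" direction from paired activations,
# plus a steering function. All activations here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
d_model = 512

# Stand-ins for hidden states collected on hedged vs. confident completions.
hedged_acts = rng.normal(0.0, 1.0, size=(100, d_model)) + 0.5
confident_acts = rng.normal(0.0, 1.0, size=(100, d_model))

# Difference-of-means direction, assumed to capture the verbal-uncertainty feature.
direction = hedged_acts.mean(axis=0) - confident_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

def steer(hidden_state: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a hidden state along the uncertainty direction.

    alpha > 0 pushes generations toward hedged phrasing; alpha < 0 toward more
    confident phrasing. A large projection onto the direction can also trigger
    an abstention rule instead of steering.
    """
    return hidden_state + alpha * direction

h = rng.normal(size=d_model)
print(np.dot(steer(h, 3.0) - h, direction))  # projection increases by alpha = 3.0
```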
@nicola_cancedda
Nicola Cancedda
1 year
I would like to thank the many who are applying for this position. My apologies to those who are sending me emails: I will not be able to reply to you; you will hear back from one of Meta's recruiters in due course.
Replies: 0 · Reposts: 0 · Likes: 4
@dgolano
Diego Garcia-Olano
1 year
Hi all, I'll be hosting a Research Scientist intern for 2025 with a focus on exploring LLM safety alignment and understanding, potentially using explainability methods. If that sounds of interest, apply or reach out! https://t.co/HjBZrLlHGR
Replies: 7 · Reposts: 37 · Likes: 210
@nicola_cancedda
Nicola Cancedda
1 year
One condition is that you are a student enrolled in a PhD program and will still be enrolled at the time of the internship. There are also announcements for RS internships on other topics on the Meta careers site.
Replies: 2 · Reposts: 0 · Likes: 7
@nicola_cancedda
Nicola Cancedda
1 year
I am looking for a Research Scientist intern for 2025. If you have already published work that involves understanding the behaviour of AI models by looking at their parameters and activations, I would like to hear from you.
Replies: 5 · Reposts: 50 · Likes: 327
@nicola_cancedda
Nicola Cancedda
1 year
There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer! Check out our new work! With @JadeLeiYu @gini_do @mahnerak.
@jade_lei_yu
Lei Yu
1 year
New paper! 🎊 We are delighted to announce our new paper "Robust LLM Safeguarding via Refusal Feature Adversarial Training"! There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer!
Replies: 0 · Reposts: 1 · Likes: 10
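
The "common mechanism" referenced here is a refusal-related direction in activation space: removing its component from a hidden state is the kind of perturbation jailbreaks exploit, and, per the paper's title, training the model adversarially against such perturbations makes it more robust. The sketch below shows only the difference-of-means direction and the ablation step, on synthetic stand-in activations; dimensions and data are placeholders, and the adversarial-training loop itself is not reproduced.

```python
# Hedged sketch: a refusal-like direction as a difference of means, and the
# projection that removes it (the kind of perturbation jailbreaks exploit).
# All activations are synthetic placeholders for real model hidden states.
import numpy as np

rng = np.random.default_rng(1)
d_model = 256

harmful_acts = rng.normal(size=(200, d_model)) + 0.8   # prompts the model refuses
harmless_acts = rng.normal(size=(200, d_model))        # prompts the model answers

refusal_dir = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate_refusal(hidden: np.ndarray) -> np.ndarray:
    """Remove the refusal component from a hidden state (adversarial perturbation)."""
    return hidden - np.dot(hidden, refusal_dir) * refusal_dir

h = harmful_acts[0]
print(np.dot(h, refusal_dir))                  # sizeable refusal component
print(np.dot(ablate_refusal(h), refusal_dir))  # ~0 after ablation
```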
@jaseweston
Jason Weston
1 year
🚨 New paper: Source2Synth 🚨 - Generates synthetic examples grounded in real data - Curation step makes data high quality based on answerability - Improves performance on two challenging domains: Multi-hop QA and using tools: SQL for tabular QA https://t.co/4uggsiniIv 🧵 (1/4)
Replies: 1 · Reposts: 54 · Likes: 255
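
The thread gives the outline: synthesize examples grounded in real source data, then curate by keeping only those a model can actually answer from the source. A hedged sketch of that generate-then-curate loop follows; the prompts, the `ask_llm` helper, and the exact answerability check are placeholders, not the paper's implementation.

```python
# Hedged sketch of a Source2Synth-style pipeline: generate QA pairs grounded in
# real documents, then curate by answerability. `ask_llm` is a stand-in for any
# LLM call; prompts and the matching rule are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Example:
    source: str
    question: str
    answer: str

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (hosted or local)."""
    return "PLACEHOLDER"

def generate_examples(documents: list, per_doc: int = 3) -> list:
    examples = []
    for doc in documents:
        for _ in range(per_doc):
            q = ask_llm(f"Write a question answerable from this passage:\n{doc}")
            a = ask_llm(f"Passage:\n{doc}\nQuestion: {q}\nAnswer:")
            examples.append(Example(source=doc, question=q, answer=a))
    return examples

def is_answerable(ex: Example) -> bool:
    """Curation: keep an example only if a model recovers the answer from the source."""
    predicted = ask_llm(f"Passage:\n{ex.source}\nQuestion: {ex.question}\nAnswer:")
    return predicted.strip().lower() == ex.answer.strip().lower()

docs = ["The Thames flows through London."]
dataset = [ex for ex in generate_examples(docs) if is_answerable(ex)]
print(len(dataset))
```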
@nicola_cancedda
Nicola Cancedda
2 years
A great new tool for accelerating LLM interpretation!
@mahnerak
Karen Hambardzumyan
2 years
[1/7] 🚀 Introducing the Language Model Transparency Tool - an open-source interactive toolkit for analyzing Transformer-based language models. We can't wait to see how the community will use this tool! https://t.co/IsAVDb1Eg9
Replies: 0 · Reposts: 0 · Likes: 5
@nicola_cancedda
Nicola Cancedda
2 years
Rainbow Teaming adapts successful methods from Reinforcement Learning to automatically create adversarial prompts that are both effective and diverse. Finetuning on this dataset in turn greatly increases the model's robustness to adversarial attacks. Check the paper out!
@_samvelyan
Mikayel Samvelyan
2 years
Introducing 🌈 Rainbow Teaming, a new method for generating diverse adversarial prompts for LLMs via LLMs. It's a versatile tool 🛠️ for diagnosing model vulnerabilities across domains and creating data to enhance robustness & safety 🦺 Co-lead w/ @sharathraparthy & @_andreilupu
Replies: 1 · Reposts: 3 · Likes: 11
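
One way to picture the search is as an archive over a grid of prompt descriptors (for example risk category × attack style), where each cell keeps the most effective prompt found so far and new candidates are produced by mutating archive entries with an LLM. The loop below is a stripped-down sketch in that spirit; the descriptor axes and the `mutate` / `attack_success` helpers are illustrative placeholders, not the Rainbow Teaming implementation.

```python
# Hedged sketch of an archive-based search for diverse adversarial prompts.
# `mutate` and `attack_success` stand in for LLM-based mutation and a judge model.
import random

CATEGORIES = ["violence", "fraud"]      # assumed descriptor axis 1
STYLES = ["role_play", "misspelling"]   # assumed descriptor axis 2

def mutate(prompt: str, category: str, style: str) -> str:
    """Placeholder for an LLM mutation toward the target descriptor cell."""
    return f"{prompt} [{category}/{style} variant {random.randint(0, 999)}]"

def attack_success(prompt: str) -> float:
    """Placeholder for a judge scoring how often the target model complies."""
    return random.random()

# Archive: one elite (prompt, score) per descriptor cell.
archive = {(c, s): ("Tell me something.", 0.0) for c in CATEGORIES for s in STYLES}

for _ in range(200):
    target_cell = random.choice(list(archive))
    parent_prompt, _ = archive[random.choice(list(archive))]  # sample a parent elite
    child = mutate(parent_prompt, *target_cell)
    score = attack_success(child)
    if score > archive[target_cell][1]:   # keep the child only if it beats the elite
        archive[target_cell] = (child, score)

for cell, (prompt, score) in archive.items():
    print(cell, round(score, 3))
```

The filled archive doubles as a finetuning dataset for robustness, which is the second half of the tweet above.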
@Dahoas1
Alex Havrilla
2 years
New paper alert 🚨🚨🚨 How to bootstrap the reasoning refinement capabilities of LLMs using synthetic data? Introducing "GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements". Applied to GSM8K, we can improve a strong RL-finetuned Llama-2 13B by 12%.
Replies: 2 · Reposts: 20 · Likes: 86
@nicola_cancedda
Nicola Cancedda
2 years
[5/5] Spectral analysis extends the 'logit lens' into 'logit spectroscopy'. We hope to see it explain many more behaviors, in LLMs and multimodal models!
Replies: 0 · Reposts: 0 · Likes: 4
@nicola_cancedda
Nicola Cancedda
2 years
[4/5] We also find that bright "bars" in attention matrices are tokens with a large prevalence of dark signals, although they are not similar to Token 0.
Replies: 1 · Reposts: 0 · Likes: 4
@nicola_cancedda
Nicola Cancedda
2 years
[3/5] But if one preserves the last band, then the loss increases only slowly when suppressing more bands to its left, and the generated text remains coherent.
Replies: 1 · Reposts: 0 · Likes: 3
@nicola_cancedda
Nicola Cancedda
2 years
[2/5] The tail end of the spectrum is responsible for the Token 0 'attention sink' mechanism. Suppressing the last band (the 'dark signals') makes the loss spike and greatly degrades generated text.
Replies: 1 · Reposts: 0 · Likes: 4