Nicola Cancedda
@nicola_cancedda
Followers
497
Following
208
Media
8
Statuses
51
Research Scientist Manager @MetaAI
London, England
Joined January 2009
Honored to be featured on VentureBeat! Meta researchers open the LLM black box to repair flawed AI reasoning https://t.co/r3HmGSufd0 via @VentureBeat
venturebeat.com
The proof-of-concept could pave the way for a new class of AI debuggers, making language models more reliable for business-critical applications.
0
0
3
Thank you @michaelwhanna and the rest of the circuit-tracer authors: this work would have taken twice as long without it!
0
0
6
Very proud of this work! We are making nice progress towards LLM debugging using mechanistic interpretability tools. Check it out!
Thrilled to share our latest research on verifying CoT reasoning, completed during my recent internship at FAIR @metaai. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step by step.
3
11
97
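[Editor's sketch] The post above describes verifying individual reasoning steps from white-box signals. The minimal sketch below shows only the generic verification-by-classifier idea: hypothetical per-step feature vectors (in CRV these would come from the model's internals, e.g. attribution graphs) feed a probe that flags flawed steps. The features here are random stand-ins, not the paper's pipeline.

```python
# Hypothetical sketch: train a probe that flags flawed reasoning steps from
# white-box features. Feature extraction is stubbed out with random data;
# only the verification-by-classifier idea is illustrated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend each reasoning step yields a structural feature vector (stand-in data).
n_steps, n_features = 2000, 64
X = rng.normal(size=(n_steps, n_features))
y = rng.integers(0, 2, size=n_steps)          # 1 = step labelled as flawed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", probe.score(X_te, y_te))  # ~0.5 on random stand-ins
```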
In LLMs, different layers do different things. Reasoning capabilities can be improved by finding the right layers and teaching the model to loop over them. Great work @yeskendir_k!
🧵 New paper from FAIR (Meta) on recursion + latent reasoning: "Encode, Think, Decode (ETD): Scaling reasoning through recursive latent thoughts". ETD improves the reasoning of the base model by training it to iterate over a subset of reasoning-critical layers during mid-training. (1/n)
1
6
64
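[Editor's sketch] The ETD post above describes re-applying a subset of reasoning-critical layers multiple times. The toy PyTorch module below illustrates only that looping mechanism with generic encoder blocks; layer selection and the mid-training recipe from the paper are not reproduced, and all sizes are made up.

```python
# Minimal sketch of the "loop over a subset of layers" idea with toy
# transformer-encoder blocks; not the paper's architecture or training setup.
import torch
import torch.nn as nn

class RecursiveCore(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_shared_layers=2, n_loops=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.core = nn.TransformerEncoder(layer, num_layers=n_shared_layers)
        self.n_loops = n_loops  # how many latent "thought" iterations

    def forward(self, x):
        # Re-apply the same reasoning-critical layers n_loops times.
        for _ in range(self.n_loops):
            x = self.core(x)
        return x

x = torch.randn(1, 10, 64)          # (batch, seq, d_model)
print(RecursiveCore()(x).shape)     # torch.Size([1, 10, 64])
```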
Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! We'll be in Nashville next week. Come say hi! @CVPR @miv_cvpr2025
3
29
272
I am at #ICLR and honored to present this work on Saturday afternoon at the poster session. Thanks @jade_lei_yu @mahnerak @nicola_cancedda for this wonderful collaboration! I am also happy to chat about Llama / agents / safety!
New paper! We are delighted to announce our new paper "Robust LLM Safeguarding via Refusal Feature Adversarial Training"! There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer!
0
5
28
We can steer LLMs to 'sound' less confident or altogether abstain when they are more likely to hallucinate. Check this out!
Excited to share our work from #Meta #FAIR: Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations! We found "Verbal Uncertainty" is governed by a linear feature and used this insight to detect and mitigate hallucinations. https://t.co/JcandVIfrx
0
0
0
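[Editor's sketch] The posts above describe a single linear direction in activation space that governs how uncertain the model "sounds", used both to detect and to mitigate hallucinations. The sketch below shows only the generic linear-feature idea: project a hidden state onto a unit direction to score it, and add a multiple of the direction to steer it. Both the direction and the activations are synthetic stand-ins, not the paper's.

```python
# Hedged sketch of a "linear feature": detection by projection, mitigation by
# additive steering. All vectors here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
d_model = 128
u = rng.normal(size=d_model)
u /= np.linalg.norm(u)                 # unit "verbal uncertainty" direction

h = rng.normal(size=d_model)           # hidden state at some layer/position

# Detection: how strongly the hidden state loads on the direction.
print("uncertainty score:", h @ u)

# Steering: push the hidden state along the direction so the model "sounds"
# more hedged (or subtract to sound more confident).
alpha = 4.0
h_steered = h + alpha * u
print("score after steering:", h_steered @ u)
```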
I would like to thank the many who are applying for this position. My apologies to those who are sending me emails: I will not be able to reply to you; you will hear back from one of Meta's recruiters in due course.
0
0
4
Hi all, I'll be hosting a Research Scientist intern for 2025 with a focus on exploring LLM safety alignment and understanding, potentially using explainability methods. If that sounds of interest, apply or reach out! https://t.co/HjBZrLlHGR
7
37
210
A condition is that you are a student enrolled in a PhD program, and will still be enrolled at the time of the internship. There are also announcements for RS internships on other topics on the Meta careers site.
2
0
7
I am looking for a Research Scientist intern for 2025. If you have already published work that involves understanding the behaviour of AI models by looking at their parameters and activations, I would like to hear from you.
5
50
327
There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer! Check out our new work! With @JadeLeiYu, @gini_do, and @mahnerak.
New paper! We are delighted to announce our new paper "Robust LLM Safeguarding via Refusal Feature Adversarial Training"! There is a common mechanism behind LLM jailbreaking, and it can be leveraged to make models safer!
0
1
10
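[Editor's sketch] The posts above point to a shared mechanism behind jailbreaks, tied to a "refusal feature" in activation space. The sketch below illustrates one common way such a direction is estimated (difference of mean activations on harmful vs. harmless prompts) and how an attack can be modelled as ablating it; the activations are synthetic stand-ins and the paper's adversarial-training loop is not shown.

```python
# Sketch of the refusal-direction intuition with synthetic activations.
import numpy as np

rng = np.random.default_rng(0)
d_model = 256
harmful = rng.normal(loc=0.3, size=(500, d_model))   # stand-in hidden states
harmless = rng.normal(loc=0.0, size=(500, d_model))

refusal_dir = harmful.mean(axis=0) - harmless.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

# A jailbreak-style attack can be modelled as ablating this direction from a
# hidden state; adversarial training aims to keep refusals robust even then.
h = rng.normal(loc=0.3, size=d_model)
h_ablated = h - (h @ refusal_dir) * refusal_dir
print("projection before/after:", h @ refusal_dir, h_ablated @ refusal_dir)
```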
🚨 New paper: Source2Synth 🚨 - Generates synthetic examples grounded in real data - Curation step makes data high quality based on answerability - Improves performance on two challenging domains: Multi-hop QA and using tools: SQL for tabular QA https://t.co/4uggsiniIv 🧵(1/4)
1
54
255
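[Editor's sketch] The Source2Synth post above highlights a curation step that keeps only synthetic examples that are answerable from the grounding data. The toy filter below shows that idea in the simplest possible form; `can_answer` and the literal answer-in-context check are stand-ins for the LLM-based checks a real pipeline would use.

```python
# Toy sketch of curation-by-answerability for synthetic QA pairs.
def can_answer(question: str, context: str) -> bool:
    # Stand-in check; in practice this would query a checker model.
    return question.strip() != "" and context.strip() != ""

def curate(candidates, context):
    """Keep only (question, answer) pairs grounded in `context`."""
    kept = []
    for question, answer in candidates:
        if can_answer(question, context) and answer in context:
            kept.append((question, answer))
    return kept

context = "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911)."
candidates = [
    ("Which prize did Curie win in 1911?", "Chemistry"),
    ("Where was Curie born?", "Warsaw"),   # not answerable from this context
]
print(curate(candidates, context))          # keeps only the grounded pair
```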
A great new tool for accelerating LLM interpretation!
[1/7] Introducing the Language Model Transparency Tool - an open-source interactive toolkit for analyzing Transformer-based language models. We can't wait to see how the community will use this tool! https://t.co/IsAVDb1Eg9
0
0
5
Rainbow Teaming adapts successful methods from Reinforcement Learning to automatically create adversarial prompts that are both effective and diverse. Finetuning on this dataset in turn greatly increases the model's robustness to adversarial attacks. Check the paper out!
Introducing Rainbow Teaming, a new method for generating diverse adversarial prompts for LLMs via LLMs. It's a versatile tool 🛠️ for diagnosing model vulnerabilities across domains and creating data to enhance robustness & safety 🦺 Co-lead w/ @sharathraparthy & @_andreilupu
1
3
11
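[Editor's sketch] The Rainbow Teaming posts above emphasize prompts that are both effective and diverse. The loop below sketches a generic archive-based quality-diversity search over descriptor cells (here, made-up category × style cells); the mutation operator and attack score are random stand-ins for the LLM-driven components used in the paper.

```python
# Hedged sketch of an archive-based quality-diversity loop for prompt search.
import random

random.seed(0)
CATEGORIES = ["fraud", "malware"]
STYLES = ["role-play", "technical"]

def mutate(prompt: str) -> str:            # stand-in for an LLM mutator
    return prompt + random.choice([" please", " step by step", " in detail"])

def attack_score(prompt: str) -> float:    # stand-in for a judge/target model
    return random.random()

archive = {}                               # (category, style) -> (score, prompt)
for _ in range(200):
    cell = (random.choice(CATEGORIES), random.choice(STYLES))
    parent = archive.get(cell, (0.0, f"Write about {cell[0]} in a {cell[1]} way."))[1]
    child = mutate(parent)
    score = attack_score(child)
    if score > archive.get(cell, (0.0, ""))[0]:
        archive[cell] = (score, child)     # keep the best prompt found per cell

for cell, (score, prompt) in archive.items():
    print(cell, round(score, 2))
```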
New paper alert 🚨🚨🚨 How to bootstrap the reasoning refinement capabilities of LLMs using synthetic data? Introducing "GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements". Applied to GSM8K, we can improve a strong RL-finetuned Llama-2 13B by 12%.
2
20
86
[5/5] Spectral analysis extends the 'logit lens' into 'logit spectroscopy'. We hope to see it explain many more behaviors, in LLMs and multimodal models!
0
0
4
[4/5] We also find that bright "bars" in attention matrices are tokens with a large prevalence of dark signals, although they are not similar to Token 0.
1
0
4
[3/5] But if one preserves the last band then the loss increases only slowly when suppressing more bands on its left, and generated text remains coherent.
1
0
3
[2/5] The tail end of the spectrum is responsible for the Token 0 'attention sink' mechanism. Suppressing the last band (the 'dark signals') makes the loss spike and greatly degrades generated text.
1
0
4
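[Editor's sketch] The thread above describes suppressing spectral "bands" of the residual stream, with the tail band (the "dark signals") driving the Token 0 attention sink. The numpy sketch below shows the basic mechanics: split the right singular vectors of an unembedding-like matrix into bands and project one band out of a hidden state. All matrices are random stand-ins, so no real model behavior is reproduced.

```python
# Minimal sketch of spectral band suppression on a stand-in unembedding matrix.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 1000
W_U = rng.normal(size=(vocab, d_model))        # stand-in unembedding matrix

# Right singular vectors, ordered from largest to smallest singular value.
_, _, Vt = np.linalg.svd(W_U, full_matrices=False)
n_bands = 4
bands = np.array_split(Vt, n_bands, axis=0)    # bands[-1] = tail ("dark") band

def suppress_band(h, band):
    """Remove the component of h lying in the span of `band`'s singular vectors."""
    return h - band.T @ (band @ h)

h = rng.normal(size=d_model)                   # stand-in residual-stream state
h_no_tail = suppress_band(h, bands[-1])        # the intervention from tweet [2/5]
print(np.linalg.norm(h), np.linalg.norm(h_no_tail))
```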