Zheng Zhao @EMNLP🇨🇳

@zhengzhao97

Followers: 472 · Following: 220 · Media: 8 · Statuses: 158

PhD Student @Edin_CDT_NLP @edinburghnlp | former intern @AIatMeta @amazon | working on LLMs

Joined February 2012
@agostina_cal
Agostina Calabrese @EMNLP 🦋
2 days
At #EMNLP2025 to present the last chapter of my PhD 🐼 Let's talk #HateSpeech detection, generalisation and NLP safety at my poster: 📆tomorrow 🕟4.30pm Look for the circus-themed poster 🎪🤸🏻‍♀️ Work with @tomsherborne @bjoernross and @mlapata at @EdinburghNLP + @cohere
0
5
29
@FazlBarez
Fazl Barez @EMNLP 🇨🇳
2 days
We’re hiring! Looking for Interns, Research Assistants, and Postdocs to work on Automated Interpretability: building systems that can analyse, explain, and intervene on large models to make them safe! Work with me @Oxford, or remotely. Apply by Nov 15: https://t.co/KEqXwpxgyb
20
113
843
@adinamwilliams
Adina Williams
2 days
FAIR is hiring interns for 2026! If you're interested in a stint doing fundamental AI research with us @AIatMeta, students enrolled in a PhD program can apply below👇: https://t.co/PrG9L625bY
metacareers.com
Meta's mission is to build the future of human connection and the technology that makes it possible.
15
47
431
@NailaMurray
Naila Murray
10 days
Happy to share this work! Turns out mechanistic interpretability tools are useful for debugging chain-of-thought reasoning errors. Awesome work led by @zhengzhao97!
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
Thrilled to share our latest research on verifying CoT reasoning, completed during my recent internship at FAIR @metaai. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step by step.
2
6
86
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
12 days
As I have said earlier, @xianjun_agi is one of the most brilliant AI researchers I've had the pleasure of working with. Any team would be lucky to have him! For a glimpse into our work, see the thread below: https://t.co/ZDDYEQ6om4
@xianjun_agi
Xianjun Yang
13 days
As a new grad and early-career researcher, I’m truly overwhelmed and grateful for the incredible support from the community. Within 24 hours, I’ve received hundreds of kind messages and job opportunities— a reminder of how warm and vibrant the AI community is. I’ll take time to
1
0
17
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
12 days
I was lucky to work with Xianjun at FAIR; he is one of the most brilliant AI researchers I've known, and he will be a tremendous asset to his next team. On a related note, I am also on the job market for Research/Applied Scientist roles. Please feel free to reach out to me!
@xianjun_agi
Xianjun Yang
14 days
I was laid off by Meta today. As a Research Scientist, my work was just cited by the legendary @johnschulman2 and Nicholas Carlini yesterday. I’m actively looking for new opportunities — please reach out if you have any openings!
5
6
67
@nicola_cancedda
Nicola Cancedda
13 days
Very proud of this work! We are making nice progress towards LLM debugging using mechanistic interpretability tools. Check it out!
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
Thrilled to share our latest research on verifying CoT reasoning, completed during my recent internship at FAIR @metaai. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step by step.
3
12
96
@xianjun_agi
Xianjun Yang
13 days
As a new grad and early-career researcher, I’m truly overwhelmed and grateful for the incredible support from the community. Within 24 hours, I’ve received hundreds of kind messages and job opportunities— a reminder of how warm and vibrant the AI community is. I’ll take time to
arxiv.org
Current Chain-of-Thought (CoT) verification methods predict reasoning correctness based on outputs (black-box) or activations (gray-box), but offer limited insight into why a computation fails. We...
18
48
685
@yeskendir_k
Yeskendir 🇰🇿 @EMNLP 🇨🇳
13 days
Happy to see this come together! We applied interpretability tools to verify chain-of-thought reasoning steps. Fantastic work led by @zhengzhao97 — check it out!
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
Thrilled to share our latest research on verifying CoT reasoning, completed during my recent internship at FAIR @metaai. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step by step.
1
3
14
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
@yeskendir_k @xianjun_agi @NailaMurray @nicola_cancedda [9/n] You can read the full paper here:
0
1
10
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
[8/n] In sum, our work establishes CRV as a powerful proof-of-concept for moving beyond error detection to a causal understanding of LLM reasoning. I'm deeply grateful for the incredible mentorship of @yeskendir_k @xianjun_agi @NailaMurray @nicola_cancedda.
1
0
8
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
[7/n] Crucially, we show causality, not just correlation. By identifying a single, prematurely activated feature causing an error, we performed a targeted intervention to causally correct the model's reasoning path. This is a vital step toward truly debugging LLMs.
2
0
7
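The intervention described in this tweet can be sketched in a few lines. This is an illustrative toy, not the paper's code: the decoder matrix, feature indices, and activation values are all made-up stand-ins for a sparse transcoder's learned quantities.

```python
# Sketch: suppress one prematurely active feature in a sparse code and
# decode the patched hidden state. All shapes/values are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
d_model, n_feats = 16, 64
decoder = rng.normal(0, 1, (n_feats, d_model))  # stand-in transcoder decoder

code = np.zeros(n_feats)
code[[3, 17, 42]] = [1.2, 0.8, 2.5]  # active features; say 42 fires prematurely

def decode(c):
    """Map a sparse feature code back to a model hidden state."""
    return c @ decoder

baseline = decode(code)
patched_code = code.copy()
patched_code[42] = 0.0               # targeted ablation of the culprit feature
patched = decode(patched_code)

# The edit changes the hidden state only by feature 42's contribution.
print(np.allclose(baseline - patched, 2.5 * decoder[42]))  # True
```

Because the decode is linear in the code, zeroing one feature removes exactly that feature's contribution, which is what makes the intervention targeted.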
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
[6/n] We also visualised the 'structural fingerprints' of error, projecting high-dimensional features via PCA, and found that incorrect steps form a dense cluster that is structurally similar to correct steps yet occupies its own distinct region.
1
0
6
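The projection described in this tweet amounts to fitting PCA on the pooled step features and comparing the two classes in 2D. A minimal sketch with synthetic stand-in features (the real structural features come from attribution graphs; nothing here is from the paper's code):

```python
# Sketch: PCA projection of correct vs. incorrect step features.
# The feature matrices are synthetic placeholders.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
correct_feats = rng.normal(0.0, 1.0, size=(200, 64))    # stand-in features
incorrect_feats = rng.normal(0.5, 1.0, size=(200, 64))  # shifted cluster

pca = PCA(n_components=2)
proj = pca.fit_transform(np.vstack([correct_feats, incorrect_feats]))
correct_2d, incorrect_2d = proj[:200], proj[200:]

# Separated centroids despite similar within-cluster spread would match
# the "distinct region, similar structure" observation.
print(correct_2d.mean(axis=0), incorrect_2d.mean(axis=0))
```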
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
[5/n] One of our key findings is that error signatures are highly domain-specific. A classifier trained to spot errors in arithmetic fails on formal logic, suggesting that different reasoning tasks manifest unique computational failure patterns.
2
0
8
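The cross-domain finding in this tweet corresponds to a simple train-on-A, test-on-B protocol. A hedged sketch with synthetic data, where each domain's errors shift features along a different random direction (the feature extraction itself is out of scope and all names are illustrative):

```python
# Sketch: train an error classifier on one domain, evaluate on another.
# Synthetic data only; the domain-specific "error direction" is assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_domain(shift, n=300, d=32):
    """Synthetic (features, labels) for one reasoning domain."""
    X = rng.normal(0, 1, size=(n, d))
    y = rng.integers(0, 2, size=n)
    X[y == 1] += shift  # errors shift along a domain-specific direction
    return X, y

X_arith, y_arith = make_domain(shift=rng.normal(0, 1, 32))
X_logic, y_logic = make_domain(shift=rng.normal(0, 1, 32))

clf = LogisticRegression(max_iter=1000).fit(X_arith, y_arith)
print("in-domain acc:", clf.score(X_arith, y_arith))
print("cross-domain acc:", clf.score(X_logic, y_logic))
```

With uncorrelated error directions, the cross-domain score tends toward chance, mirroring the "arithmetic classifier fails on formal logic" observation.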
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
[4/n] We found that these structural signatures are highly predictive of errors. Our method, CRV, outperforms strong baselines across all tested datasets, demonstrating that the computational trace carries a verifiable signal.
1
0
6
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
[3/n] How does CRV work? Our pipeline involves:
1. Replacing MLP modules with interpretable sparse transcoders.
2. Constructing step-level attribution graphs.
3. Extracting a rich set of structural features.
4. Training a classifier to detect flawed reasoning steps.
2
0
11
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
[2/n] Our core hypothesis: correct and incorrect reasoning steps leave distinct "structural fingerprints" on the model's computational graph. We move beyond standard verification to analyse the structure of the computation itself.
1
0
14
@zhengzhao97
Zheng Zhao @EMNLP🇨🇳
13 days
Thrilled to share our latest research on verifying CoT reasoning, completed during my recent internship at FAIR @metaai. In this work, we introduce Circuit-based Reasoning Verification (CRV), a new white-box method to analyse and verify how LLMs reason, step by step.
7
48
335
@ShunyuYao12
Shunyu Yao
13 days
If you are impacted by layoff, welcome to dm me
6
12
216