Meng (Caden) Cao
@Meng_0209
Followers: 62 · Following: 54 · Media: 0 · Statuses: 24
Final-year Ph.D. student at Mila @Mila_Quebec and McGill University. | Intern @google | NLP, RL, LLMs
Montreal, Canada
Joined March 2017
1/N Why do LLMs fail at math word problems without CoT? 🔍Final-answer accuracy alone doesn’t tell the full story. 💥It’s not that they can’t understand the problem; the major bottleneck is performing the calculations correctly. 📌Read our EMNLP main paper:
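A toy probe of the calculation bottleneck described in the thread above. This is not the paper's actual evaluation; the step format, regex, and `check_arithmetic_steps` name are illustrative. The idea is to re-evaluate each arithmetic step a model writes down, so calculation slips can be counted separately from comprehension failures:

```python
import re

def check_arithmetic_steps(solution):
    """Re-evaluate each 'a op b = c' step in a model's solution,
    flagging steps where the claimed result is arithmetically wrong
    (a toy probe; a real evaluation would be far more fine-grained)."""
    errors = []
    for m in re.finditer(r"(\d+)\s*([+\-*/])\s*(\d+)\s*=\s*(\d+)", solution):
        a, op, b, claimed = m.groups()
        actual = eval(f"{a}{op}{b}")  # safe here: regex admits only digits and one operator
        if float(claimed) != float(actual):
            errors.append((m.group(0), actual))
    return errors
```

A model that writes "12 * 7 = 94" has understood what to multiply but slipped on the multiplication; this probe would catch exactly that class of error.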
Our paper on reasoning × interpretability × evaluation has been accepted to EMNLP main! Excited because this marks the start of a new research direction I’m diving into. Huge thanks to @Meng_0209, @yanshuaicao, Leila, and Jackie! 📌 https://t.co/rCpXYj7SqN
Do LLMs hallucinate randomly? Not quite. Our #ACL2025 (Main) paper shows that hallucinations under irrelevant contexts follow a systematic failure mode — revealing how LLMs generalize using abstract classes + context cues, albeit unreliably. 📎 Paper: https://t.co/YEK4TaI7pq 1/n
Revoking visas to Chinese PhD students is economically shortsighted and inhumane. Most Chinese PhD students stay in the U.S. after graduation (first image, stats from 2022). They're staying and building technology in the U.S., not taking it to China. Immigrant students create
"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and
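The "tries 8 times (the Group in GRPO)" mechanism mentioned above can be sketched in a few lines. This is a minimal illustration, not code from either paper; the function name and the binary-reward setup are assumptions. Each attempt's advantage is its reward standardized against the group, so no learned value function is needed:

```python
import statistics

def group_advantages(rewards):
    """GRPO-style advantage: standardize each attempt's reward by the
    group's mean and standard deviation (no learned value function)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against an all-identical group
    return [(r - mean) / std for r in rewards]

# One question, a group of 8 attempts: reward 1.0 if the attempt is
# judged correct, 0.0 otherwise (hypothetical verifier outputs).
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_advantages(rewards)
```

With a single training question, every update still gets a learning signal as long as the group mixes correct and incorrect attempts.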
Automated detection of LLM hallucinations using only correct examples is fundamentally difficult. This paper shows detection is equivalent to the hard task of language identification, but providing detectors with both correct and explicitly incorrect examples makes reliable
🤔How do LLMs perform reasoning and recall memorized knowledge? How similar are their underlying mechanisms? We reveal their inherent distinction within LLMs' representations, and identify linear features that mediate the model's switch between genuine reasoning and memory recall.
New R1-Zero experiments with GRPO: 1. Mask the loss from completions that don't terminate in an EOS token (DAPO). Significantly improves stability when doing importance sampling with μ>0. Coming soon to TRL! 2. Use a "soft" format reward function to elicit the <think> and
New log book: figuring out which of the many methods are actually needed for stable R1-Zero-like training
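The two tricks in the experiments above can be sketched at the text level. This is a hedged illustration, not TRL's implementation (which operates on token IDs): the `EOS` string, function names, and partial-credit weights are all assumptions.

```python
import re

EOS = "<|endoftext|>"  # placeholder EOS marker for this sketch

def completion_loss_mask(completions):
    """DAPO-style masking: zero out the loss contribution of completions
    that never emitted EOS, i.e. were truncated at the length limit."""
    return [1.0 if c.endswith(EOS) else 0.0 for c in completions]

def soft_format_reward(completion):
    """Partial credit for producing the <think>...</think> structure,
    instead of all-or-nothing exact matching (hypothetical weights)."""
    score = 0.0
    if "<think>" in completion:
        score += 0.5  # opened the tag
    if re.search(r"<think>.*</think>", completion, re.DOTALL):
        score += 0.5  # also closed it
    return score
```

The masking matters for importance sampling with μ>0 because truncated completions would otherwise inject gradient noise from sequences the model never chose to end.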
R1-zero is such a striking example of a discovery that’s blatantly obvious in retrospect, yet eluded so many for such a long time
We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on 7B model with only 8K examples, the results are surprisingly strong. 🚀 Starting from Qwen2.5-Math-7B (base model), we perform RL on it directly. No SFT, no reward model, just 8K MATH examples for verification, the
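The "no SFT, no reward model, just 8K MATH examples for verification" recipe above implies a purely rule-based reward. A minimal sketch of such a verifier follows; the `\boxed{}` extractor and exact string match are simplifications (real MATH graders normalize answers, e.g. equivalent fractions), and the function names are assumptions:

```python
def extract_boxed(text):
    """Pull the contents of the last \\boxed{...} in a completion,
    tracking brace depth so nested braces survive (simplified extractor)."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth, i, out = 1, start + len(marker), []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out)

def verification_reward(completion, gold):
    """Binary reward: 1.0 iff the boxed answer matches the reference."""
    pred = extract_boxed(completion)
    return 1.0 if pred is not None and pred.strip() == gold.strip() else 0.0
```

Because the reward comes from string matching against reference answers, the whole pipeline needs nothing beyond the 8K problems and their solutions.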
I spent the weekend reading some recent great math+reasoning papers: 1. AceMath ( https://t.co/ftQZiyU3kj) 2. rStar-Math ( https://t.co/N8bket1wpx) 3. PRIME ( https://t.co/82oSXF3oa6) Here are some of my naive thoughts! They could be wrong. All of these papers are showing possible
Starting from OpenAI’s PPO, people have simplified it by removing its mechanisms, especially credit assignment, without performance loss. This contradicts the deep RL belief that credit assignment is crucial. Find out how we address this contradiction at the MATH-AI workshop at 11 AM and 4 PM.
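The contrast in the tweet above can be made concrete with a toy sketch (function names and the reward setup are assumptions, not from the workshop paper): classic credit assignment gives each action the discounted return of what follows it, while the simplified PPO variants broadcast a single trajectory-level return to every action.

```python
def per_step_returns(rewards, gamma=0.99):
    """Classic credit assignment: each step is credited with the
    discounted sum of the rewards that come after it."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def trajectory_return(rewards):
    """Simplified scheme: every step receives the same total return,
    as in PPO variants that drop fine-grained credit assignment."""
    total = sum(rewards)
    return [total] * len(rewards)
```

With a single terminal reward (the common LLM-RL setting), the two schemes differ only by the discount schedule, which hints at why dropping credit assignment can cost so little here.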
1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feeling uncomfortable. As a community, I believe we should take a moment to reflect on why such remarks in public discourse can be offensive and harmful.
Just wrote a script to further investigate how the corpus used to train the gpt4o tokenizer is polluted by Internet scams. The results are quite interesting... 🤦‍♂️🤦‍♂️🤦‍♂️ https://t.co/Fc2T4rSHix
How and when, and with which issues, does the text summarization community engage with responsible AI? 🤔 In our new #EMNLP2023 Findings paper, we examine reporting and research practices across 300 summarization papers published between 2020-2022 🧵
I'm excited to announce the release of GPT4All, a 7B-param language model finetuned from a curated set of 400k GPT-3.5-Turbo assistant-style generations. We release💰800k data samples💰 for anyone to build upon and a model you can run on your laptop! Real-time Sampling on M1 Mac
#LLMs are stochastic parrots fabricating a discourse with zero understanding. We have introduced a formal way to augment any LLM w/ **systematic rectification** to enable prediction. Excellent collaboration w/ @Meng_0209 @SamiraShabanian Jackie Cheung ! https://t.co/26DwSXeoZt
#EMNLP2022 Excited to present my paper “Questioning the Validity of Summarization Datasets and Improving Their Factuality” in poster session 8 tomorrow 9:00-10:30 am. #NLProc Many thanks to my advisors @mvazirg and @ChloeDClavel!