Meng (Caden) Cao

@Meng_0209

Followers: 62 · Following: 54 · Media: 0 · Statuses: 24

Final-year Ph.D. student at Mila @Mila_Quebec and McGill University. | Intern @google | NLP, RL, LLMs

Montreal, Canada
Joined March 2017
@ziling_cheng
Ziling Cheng @ EMNLP
25 days
1/N Why do LLMs fail at math word problems without CoT? 🔍Final-answer accuracy alone doesn’t tell the full story. 💥It’s not that they can’t understand the problem; the major bottleneck is doing the calculations correctly. 📌Read our EMNLP main paper:
Replies: 1 · Reposts: 6 · Likes: 8
@ziling_cheng
Ziling Cheng @ EMNLP
3 months
Our paper on reasoning × interpretability × evaluation has been accepted to EMNLP main! Excited because this marks the start of a new research direction I’m diving into. Huge thanks to @Meng_0209, @yanshuaicao, Leila, and Jackie! 📌 https://t.co/rCpXYj7SqN
Replies: 2 · Reposts: 13 · Likes: 76
@ziling_cheng
Ziling Cheng @ EMNLP
6 months
Do LLMs hallucinate randomly? Not quite. Our #ACL2025 (Main) paper shows that hallucinations under irrelevant contexts follow a systematic failure mode — revealing how LLMs generalize using abstract classes + context cues, albeit unreliably. 📎 Paper: https://t.co/YEK4TaI7pq 1/n
Replies: 6 · Reposts: 25 · Likes: 44
@gregd_nlp
Greg Durrett
6 months
Revoking visas to Chinese PhD students is economically shortsighted and inhumane. Most Chinese PhD students stay in the U.S. after graduation (first image, stats from 2022). They're staying and building technology in the U.S., not taking it to China. Immigrant students create
Replies: 6 · Reposts: 43 · Likes: 363
@AlexGDimakis
Alex Dimakis
7 months
"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and
Replies: 38 · Reposts: 197 · Likes: 1K
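The group-based update described above can be sketched as follows. This is a minimal illustration of the group-relative advantage idea in GRPO, not the paper's actual code: sample G completions for one question, score each attempt, and normalize rewards within the group.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages as in GRPO: each completion's reward is
    normalized by the mean and std of its group of G sampled attempts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One question, G = 8 attempts; reward 1.0 if the answer verified correct.
group_rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
advs = grpo_advantages(group_rewards)
# Correct attempts receive positive advantage, incorrect ones negative,
# so repeated sampling of a single question still yields a learning signal.
```

Because advantages are centered within the group, the same question can be reused indefinitely: as long as some attempts succeed and others fail, the normalized rewards produce a non-zero gradient.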
@rohanpaul_ai
Rohan Paul
7 months
Automated detection of LLM hallucinations using only correct examples is fundamentally difficult. This paper shows detection is equivalent to the hard task of language identification, but providing detectors with both correct and explicitly incorrect examples makes reliable
Replies: 6 · Reposts: 50 · Likes: 241
@jade_lei_yu
Lei Yu
7 months
🤔How do LLMs perform reasoning and recall memorized knowledge? How similar are their underlying mechanisms? We reveal their inherent distinction within LLMs' representations, and identify linear features that mediate model switch between genuine reasoning and memory recall.
Replies: 3 · Reposts: 3 · Likes: 12
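One common way to look for such a linear feature is a difference-of-means direction between hidden states collected under the two behaviors. This is a generic probing sketch under that assumption, not the paper's method; the toy 3-d "hidden states" are invented for illustration.

```python
def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def probe_direction(reasoning_states, recall_states):
    """Difference-of-means direction separating two sets of hidden states."""
    mu_reason = mean_vec(reasoning_states)
    mu_recall = mean_vec(recall_states)
    return [a - b for a, b in zip(mu_reason, mu_recall)]

def project(state, direction):
    """Scalar projection of a hidden state onto the probe direction."""
    return sum(s * d for s, d in zip(state, direction))

# Toy activations: reasoning-like vs recall-like hidden states.
reasoning = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1]]
recall = [[-1.0, 0.0, 0.3], [-0.8, 0.2, 0.2]]
d = probe_direction(reasoning, recall)
# Positive projection suggests reasoning-like, negative recall-like.
```

Projecting a new representation onto `d` then gives a scalar score for which mode the model is in; steering along such a direction is what "mediate model switch" typically refers to in this literature.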
@_lewtun
Lewis Tunstall
8 months
New R1-Zero experiments with GRPO: 1. Mask the loss from completions that don't terminate in an EOS token (DAPO). Significantly improves stability when doing importance sampling with μ>0. Coming soon to TRL! 2. Use a "soft" format reward function to elicit the <think> and
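The DAPO-style masking in point 1 can be sketched roughly like this (hypothetical token ids and tensor layout, not TRL's actual implementation): zero out the per-token loss of any completion that was cut off before emitting EOS, so truncated generations don't contribute to the policy update.

```python
EOS_ID = 2  # assumed EOS token id for illustration

def mask_non_terminated(completions, per_token_losses):
    """Zero the loss of completions that do not end with an EOS token,
    excluding truncated generations from the gradient (DAPO-style)."""
    masked = []
    for tokens, losses in zip(completions, per_token_losses):
        terminated = len(tokens) > 0 and tokens[-1] == EOS_ID
        masked.append(list(losses) if terminated else [0.0] * len(losses))
    return masked

comps = [[5, 7, 2],        # ends in EOS: keep its loss
         [5, 9, 9]]        # hit max length without EOS: mask it out
losses = [[0.4, 0.3, 0.1], [0.5, 0.2, 0.6]]
masked = mask_non_terminated(comps, losses)
```

The intuition is that an unterminated completion's reward is unreliable (the answer may simply have been cut off), so letting it influence importance-sampled updates destabilizes training.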
@_lewtun
Lewis Tunstall
8 months
New log book: figuring out which of the many methods are actually needed for stable R1-Zero-like training
Replies: 4 · Reposts: 34 · Likes: 273
@sea_snell
Charlie Snell
10 months
R1-zero is such a striking example of a discovery that’s blatantly obvious in retrospect, yet eluded so many for such a long time
Replies: 39 · Reposts: 79 · Likes: 2K
@junxian_he
Junxian He
10 months
We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on a 7B model with only 8K examples, and the results are surprisingly strong. 🚀 Starting from Qwen2.5-Math-7B (base model), we perform RL on it directly. No SFT, no reward model, just 8K MATH examples for verification, the
Replies: 69 · Reposts: 646 · Likes: 4K
@WenhuChen
Wenhu Chen
11 months
I spent the weekend reading some recent great math+reasoning papers: 1. AceMath ( https://t.co/ftQZiyU3kj) 2. rStar-Math ( https://t.co/N8bket1wpx) 3. PRIME ( https://t.co/82oSXF3oa6) Here are some of my naive thoughts! They could be wrong. All of these papers are showing possible
Replies: 19 · Reposts: 160 · Likes: 926
@a_kazemnejad
Amirhossein Kazemnejad
1 year
Starting from OpenAI’s PPO, people have been simplifying it by removing its mechanisms, especially credit assignment, without performance loss. This contradicts the deep-RL belief that credit assignment is crucial. Find out how we address this contradiction at the MATHAI workshop at 11AM & 4PM.
Replies: 8 · Reposts: 33 · Likes: 250
@drjingjing2026
Jing-Jing Li
1 year
1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feeling uncomfortable. As a community, I believe we should take a moment to reflect on why such remarks in public discourse can be offensive and harmful.
Replies: 180 · Reposts: 565 · Likes: 4K
@_philschmid
Philipp Schmid
1 year
Not Llama 3 405B, but Nemotron 4 340B! @nvidia just released a 340B dense LLM matching the original @OpenAI GPT-4 performance for chat applications and synthetic data generation. 🤯 NVIDIA does not claim ownership of any outputs generated. 💚 TL;DR: 🧮 340B Parameters with 4k
Replies: 53 · Reposts: 191 · Likes: 1K
@tianle_cai
Tianle Cai ✈️ NeurIPS 2025
2 years
Just wrote a script to further investigate how the corpus used to train the gpt4o tokenizer is polluted by Internet scams. The results are quite interesting... 🤦‍♂️🤦‍♂️🤦‍♂️ https://t.co/Fc2T4rSHix
@main_horse
main
2 years
"why was the gpt-4o demo so horny?"
Replies: 47 · Reposts: 103 · Likes: 459
@liu_yu_lu
Yu Lu Liu 🦋@ liuyulu.bsky.social
2 years
How, when, and with which issues does the text summarization community engage with responsible AI? 🤔 In our new #EMNLP2023 Findings paper, we examine reporting and research practices across 300 summarization papers published between 2020 and 2022 🧵
Replies: 1 · Reposts: 11 · Likes: 27
@andriy_mulyar
Andriy Mulyar
3 years
I'm excited to announce the release of GPT4All, a 7B param language model finetuned from a curated set of 400k GPT-3.5-Turbo assistant-style generations. We release💰800k data samples💰 for anyone to build upon and a model you can run on your laptop! Real-time Sampling on M1 Mac
Replies: 154 · Reposts: 934 · Likes: 6K
@mefatemi
Mehdi Fatemi
3 years
#LLMs are stochastic parrots fabricating a discourse with zero understanding. We have introduced a formal way to augment any LLM w/ **systematic rectification** to enable prediction. Excellent collaboration w/ @Meng_0209 @SamiraShabanian Jackie Cheung ! https://t.co/26DwSXeoZt
Replies: 0 · Reposts: 3 · Likes: 10
@yanzhu_guo
Yanzhu GUO @ ACL
3 years
#EMNLP2022 Excited to present my paper “Questioning the Validity of Summarization Datasets and Improving Their Factuality” in poster session 8 tomorrow 9:00-10:30 am. #NLProc Many thanks to my advisors @mvazirg and @ChloeDClavel!
Replies: 0 · Reposts: 4 · Likes: 13