Tong Chen @ NeurIPS

@tomchen0

Followers
840
Following
213
Media
27
Statuses
167

PhD student @uwcse @uwnlp

Joined February 2023
@tomchen0
Tong Chen @ NeurIPS
1 month
OpenAI's blog (https://t.co/Mu05PFfPXg) points out that today's language models hallucinate because training and evaluation reward guessing instead of admitting uncertainty. This raises a natural question: can we reduce hallucination without hurting utility? 🤔 On-policy RL with
27
123
674
@liweijianglw
Liwei Jiang @ NeurIPS 2025
13 days
Super happy to receive the Best Paper Award at #NeurIPS2025 for our Artificial Hivemind paper!! (Really enjoyed giving the oral talk at NeurIPS as well!)
@liweijianglw
Liwei Jiang @ NeurIPS 2025
2 months
⚠️ Different models. Same thoughts. ⚠️ Today's AI models converge into an Artificial Hivemind 🐝, a striking case of mode collapse that persists even across heterogeneous ensembles. Our #neurips2025 D&B Oral paper (✨top 0.35%✨) dives deep into
37
67
784
@rui_xin31
Rui Xin
15 days
I'll be at #NeurIPS2025 until 12/7! I work on post-training and reward signals (Spurious Rewards), currently curious about bridging the gap between how humans and LLMs learn. Looking forward to connecting with new and old friends; also exploring summer 2026 internships. DMs open!
3
7
55
@tomchen0
Tong Chen @ NeurIPS
14 days
I will be at #NeurIPS2025 12.3–12.7. Looking forward to meeting old and new friends! ☕️🌮 Recently working on hallucination (Binary RAR) and verbatim memorization (ParaPO), issues that scaling up pretraining cannot simply fix. Also interested in making models learn more like
1
5
36
@ypwang61
Yiping Wang
16 days
8B model can outperform AlphaEvolve on open optimization problems by scaling compute for inference or test-time RL 🚀! ⭕ Circle packing: AlphaEvolve (Gemini-2.0-Flash/Pro): 2.63586276 Ours (DeepSeek-R1-0528-Qwen3-8B): 2.63598308 🔗 in 🧵 [1/n]
6
50
190
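For context, the circle-packing number quoted above is plausibly the sum of radii of circles packed without overlap in a unit square; that formulation is an assumption here, and packing_score below is an illustrative checker rather than code from the thread.

from math import hypot

def packing_score(circles, tol=1e-9):
    # circles: list of (x, y, r) triples. Returns the sum of radii if the
    # packing is feasible (inside the unit square, no overlaps), else raises.
    for x, y, r in circles:
        if r < 0 or x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            raise ValueError("circle leaves the unit square")
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            xi, yi, ri = circles[i]
            xj, yj, rj = circles[j]
            if hypot(xi - xj, yi - yj) < ri + rj - tol:
                raise ValueError("circles overlap")
    return sum(r for _, _, r in circles)

print(packing_score([(0.25, 0.25, 0.25), (0.75, 0.75, 0.25)]))   # -> 0.5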
@tomchen0
Tong Chen @ NeurIPS
19 days
PhD applicants: join Akari's first cohort of students! Akari's research ranges from careful benchmarking to solid methodology. She always gives sharp feedback while being thoughtful and supportive. She stayed driven throughout her PhD and now brings that same energy to her new
@AkariAsai
Akari Asai
22 days
1/ Hiring PhD students at CMU SCS (LTI/MLD) for Fall 2026 (deadline 12/10) 🎓 I work on open, reliable LMs: augmented LMs & agents (RAG, tool use, deep research), safety (hallucinations, copyright), and AI for science, code & multilinguality. Open to bold new ideas! FAQ in 🧵
2
3
86
@AkariAsai
Akari Asai
22 days
Exciting DR Tulu updates! 📈 DR Tulu-8B (new RL ckpt) sits on the performance–cost frontier, beating Tongyi DR-30B and matching OpenAI DR/Gemini 3 Pro+Search at a fraction of the cost. Now on arXiv. 🖥️ You can run an interactive CLI demo with open code, almost for free. 1/🧵
@allen_ai
Ai2
29 days
Today we're releasing Deep Research Tulu (DR Tulu), the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚
4
29
151
@tomchen0
Tong Chen @ NeurIPS
27 days
Olmo 3 is here! 🎉 Fully open data and fully open training recipes again. 🚀 Huge congrats to the whole team!
@allen_ai
Ai2
27 days
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow: not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
0
0
21
@RulinShao
Rulin Shao
29 days
🔥 Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪 Yes, just 8B! 🚀 The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics: - co-evolve with the policy model -
8
115
542
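The evolving-rubric idea can be pictured with a short sketch; judge_satisfies and the update rule are assumptions for illustration, not the actual RLER method. A judge scores a long-form report against a rubric, and the rubric pool is updated alongside the policy.

def rubric_reward(report, rubric, judge_satisfies):
    # Fraction of rubric items the report satisfies, as scored by an LLM
    # judge (judge_satisfies is a hypothetical stand-in).
    hits = sum(bool(judge_satisfies(report, item)) for item in rubric)
    return hits / max(len(rubric), 1)

def evolve_rubric(rubric, proposed_items, max_size=20):
    # Co-evolve the rubric with the policy: merge newly proposed criteria
    # (e.g., mined from strong vs. weak rollouts), dedupe, and cap the size.
    merged = list(dict.fromkeys(list(rubric) + list(proposed_items)))
    return merged[:max_size]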
@nlpxuhui
Xuhui Zhou@NeurIPS
30 days
New blog post out! 📜 We share our latest research efforts to build more effective, human-centered AI collaboration. Months ago, I was genuinely surprised by how quickly AI agents were improving, and with that came a deep fear of being replaced, of humans slowly losing agency as
xuhuiz.com
Exploring what makes AI agents truly effective for users, beyond benchmark performance.
3
26
122
@tomchen0
Tong Chen @ NeurIPS
1 month
💡 Finding 2: Binary reward works better than continuous reward. Continuous rewards such as VeriScore are vulnerable to reward hacking because the model can raise the score by adding correct but irrelevant claims or by adapting to formats favored by the claim extractor and
1
1
11
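To picture the hacking risk described above, here is a toy comparison; the scoring functions and numbers are illustrative assumptions, not VeriScore itself. A reward that grows with the count of supported claims can be inflated by padding the answer with correct but irrelevant facts, while an all-or-nothing reward cannot.

def count_based_reward(num_supported_claims, cap=10):
    # Hypothetical VeriScore-like signal: more supported claims, more reward.
    return min(num_supported_claims, cap) / cap

def binary_reward(any_claim_contradicted):
    # All-or-nothing: 1.0 unless some claim contradicts the evidence.
    return 0.0 if any_claim_contradicted else 1.0

# Padding with correct-but-irrelevant claims inflates the count-based score
# (0.3 -> 0.9) but leaves the binary reward at 1.0, so padding buys nothing.
print(count_based_reward(3), count_based_reward(9))   # 0.3 0.9
print(binary_reward(False), binary_reward(False))     # 1.0 1.0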
@tomchen0
Tong Chen @ NeurIPS
1 month
In short-form QA, binary RAR sharply cuts incorrect answers while keeping accuracy unchanged when the model is allowed to express uncertainty. Standard binary rewards for verifiable tasks give 1 only for correct answers and 0 for both incorrect and abstaining, which trains
1
1
10
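A small sketch of the contrast drawn above; the abstention handling is an assumption for illustration rather than the paper's exact recipe. The standard scheme scores an abstention like a wrong answer, while a Binary-RAR-style scheme only zeroes out answers that contradict the evidence, so saying "I don't know" is never worse than guessing.

def standard_qa_reward(answer, gold):
    # 1 only for a correct answer; wrong answers and abstentions both get 0,
    # which makes guessing weakly dominant over admitting uncertainty.
    return 1.0 if answer == gold else 0.0

def rar_style_qa_reward(answer, gold, abstained):
    # Assumed variant: an abstention contradicts no evidence, so it keeps
    # reward 1.0; only an incorrect answer is zeroed. The KL penalty toward
    # the reference policy is assumed to keep the model from collapsing into
    # always abstaining.
    if abstained:
        return 1.0
    return 1.0 if answer == gold else 0.0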
@tomchen0
Tong Chen @ NeurIPS
1 month
💡 Finding 1: Utility stays unchanged. A model can reach zero hallucination by always saying "I do not know," but then it is not useful. Our approach avoids this. In long-form generation, Binary RAR keeps the number of correct claims unchanged and reduces incorrect claims. [4/n]
1
1
11
@tomchen0
Tong Chen @ NeurIPS
1 month
With Binary RAR, we cut hallucinations in both long-form generation (61.9→37.5) and short-form question answering (60.6→27.6), while core skills remain unchanged. Interestingly, after RL finetuning with Binary RAR, the model keeps the same accuracy when forced to answer on
1
1
15
@tomchen0
Tong Chen @ NeurIPS
1 month
Binary Retrieval-Augmented Reward (Binary RAR) gives a reward of one only when the verifier finds no contradiction between the model output and retrieved evidence, and zero otherwise. No partial credit and little reward hacking. With this stepwise signal, the KL penalty in RL
1
1
17
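For illustration, a minimal sketch of that reward rule in Python; extract_claims and contradicts are hypothetical stand-ins for the claim extractor and the evidence verifier, not the released implementation.

def binary_rar_reward(response, evidence, extract_claims, contradicts):
    # Reward 1.0 only if no extracted claim contradicts the retrieved evidence;
    # a single contradiction drops the whole reward to 0.0, leaving no partial
    # credit for a policy to hack.
    for claim in extract_claims(response):
        if contradicts(claim, evidence):
            return 0.0
    return 1.0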
@ZhiyuanZeng_
Zhiyuan Zeng
1 month
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 envs dynamically adapting to the trained model. 💡 Find supervision signals right at the LM capability frontier + scale them. 🔗 in 🧵
12
115
475
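A toy sketch of the adaptive-environment idea described above; the environment class, thresholds, and update rule are illustrative assumptions, not the RLVE codebase. Problems are generated procedurally with an exact checker, and difficulty follows the model's recent success rate so supervision stays near the capability frontier.

import random

class AdaptiveArithmeticEnv:
    def __init__(self, difficulty=1):
        self.difficulty = difficulty
        self.recent = []                      # rolling record of successes

    def sample_problem(self):
        # Procedurally generate a verifiable problem at the current difficulty.
        hi = 10 ** self.difficulty
        a, b = random.randint(1, hi), random.randint(1, hi)
        return f"{a} + {b} = ?", a + b        # prompt and exact answer

    def update(self, solved):
        # Adapt difficulty to the trained model's recent success rate.
        self.recent = (self.recent + [solved])[-50:]
        rate = sum(self.recent) / len(self.recent)
        if rate > 0.8:                        # too easy: make problems harder
            self.difficulty += 1
        elif rate < 0.2 and self.difficulty > 1:   # too hard: back off
            self.difficulty -= 1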
@liujc1998
Jiacheng Liu
1 month
Our infini-gram mini paper received the Best Paper Award at #EMNLP2025!! Really proud 🥹
@xuhaoxh
Hao Xu
6 months
Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰? Introducing infini-gram mini, an exact-match search engine with 14x less storage req than the OG infini-gram 😎 We make 45.6 TB of text searchable. Read on to find our Web Interface, API, and more. (1/n) ⬇️
20
19
351
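Exact-match search over a large corpus is classically built on a sorted suffix index; the toy sketch below shows the idea on a tiny string and is not the infini-gram mini implementation, which achieves its storage savings with far more compact index structures. Requires Python 3.10+ for bisect's key argument.

import bisect

def build_suffix_array(text):
    # Indices of all suffixes, sorted lexicographically (fine for toy sizes).
    return sorted(range(len(text)), key=lambda i: text[i:])

def count_exact_matches(text, sa, query):
    # Binary-search the suffix array for the block of suffixes that start
    # with `query`; the block width is the number of occurrences.
    key = lambda i: text[i:i + len(query)]
    lo = bisect.bisect_left(sa, query, key=key)
    hi = bisect.bisect_right(sa, query, key=key)
    return hi - lo

corpus = "the cat sat on the mat. the cat slept."
sa = build_suffix_array(corpus)
print(count_exact_matches(corpus, sa, "the cat"))   # -> 2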
@shangbinfeng
Shangbin Feng
1 month
Model collaboration talk tour continues~ Compositional intelligence. Collaborative development. Decentralized AI. By the Many. The methods. The vision. The hot takes. The comedy. If you are around one of these places, let's chat!
1
8
41
@HowardYen1
Howard Yen
2 months
How to build agentic search systems for long-horizon tasks? Check out our new paper! - Simple design principles are efficient and effective - Error analysis and fine-grained analysis for search systems A 🧵 on SLIM, our long-horizon agentic search framework
1
14
42