Sitao Cheng
@TonyCheng990417
Followers
239
Following
1K
Media
12
Statuses
148
PhD Student @UWCheritonCS. Interested in Reasoning, Language agents. Ex @ucsbNLP, @MSFTResearch, @NanjingUnivers1.
Joined September 2023
Thanks Rohan for sharing our paper! We explore the hot debate about the true nature of RL as a synthesizer or amplifier. We find that RL genuinely composes new skills, but only on the condition that the base model captures SUFFICIENT atomic abilities.
The paper tests how reinforcement learning improves language model reasoning and finds it can build new skills from simple ones. Models trained only on the final task hit about 90% on familiar questions but drop to about 18% on new patterns. The authors build a synthetic world
0
3
6
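The quoted summary mentions a synthetic world of atomic skills but does not describe it here, so the toy sketch below is only an illustration of the compositional setup: a few made-up atomic string operations serve as "skills", and evaluation items compose them in patterns never seen during training. None of the task names come from the paper.

```python
import random

# Hypothetical atomic skills: each is a simple, independently learnable transform.
ATOMIC_SKILLS = {
    "reverse":   lambda s: s[::-1],
    "uppercase": lambda s: s.upper(),
    "rotate1":   lambda s: s[1:] + s[:1],
}

def make_example(skill_names, text):
    """Compose atomic skills left-to-right to build a (prompt, answer) pair."""
    out = text
    for name in skill_names:
        out = ATOMIC_SKILLS[name](out)
    prompt = f"Apply {' then '.join(skill_names)} to '{text}'."
    return prompt, out

# "Familiar" patterns: compositions seen during training.
train_patterns = [("reverse", "uppercase"), ("uppercase", "rotate1")]
# "New" pattern: an unseen composition of the same atoms, used only for evaluation.
eval_pattern = ("rotate1", "reverse", "uppercase")

word = random.choice(["cat", "skill", "atom"])
print(make_example(random.choice(train_patterns), word))
print(make_example(eval_pattern, word))
```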
If you are in Suzhou for #EMNLP2025, don't miss our paper!
OverBench is dynamic, scalable, and evolving. We hope it enables safer and more helpful LLMs! Paper: https://t.co/u7TZm13dc1 Dataset: https://t.co/yBEC56jwy3 Thanks to all collaborators: @TonyCheng990417 @xwang_lk @WilliamWangNLP
0
0
3
I am hiring for fully funded (up to 3 years) postdoc positions in AI for science at Waterloo/Vector: multimodal deep research, agents, tool-use. You'll work closely w/ industry partners & lead projects. Please share! Apply at https://t.co/ShYRDnnjIu or email me directly!
6
81
363
Why feed 1M tokens when ~250k visual tokens do? Concurrent to DeepSeek-OCR, today we're releasing Glyph, a visual-text compression paradigm that turns long text into images and lets a VLM read them. Paper: https://t.co/dvYaKjWoXW
@karpathy maybe you will be also
21
98
611
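Glyph's actual rendering pipeline is not shown in this thread; the snippet below is only a minimal sketch of the general idea (long text rasterized into image "pages" that a VLM then reads), using Pillow. The page size, wrapping heuristic, and the commented `vlm.generate` call are assumptions, not Glyph's API.

```python
from PIL import Image, ImageDraw, ImageFont
import textwrap

def render_text_page(text, width=1024, height=1024, margin=16):
    """Rasterize a chunk of long text onto a single image 'page'.

    A VLM then consumes a handful of such pages instead of hundreds of
    thousands of text tokens. Layout choices here are illustrative only.
    """
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # a real pipeline would pick a dense, legible font
    wrapped = textwrap.fill(text, width=110)  # rough characters-per-line heuristic
    draw.multiline_text((margin, margin), wrapped, fill="black", font=font)
    return img

long_document = "..." * 1000  # placeholder for a very long input text
page = render_text_page(long_document)
page.save("page_000.png")
# Hypothetical downstream call (not Glyph's actual interface):
# answer = vlm.generate(images=["page_000.png"], prompt="Summarize the document.")
```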
New paper: Rigorous AI agent evaluation is much harder than it seems. For the last year, we have been working on infrastructure for fair agent evaluations on challenging benchmarks. Today, we release a paper that condenses our insights from 20,000+ agent rollouts on 9
20
101
426
Yuandong Tian, a Research Scientist Director at Meta FAIR, released a new paper addressing the question: Why do neural networks "grok", memorizing first, then suddenly generalizing? He proposes a mathematical framework that explains how and when features emerge during grokking in
5
86
433
Beautiful paper from the Chinese lab Tencent on why Tool-Integrated Reasoning (TIR) enhances LLMs. It also introduces a new algorithm, Advantage Shaping Policy Optimization (ASPO). The idea: tools let an LLM run code, which expands its reachable reasoning space and compresses
5
84
415
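The quoted thread is cut off before it explains ASPO, so the snippet below is not ASPO but a generic, hypothetical illustration of "advantage shaping" in a GRPO-style setting: group-normalized advantages plus an extra shaping term for rollouts that actually invoked a tool. The bonus term and its weight are made up for illustration only.

```python
import numpy as np

def shaped_advantages(rewards, used_tool, bonus_weight=0.1):
    """Group-normalized advantages (as in GRPO) plus a hypothetical shaping term.

    rewards:   per-rollout scalar rewards for one prompt's group of rollouts
    used_tool: boolean flags marking rollouts that executed a tool call
    """
    rewards = np.asarray(rewards, dtype=float)
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # standard group baseline
    # Illustrative shaping: nudge the advantage of tool-using rollouts upward.
    # ASPO's actual shaping rule is defined in the paper, not reproduced here.
    return adv + bonus_weight * np.asarray(used_tool, dtype=float)

print(shaped_advantages([1.0, 0.0, 1.0, 0.0], [True, False, False, True]))
```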
Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May! VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and
2
37
158
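VerlTool's actual interfaces are not shown in these tweets; the loop below is only a generic sketch of what an agentic-RL-with-tools rollout looks like (the model proposes, a tool executes, the observation is appended, repeat), with hypothetical `policy` and `run_tool` stand-ins rather than VerlTool or VeRL APIs.

```python
def rollout(policy, run_tool, prompt, max_turns=4):
    """Generic ARLT-style rollout: interleave model generations with tool results.

    `policy(history)` and `run_tool(call)` are hypothetical stand-ins; a real
    framework (e.g. VerlTool on top of VeRL) also handles batching, masking tool
    tokens out of the loss, and reward computation.
    """
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = policy(history)                      # model turn (may contain a tool call)
        history.append({"role": "assistant", "content": reply["text"]})
        if reply.get("tool_call") is None:           # no tool requested: episode ends
            break
        observation = run_tool(reply["tool_call"])   # execute code/search/etc.
        history.append({"role": "tool", "content": observation})
    return history  # trajectory later scored by a verifier for RL training
```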
It seems that the famous MATH dataset has been taken down due to infringing AoPS's copyright. I guess we are not supposed to train or evaluate on any dataset that uses AoPS's material. This will include AIME, AMC, and OmniMath, etc.
6
8
102
We did it! 12 papers from UCSB NLP accepted at #EMNLP2025 (7 Main + 5 Findings). Proud of everyone's hard work; poster below
0
10
41
We are super excited to release OpenCUA, the first from-0-to-1 computer-use agent foundation model framework, and the open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data. [Paper] https://t.co/naBIDnyvYY
14
104
468
If you are still at #ACL2025 in Vienna, don't miss our oral workshop presentation by @xunjian_yin (11:45-12:00). We study LLMs' utilization of their parametric knowledge when various types of contextual knowledge are present. https://t.co/EdVQiGQ2YU | https://t.co/aNLha3yunf
0
3
9
Currently attending #ACL2025 in Vienna, Austria. Will present in person at Hall 4/5 (July 30, 10:30 - 12:00): Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement. Come and say hi!
4
5
21
Catch me at ACL tomorrow (July 28) during the 11:00-12:30 poster session! I will be presenting RuleArena, a challenging benchmark for LLM reasoning under the guidance of complex real-world natural language rules. Come by and let's talk! #ACL2025 #LLMs #NLProc
Can #LLMs Handle Your Taxes? Thank you @skyriver_2000 for leading this very interesting project! He is applying for PhD programs now :) Introducing RuleArena, a cutting-edge benchmark designed to test the logical reasoning of large language models with ~100 natural
0
2
4
I have been training neural networks for 9 years now. Here are 16 techniques I actively use to optimize model training:
25
200
2K
Thanks Rohan for sharing our new paper! We release LEDOM, a foundation model trained in reverse FROM SCRATCH. We share some interesting insights and explore applications of LEDOM in mathematical reasoning.
Train forward, validate backward, end up with cleaner answers. LEDOM trains by predicting the previous token instead of the next, then uses that reverse view to judge and boost ordinary forward models. The study shows this backward reward notably tightens math answers. Forward
0
1
6
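LEDOM's training recipe is not detailed in these tweets; as a rough sketch, previous-token prediction can be implemented by reversing each token sequence and training an ordinary causal LM on it, which is what the snippet below assumes. The model and data names in the commented usage are placeholders, not LEDOM's actual code.

```python
def reverse_example(token_ids):
    """Previous-token prediction via sequence reversal.

    token_ids: list of token ids for one (unpadded) training example.
    Training a standard causal LM on the reversed sequence makes it predict
    each token from the tokens that originally came *after* it.
    """
    return list(reversed(token_ids))

# Hypothetical usage with any causal LM (placeholder names):
# rev = torch.tensor([reverse_example(ex) for ex in equal_length_examples])
# loss = model(input_ids=rev, labels=rev).loss   # ordinary causal-LM loss, reversed text
```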
Finally hit 100 citations. It's a lotta hard work and a lotta fun. Just a fresh start; let's go get more!
2
0
31
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
3
47
180
Very nice work by my bro @ard25974550! Check it out!
Thrilled Gödel Agent got noticed by Sakana AI & excited for their expansion! Accepted at ACL 2025, our 1st fully self-referential agent can read & modify its entire logic (even that logic). Done via recursion. Paper:
1
0
2
Introducing our latest work, Enigmata: A Full-Stack Recipe for Advancing Logical Reasoning in LLMs! Enigmata offers a complete pipeline from data generation → verification → RLVR training → evaluation, designed to systematically enhance the puzzle reasoning skills of LLMs.
11
46
267
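Enigmata's puzzle formats and pipeline code are not given in this tweet; the sketch below only illustrates the generation-plus-programmatic-verification pattern that underlies RLVR-style training, using a made-up arithmetic puzzle as a stand-in task.

```python
import random
import re

def generate_puzzle(rng):
    """Generate a toy puzzle with a known ground-truth answer (stand-in for Enigmata tasks)."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return {"prompt": f"What is {a} + {b}? Answer with a single number.",
            "answer": a + b}

def verify(puzzle, model_output):
    """Programmatic verifier: 1.0 if the model's final number matches, else 0.0.

    In RLVR, this verifiable binary reward is what the policy is trained against.
    """
    numbers = re.findall(r"-?\d+", model_output)
    return 1.0 if numbers and int(numbers[-1]) == puzzle["answer"] else 0.0

rng = random.Random(0)
puzzle = generate_puzzle(rng)
print(puzzle["prompt"], "->", verify(puzzle, "The sum is 123."))
```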