Sitao Cheng Profile
Sitao Cheng

@TonyCheng990417

Followers: 239 · Following: 1K · Media: 12 · Statuses: 148

PhD student @UWCheritonCS. Interested in reasoning and language agents. Ex @ucsbNLP, @MSFTResearch, @NanjingUnivers1.

Joined September 2023
@TonyCheng990417
Sitao Cheng
5 days
Thanks Rohan for sharing our paper! ⚖️ We explore the hot debate about the true nature of RL: synthesizer or amplifier? 🚀 We find that RL genuinely composes new skills, but only when the base model captures SUFFICIENT atomic abilities. 🔗
@rohanpaul_ai
Rohan Paul
5 days
The paper tests how reinforcement learning improves language model reasoning and finds it can build new skills from simple ones. Models trained only on the final task hit about 90% on familiar questions but drop to about 18% on new patterns. The authors build a synthetic world
0
3
6
@TonyCheng990417
Sitao Cheng
1 month
If you are in Suzhou for #EMNLP2025, don't miss our paper!
@XiaoSophiaPu
Sophia Xiao Pu @NeurIPS2025
1 month
OverBench is dynamic, scalable, and evolving. We hope it enables safer and more helpful LLMs! 📄 Paper: https://t.co/u7TZm13dc1 💾 Dataset: https://t.co/yBEC56jwy3 Thanks to all collaborators: @TonyCheng990417 @xwang_lk @WilliamWangNLP
0
0
3
@hllo_wrld
Victor Zhong
1 month
I am hiring for fully funded (up to 3 years) postdoc positions in AI for science at Waterloo/Vector: multimodal deep research, agents, tool-use. You'll work closely w/ industry partners & lead projects. Please share! Apply at https://t.co/ShYRDnnjIu or email me directly!
6
81
363
@ShawLiu12
Xiao Liu (Shaw)
2 months
Why feed 1M tokens when ~250k visual tokens do? 🚀👀 Concurrent with DeepSeek-OCR, today we're releasing Glyph, a visual-text compression paradigm that turns long text into images and lets a VLM read them. Paper: https://t.co/dvYaKjWoXW @karpathy maybe you will be also
21
98
611
@sayashk
Sayash Kapoor
2 months
📣 New paper: Rigorous AI agent evaluation is much harder than it seems. For the last year, we have been working on infrastructure for fair agent evaluations on challenging benchmarks. Today, we release a paper that condenses our insights from 20,000+ agent rollouts on 9
20
101
426
@jiqizhixin
机器之心 JIQIZHIXIN
2 months
Yuandong Tian, a Research Scientist Director at Meta FAIR, released a new paper addressing the question: Why do neural networks "grok" (memorize first, then suddenly generalize)? He proposes a mathematical framework that explains how and when features emerge during grokking in
5
86
433
@rohanpaul_ai
Rohan Paul
3 months
🇨🇳 Beautiful paper from the Chinese lab Tencent on why Tool-Integrated Reasoning (TIR) enhances LLMs, introducing a new algorithm, Advantage Shaping Policy Optimization (ASPO). 🧠 The idea: tools let an LLM run code, which expands its reachable reasoning space and compresses
5
84
415
@DongfuJiang
Dongfu Jiang
3 months
🚀 Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May! VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of
@DongfuJiang
Dongfu Jiang
6 months
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and
2
37
158
@WenhuChen
Wenhu Chen
4 months
It seems that the famous MATH dataset has been taken down for infringing AoPS's copyright. I guess we are not supposed to train or evaluate on any dataset that uses AoPS's material. This includes AIME, AMC, OmniMath, etc.
6
8
102
@ucsbNLP
UC Santa Barbara NLP Group
4 months
We did it! 🎉 12 papers from UCSB NLP accepted at #EMNLP2025 (7 Main + 5 Findings). Proud of everyone's hard work; poster below 👇
0
10
41
@xywang626
Xinyuan Wang ✈️ NeurIPS SD (3–5) • SF (6–9)
4 months
We are super excited to release OpenCUA: the first 0-to-1 computer-use agent foundation model framework and open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data. 🔗 [Paper] https://t.co/naBIDnyvYY 📌
14
104
468
@TonyCheng990417
Sitao Cheng
4 months
๐Ÿท๏ธIf you are still at #ACL2025 Vienna, don't miss our oral workshop presentation by @xunjian_yin (11:45-12:00). We study LLMs utilization of their parametric knowledge when various types of contextual knowledge is present. ๐Ÿ“•: https://t.co/EdVQiGQ2YU ๐Ÿ”—: https://t.co/aNLha3yunf
0
3
9
@xunjian_yin
Xunjian Yin@NeurIPS
4 months
โœˆ๏ธCurrently attending ACL #ACL2025 in Vienna, Austria. Will present at In-Person at Hall 4/5 (July 30, 10:30 - 12:00): ๐ŸšฉGรถdel Agent: A Self-Referential Agent Framework for Recursively Self-Improvement Come and say hi!
4
5
21
@skyriver_2000
Ruiwen Zhou
4 months
🚀🚀 Catch me at ACL tomorrow (July 28) during the 11:00–12:30 poster session! I will be presenting RuleArena, a challenging benchmark for LLM reasoning under the guidance of complex real-world natural language rules. Come by and let's talk! 🚀🚀 #ACL2025 #LLMs #NLProc
@HuaWenyue31539
Wenyue Hua
1 year
🚀🚀 Can #LLMs Handle Your Taxes? 💸 Thank you @skyriver_2000 for leading this very interesting project! He is applying for PhD programs now :) Introducing RuleArena, a cutting-edge benchmark designed to test the logical reasoning of large language models with ~100 natural
0
2
4
@_avichawla
Avi Chawla
5 months
I have been training neural networks for 9 years now. Here are 16 techniques I actively use to optimize model training:
25
200
2K
@TonyCheng990417
Sitao Cheng
5 months
Thanks Rohan for sharing our new paper! We release LEDOM, a foundation model trained in reverse, FROM SCRATCH. We share some interesting insights and explore applications of LEDOM in mathematical reasoning.
@rohanpaul_ai
Rohan Paul
5 months
Train forward, validate backward, end up with cleaner answers. LEDOM trains by predicting the previous token instead of the next, then uses that reverse view to judge and boost ordinary forward models. The study shows this backward reward notably tightens math answers. Forward
0
1
6
@TonyCheng990417
Sitao Cheng
6 months
Finally hit 100 citations. It's been a lotta hard work and a lotta fun. Just a fresh start; let's go get more!
2
0
31
@_akhaliq
AK
6 months
When Models Know More Than They Can Explain Quantifying Knowledge Transfer in Human-AI Collaboration
3
47
180
@TonyCheng990417
Sitao Cheng
6 months
Very nice work by my bro @ard25974550! Check it out!
@xunjian_yin
Xunjian Yin@NeurIPS
6 months
Thrilled Gödel Agent got noticed by Sakana AI & excited for their expansion! Accepted at ACL 2025, our 1st fully self-referential agent can read & modify its entire logic (even that logic). Done via recursion. Paper:
1
0
2
@siyu_yuan_
Siyu Yuan
7 months
🎉 Introducing our latest work, Enigmata: A Full-Stack Recipe for Advancing Logical Reasoning in LLMs! Enigmata offers a complete pipeline from data generation → verification → RLVR training → evaluation, designed to systematically enhance the puzzle reasoning skills of LLMs.
11
46
267