Sitao Cheng
@TonyCheng990417
Followers
239
Following
1K
Media
12
Statuses
148
PhD Student @UWCheritonCS. Interested in Reasoning, Language agents. Ex @ucsbNLP, @MSFTResearch, @NanjingUnivers1.
Joined September 2023
Thanks Rohan for sharing our paper! We explore the hot debate about the true nature of RL as a synthesizer or amplifier. We find that RL genuinely composes new skills, but only on the condition that the base model captures SUFFICIENT atomic abilities.
The paper tests how reinforcement learning improves language model reasoning and finds it can build new skills from simple ones. Models trained only on the final task hit about 90% on familiar questions but drop to about 18% on new patterns. The authors build a synthetic world
0
3
6
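The quoted summary mentions a synthetic world of atomic skills but does not describe it here, so the toy sketch below is only an illustration of the compositional setup: a few made-up atomic string operations serve as "skills", and evaluation items compose them in patterns never seen during training. None of the task names come from the paper.

```python
import random

# Hypothetical atomic skills: each is a simple, independently learnable transform.
ATOMIC_SKILLS = {
    "reverse":   lambda s: s[::-1],
    "uppercase": lambda s: s.upper(),
    "rotate1":   lambda s: s[1:] + s[:1],
}

def make_example(skill_names, text):
    """Compose atomic skills left-to-right to build a (prompt, answer) pair."""
    out = text
    for name in skill_names:
        out = ATOMIC_SKILLS[name](out)
    prompt = f"Apply {' then '.join(skill_names)} to '{text}'."
    return prompt, out

# "Familiar" patterns: compositions seen during training.
train_patterns = [("reverse", "uppercase"), ("uppercase", "rotate1")]
# "New" pattern: an unseen composition of the same atoms, used only for evaluation.
eval_pattern = ("rotate1", "reverse", "uppercase")

word = random.choice(["cat", "skill", "atom"])
print(make_example(random.choice(train_patterns), word))
print(make_example(eval_pattern, word))
```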
If you are in Suzhou for #EMNLP2025, don't miss our paper!
OverBench is dynamic, scalable, and evolving. We hope it enables safer and more helpful LLMs! Paper: https://t.co/u7TZm13dc1 Dataset: https://t.co/yBEC56jwy3 Thanks to all collaborators: @TonyCheng990417 @xwang_lk @WilliamWangNLP
0
0
3
I am hiring for fully funded (up to 3 years) postdoc positions in AI for science at Waterloo/Vector: multimodal deep research, agents, tool-use. You'll work closely w/ industry partners & lead projects. Please share! Apply at https://t.co/ShYRDnnjIu or email me directly!
6
81
363
Why feed 1M tokens when ~250k visual tokens do? Concurrent to DeepSeek-OCR, today we're releasing Glyph, a visual-text compression paradigm that turns long text into images and lets a VLM read them. Paper: https://t.co/dvYaKjWoXW
@karpathy maybe you will be also
21
98
611
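Glyph's actual rendering pipeline is not shown in this thread; the snippet below is only a minimal sketch of the general idea (long text rasterized into image "pages" that a VLM then reads), using Pillow. The page size, wrapping heuristic, and the commented `vlm.generate` call are assumptions, not Glyph's API.

```python
from PIL import Image, ImageDraw, ImageFont
import textwrap

def render_text_page(text, width=1024, height=1024, margin=16):
    """Rasterize a chunk of long text onto a single image 'page'.

    A VLM then consumes a handful of such pages instead of hundreds of
    thousands of text tokens. Layout choices here are illustrative only.
    """
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # a real pipeline would pick a dense, legible font
    wrapped = textwrap.fill(text, width=110)  # rough characters-per-line heuristic
    draw.multiline_text((margin, margin), wrapped, fill="black", font=font)
    return img

long_document = "..." * 1000  # placeholder for a very long input text
page = render_text_page(long_document)
page.save("page_000.png")
# Hypothetical downstream call (not Glyph's actual interface):
# answer = vlm.generate(images=["page_000.png"], prompt="Summarize the document.")
```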
New paper: Rigorous AI agent evaluation is much harder than it seems. For the last year, we have been working on infrastructure for fair agent evaluations on challenging benchmarks. Today, we release a paper that condenses our insights from 20,000+ agent rollouts on 9
20
101
426
Yuandong Tian, a Research Scientist Director at Meta FAIR, released a new paper addressing the question: Why do neural networks "grok", memorizing first, then suddenly generalizing? He proposes a mathematical framework that explains how and when features emerge during grokking in
5
86
433
Beautiful paper from the Chinese lab Tencent on why Tool-Integrated Reasoning (TIR) enhances LLMs. It also introduces a new algorithm, Advantage Shaping Policy Optimization (ASPO). The idea: tools let an LLM run code, which expands its reachable reasoning space and compresses
5
84
415
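The quoted thread is cut off before it explains ASPO, so the snippet below is not ASPO but a generic, hypothetical illustration of "advantage shaping" in a GRPO-style setting: group-normalized advantages plus an extra shaping term for rollouts that actually invoked a tool. The bonus term and its weight are made up for illustration only.

```python
import numpy as np

def shaped_advantages(rewards, used_tool, bonus_weight=0.1):
    """Group-normalized advantages (as in GRPO) plus a hypothetical shaping term.

    rewards:   per-rollout scalar rewards for one prompt's group of rollouts
    used_tool: boolean flags marking rollouts that executed a tool call
    """
    rewards = np.asarray(rewards, dtype=float)
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # standard group baseline
    # Illustrative shaping: nudge the advantage of tool-using rollouts upward.
    # ASPO's actual shaping rule is defined in the paper, not reproduced here.
    return adv + bonus_weight * np.asarray(used_tool, dtype=float)

print(shaped_advantages([1.0, 0.0, 1.0, 0.0], [True, False, False, True]))
```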
Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May! VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and
2
37
158
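VerlTool's actual interfaces are not shown in these tweets; the loop below is only a generic sketch of what an agentic-RL-with-tools rollout looks like (the model proposes, a tool executes, the observation is appended, repeat), with hypothetical `policy` and `run_tool` stand-ins rather than VerlTool or VeRL APIs.

```python
def rollout(policy, run_tool, prompt, max_turns=4):
    """Generic ARLT-style rollout: interleave model generations with tool results.

    `policy(history)` and `run_tool(call)` are hypothetical stand-ins; a real
    framework (e.g. VerlTool on top of VeRL) also handles batching, masking tool
    tokens out of the loss, and reward computation.
    """
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = policy(history)                      # model turn (may contain a tool call)
        history.append({"role": "assistant", "content": reply["text"]})
        if reply.get("tool_call") is None:           # no tool requested: episode ends
            break
        observation = run_tool(reply["tool_call"])   # execute code/search/etc.
        history.append({"role": "tool", "content": observation})
    return history  # trajectory later scored by a verifier for RL training
```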
It seems that the famous MATH dataset has been taken down due to infringing AoPS's copyright. I guess we are not supposed to train or evaluate on any dataset that uses AoPS's material. This will include AIME, AMC, and OmniMath, etc.
6
8
102
We did it! 12 papers from UCSB NLP accepted at #EMNLP2025 (7 Main + 5 Findings). Proud of everyone's hard work; poster below
0
10
41
We are super excited to release OpenCUA, the first from-0-to-1 computer-use agent foundation model framework, and the open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data. [Paper] https://t.co/naBIDnyvYY
14
104
468
If you are still at #ACL2025 in Vienna, don't miss our oral workshop presentation by @xunjian_yin (11:45-12:00). We study LLMs' utilization of their parametric knowledge when various types of contextual knowledge are present. https://t.co/EdVQiGQ2YU | https://t.co/aNLha3yunf
0
3
9
Currently attending #ACL2025 in Vienna, Austria. Will present in person at Hall 4/5 (July 30, 10:30 - 12:00): Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement. Come and say hi!
4
5
21
Catch me at ACL tomorrow (July 28) during the 11:00-12:30 poster session! I will be presenting RuleArena, a challenging benchmark for LLM reasoning under the guidance of complex real-world natural language rules. Come by and let's talk! #ACL2025 #LLMs #NLProc
Can #LLMs Handle Your Taxes? Thank you @skyriver_2000 for leading this very interesting project! He is applying for PhD programs now :) Introducing RuleArena, a cutting-edge benchmark designed to test the logical reasoning of large language models with ~100 natural
0
2
4
I have been training neural networks for 9 years now. Here are 16 techniques I actively use to optimize model training:
25
200
2K
Thanks Rohan for sharing our new paper! We release LEDOM, a foundation model trained in reverse FROM SCRATCH. We share some interesting insights and explore applications of LEDOM in mathematical reasoning.
Train forward, validate backward, end up with cleaner answers. LEDOM trains by predicting the previous token instead of the next, then uses that reverse view to judge and boost ordinary forward models. The study shows this backward reward notably tightens math answers. Forward
0
1
6
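LEDOM's training recipe is not detailed in these tweets; as a rough sketch, previous-token prediction can be implemented by reversing each token sequence and training an ordinary causal LM on it, which is what the snippet below assumes. The model and data names in the commented usage are placeholders, not LEDOM's actual code.

```python
def reverse_example(token_ids):
    """Previous-token prediction via sequence reversal.

    token_ids: list of token ids for one (unpadded) training example.
    Training a standard causal LM on the reversed sequence makes it predict
    each token from the tokens that originally came *after* it.
    """
    return list(reversed(token_ids))

# Hypothetical usage with any causal LM (placeholder names):
# rev = torch.tensor([reverse_example(ex) for ex in equal_length_examples])
# loss = model(input_ids=rev, labels=rev).loss   # ordinary causal-LM loss, reversed text
```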
Finally hit 100 citations. It's a lotta hard work and a lotta fun. Just a fresh start; let's go get more!
2
0
31
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
3
47
180
Very nice work by my bro @ard25974550! Check it out!
Thrilled Gödel Agent got noticed by Sakana AI & excited for their expansion! Accepted at ACL 2025, our 1st fully self-referential agent can read & modify its entire logic (even that logic). Done via recursion. Paper:
1
0
2
Introducing our latest work, Enigmata: A Full-Stack Recipe for Advancing Logical Reasoning in LLMs! Enigmata offers a complete pipeline from data generation → verification → RLVR training → evaluation, designed to systematically enhance the puzzle reasoning skills of LLMs.
11
46
267
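Enigmata's puzzle formats and pipeline code are not given in this tweet; the sketch below only illustrates the generation-plus-programmatic-verification pattern that underlies RLVR-style training, using a made-up arithmetic puzzle as a stand-in task.

```python
import random
import re

def generate_puzzle(rng):
    """Generate a toy puzzle with a known ground-truth answer (stand-in for Enigmata tasks)."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return {"prompt": f"What is {a} + {b}? Answer with a single number.",
            "answer": a + b}

def verify(puzzle, model_output):
    """Programmatic verifier: 1.0 if the model's final number matches, else 0.0.

    In RLVR, this verifiable binary reward is what the policy is trained against.
    """
    numbers = re.findall(r"-?\d+", model_output)
    return 1.0 if numbers and int(numbers[-1]) == puzzle["answer"] else 0.0

rng = random.Random(0)
puzzle = generate_puzzle(rng)
print(puzzle["prompt"], "->", verify(puzzle, "The sum is 123."))
```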