fly51fly @fly51fly X Profile

fly51fly

@fly51fly

Followers

8K

Following

77K

Media

11K

Statuses

25K

BUPT prof | Sharing latest AI papers & insights | Join me in embracing the AI revolution! #MachineLearning #AI #Innovation

Joined February 2009


fly51fly

@fly51fly

16 minutes

[LG] Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning. A Golubev, M Trofimova, S Polezhaev, I Badertdinov. [Nebius AI] (2025).

1

0

1

fly51fly

@fly51fly

31 minutes

[LG] Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction. Y Lin, S Tang, B Lyu, Z Yang. [Princeton University & Tsinghua University] (2025).

0

1

fly51fly

@fly51fly

41 minutes

[CL] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward. S Liu, H Liu, J Liu, L Xiao. [Shanghai AI Laboratory] (2025).

0

1

2

fly51fly

@fly51fly

57 minutes

[LG] Perch 2.0: The Bittern Lesson for Bioacoustics. B v Merriënboer, V Dumoulin, J Hamer, L Harrell. [Google DeepMind] (2025).

0

2

1

fly51fly

@fly51fly

1 hour

[CL] A comprehensive taxonomy of hallucinations in Large Language Models. M Cossio. [Universitat de Barcelona] (2025).

0

1

3

fly51fly

@fly51fly

1 day

[LG] Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning. X Huang, M Hahn. [Saarland University] (2025).

1

6

19

fly51fly

@fly51fly

1 day

[CL] R-Zero: Self-Evolving Reasoning LLM from Zero Data. C Huang, W Yu, X Wang, H Zhang. [Tencent AI Seattle Lab] (2025).

0

3

9

fly51fly

@fly51fly

1 day

[CL] Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs. N Ziems, D Soylu, L A Agrawal, I Miller. [University of Notre Dame & Stanford University & UC Berkeley] (2025).

0

5

14

fly51fly

@fly51fly

1 day

[CL] Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference. Y Song, Z Zhang, C Luo, P Gao. [ByteDance Seed & Tsinghua University] (2025).

0

6

19

fly51fly

@fly51fly

1 day

[LG] Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens. C Zhao, Z Tan, P Ma, D Li. [Arizona State University] (2025).

0

3

27

fly51fly

@fly51fly

2 days

[CL] Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression. J Huang, B Lin, G Feng, J Chen. [Peking University & The Hong Kong University of Science and Technology] (2025).

2

8

fly51fly

@fly51fly

2 days

[LG] GRAIL: Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning. G Chang, J Su, J Liu, P Yang. [Tsinghua University] (2025).

0

3

10

fly51fly

@fly51fly

2 days

[LG] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. Y Wu, Y Zhou, Z Ziheng, Y Peng. [Southeast University & University of California, Los Angeles] (2025).

1

15

85

fly51fly

@fly51fly

2 days

[CL] MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy. S Zhan, Y Lai, Z Lu, D Lin. [Tsinghua University] (2025).

0

2

13

fly51fly

@fly51fly

2 days

[CL] Learning to Reason for Factuality. X Chen, I Kulikov, V Berges, B Oğuz. [FAIR at Meta] (2025).

0

4

27

fly51fly

@fly51fly

3 days

[LG] OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use.

0

5

16

fly51fly

@fly51fly

3 days

[CL] Sotopia-RL: Reward Design for Social Intelligence. H Yu, Z Qi, Y Zhao, K Nottingham. [University of Illinois Urbana-Champaign & CMU] (2025).

0

1

11

fly51fly

@fly51fly

3 days

[CL] CoAct-1: Computer-using Agents with Coding as Actions. L Song, Y Dai, V Prabhu, J Zhang. [University of Southern California & Salesforce Research] (2025).