Yihe Deng
@Yihe__Deng
Followers
3K
Following
2K
Media
34
Statuses
191
multimodal @xAI @GoogleDeepMind @GoogleAI @MSFTResearch @AWS @UCLA
Joined November 2021
🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles Our OpenVLThinker-v1.2 is trained through three lightweight SFT → RL cycles, where SFT first “highlights” reasoning behaviors and RL then explores and
2
47
210
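A rough sketch of the iterative SFT → RL cycle described in the tweet above, assuming hypothetical helpers `distill_traces`, `run_sft`, and `run_rl` (the names and stub bodies are illustrative, not the paper's code):

```python
# Toy sketch of an iterative SFT -> RL training loop (illustrative only).
# The helper functions are stubs standing in for real training code.

def distill_traces(model, prompts):
    """Generate reasoning traces with the current model and keep verified ones."""
    return [f"{p} -> trace from {model}" for p in prompts]  # placeholder

def run_sft(model, traces):
    """Supervised fine-tuning on distilled traces: 'highlights' reasoning behaviors."""
    return f"{model}+sft"  # placeholder

def run_rl(model, prompts):
    """RL stage that then explores and reinforces those behaviors."""
    return f"{model}+rl"  # placeholder

model = "base-vlm"
prompts = ["Q1", "Q2", "Q3"]
for cycle in range(3):                 # three lightweight cycles
    traces = distill_traces(model, prompts)
    model = run_sft(model, traces)     # SFT pass
    model = run_rl(model, prompts)     # RL pass
    print(f"cycle {cycle}: {model}")
```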
I haven’t had the chance to share, but I closed a chapter with my internship at @Google and @GoogleDeepMind a few weeks ago. It was a great learning experience, and our previous project is now on arXiv: https://t.co/lxIUjnGApl
26
13
639
Thanks for sharing our work @arankomatsuzaki! Really excited about how reasoning-based memory drives and scales self-evolving agents 💫: 🏬 ReasoningBank stores insights from both successful and failed trajectories; 🛠️ MaTTS builds on this powerful experience learner, and
ReasoningBank: memory for self-evolving LLM agents • Distills strategies from both successes & failures • Enables agents to learn, reuse, and improve over time • Outperforms prior memory methods on web & SWE tasks (+34.2% eff., –16% steps)
5
32
196
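A minimal sketch of the memory idea summarized above: distill short strategy notes from both successful and failed trajectories, store them, and retrieve the most relevant ones for a new task. Class and method names are illustrative assumptions, not the paper's API:

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    task: str
    insight: str      # distilled strategy, from a success or a failure
    success: bool

class ReasoningBankSketch:
    """Toy memory: store distilled insights, retrieve by word overlap with the new task."""
    def __init__(self):
        self.items: list[MemoryItem] = []

    def add(self, task: str, insight: str, success: bool) -> None:
        self.items.append(MemoryItem(task, insight, success))

    def retrieve(self, query: str, k: int = 2) -> list[MemoryItem]:
        q = set(query.lower().split())
        scored = sorted(self.items,
                        key=lambda m: len(q & set(m.task.lower().split())),
                        reverse=True)
        return scored[:k]

bank = ReasoningBankSketch()
bank.add("book a flight on a travel site", "filter by date before sorting by price", True)
bank.add("fix a failing unit test", "re-run the single test, not the whole suite", False)
for item in bank.retrieve("book a cheap flight"):
    print(item.insight)
```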
Accepted to #NeurIPS2025! Looking forward to meeting everyone in San Diego 🎉
🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles Our OpenVLThinker-v1.2 is trained through three lightweight SFT → RL cycles, where SFT first “highlights” reasoning behaviors and RL then explores and
2
5
124
Importantly, we analyze how SFT and RL influence reasoning keywords: one SFT pass surfaces latent cues like “first,” “wait,” and “check,” while a follow-up RL stage boosts performance (+5.2 pts on MathVista) with hardly any new keyword shifts: SFT surfaces the actions, RL polishes
0
2
8
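A small sketch of the kind of keyword analysis mentioned above: count reasoning cue words in model outputs before and after a training stage. The cue list and the sample outputs below are made up for illustration:

```python
from collections import Counter
import re

CUES = {"first", "wait", "check", "therefore", "verify"}

def cue_counts(outputs):
    """Count reasoning cue words across a list of model responses."""
    counts = Counter()
    for text in outputs:
        for tok in re.findall(r"[a-z']+", text.lower()):
            if tok in CUES:
                counts[tok] += 1
    return counts

before = ["The answer is 42."]
after = ["First, read the chart. Wait, the axis is log-scale. Check: 42."]
print("before SFT:", cue_counts(before))
print("after SFT: ", cue_counts(after))
```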
Our paper "Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance" will be presented as a spotlight at ICML! I won't make it to Vancouver, but please say hi to my co-author @linxizhao4 there :) https://t.co/bti5XDE8ga
I’ll be at #ICML2025 in Vancouver this week. Really looking forward to meeting and learning from everyone! I'll be presenting our spotlight paper at 11am on Wed, July 16, in East Exhibition Hall A-B: "Mitigating Object Hallucination in Large Vision-Language Models via
0
2
7
I’ll be at #ICML2025 in Vancouver this week. Really looking forward to meeting and learning from everyone! I'll be presenting our spotlight paper at 11am on Wed, July 16, in East Exhibition Hall A-B: "Mitigating Object Hallucination in Large Vision-Language Models via
0
2
9
🚀Excited to share our latest work: LLMs entangle language and knowledge, making it hard to verify or update facts. We introduce LMLM 🐑🧠 — a new class of models that externalize factual knowledge into a database and learn during pretraining when and how to retrieve facts
1
15
44
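A toy sketch of what externalizing factual knowledge can look like: the model emits a lookup marker, a small external database answers it, and the fact is spliced back into the text. The `[LOOKUP: ...]` syntax and the dictionary database are assumptions for illustration, not LMLM's actual interface:

```python
import re

FACT_DB = {"capital of France": "Paris", "boiling point of water (C)": "100"}

def resolve_lookups(text: str) -> str:
    """Replace [LOOKUP: key] markers with facts pulled from an external database."""
    def _sub(match: re.Match) -> str:
        key = match.group(1).strip()
        return FACT_DB.get(key, "<unknown>")
    return re.sub(r"\[LOOKUP:([^\]]+)\]", _sub, text)

draft = "The capital is [LOOKUP: capital of France], so the museum is there."
print(resolve_lookups(draft))
```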
Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs). Combining masked SFT with a novel policy gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.
8
107
575
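A very rough toy of the reward-weighted update idea behind such policy gradient training: sample a group of answers, score them with a verifier, form a group-relative advantage, and weight the token log-likelihood by it. This is a generic REINFORCE-style stand-in with a dummy model, not d1's actual algorithm for masked diffusion LLMs:

```python
import torch
import torch.nn as nn

vocab, dim = 50, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))  # dummy denoiser
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Pretend we sampled a group of answers for one prompt and scored them with a verifier.
answers = torch.randint(0, vocab, (4, 8))          # 4 sampled answers, 8 tokens each
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])       # verifiable 0/1 rewards
adv = rewards - rewards.mean()                      # group-relative advantage

logits = model(answers)                             # (4, 8, vocab)
logp = torch.log_softmax(logits, dim=-1)
tok_logp = logp.gather(-1, answers.unsqueeze(-1)).squeeze(-1).sum(dim=-1)  # per-answer log-prob
loss = -(adv * tok_logp).mean()                     # reward-weighted likelihood

opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```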
Thanks @_akhaliq for sharing our work!
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
0
2
27
Huge thanks to my collaborators @hbXNov, @FanYin63689862, @VioletNPeng, @WeiWang1973 and @kaiwei_chang! 🙌
0
0
2
🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. By iteratively integrating SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%
3
39
172
🤖 I just updated my repository of RL(HF) summary notes to include a growing exploration of new topics, specifically adding notes on projects related to DeepSeek R1 reasoning. Take a look: https://t.co/cPiLWpOkJw 🚀 I’m hoping these summaries are helpful, and I’d love to hear
😄 I gave a brief intro to RLHF algorithms for our lab's reading group. It was a good learning experience for me, and I want to share the GitHub repo here, which holds the slides as well as a list of interesting papers: https://t.co/TFIcpwUqul Would love to hear about
1
13
99
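As a concrete example of the kind of algorithm such RLHF notes cover, here is a compact DPO loss in PyTorch. The function name and the dummy sequence log-probabilities are just for illustration:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss on per-sequence log-probabilities."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()

# Dummy sequence log-probabilities for a batch of 3 preference pairs.
pc, pr = torch.tensor([-5.0, -6.0, -4.5]), torch.tensor([-7.0, -5.5, -6.0])
rc, rr = torch.tensor([-5.5, -6.2, -5.0]), torch.tensor([-6.8, -5.4, -6.1])
print(dpo_loss(pc, pr, rc, rr))
```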
We’re rolling out Deep Research to Plus users today! Deep Research was the biggest “Feel The AGI” moment I’ve had since ChatGPT. And I’m glad more people will experience their first AGI moment! The team also worked super hard to make more tools including image citations /
We're also sharing the system card, detailing how we built deep research, assessed its capabilities and risks, and improved safety.
26
31
494
Excited to release PrefEval (ICLR '25 Oral), a benchmark for evaluating LLMs’ ability to infer, memorize, and adhere to user preferences in long-context conversations! ⚠️We find that cutting-edge LLMs struggle to follow user preferences—even in short contexts. This isn't just
3
27
133
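A toy version of the evaluation idea behind such a benchmark: state a preference early in a conversation, then check whether a later reply violates it. The preference, banned items, and the naive string check below are made up for illustration, not PrefEval's protocol:

```python
def violates_preference(banned_keywords, response):
    """Naive check: the reply violates the stated preference if it mentions a banned item."""
    text = response.lower()
    return any(k in text for k in banned_keywords)

conversation = {
    "preference": "I'm vegetarian, please never suggest meat dishes.",
    "banned": ["steak", "chicken", "pork"],
    "later_response": "For dinner tonight, I'd recommend a grilled chicken salad.",
}
print("adheres:", not violates_preference(conversation["banned"],
                                           conversation["later_response"]))
```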
[1/n] SuperExcited to announce SuperGPQA!!! We spent more than half a year to finally get it done! SuperGPQA is a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. It also provides the largest human-LLM
5
51
216
🌟 Can better cold start strategies improve RL training for LLMs? 🤖 I’ve written a blog that delves into the challenges of fine-tuning LLMs during the cold-start phase and how the strategies applied there can significantly impact RL performance in complex reasoning tasks that
3
35
168
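One simple way to picture a cold-start stage before RL: filter generated reasoning traces down to a small set of verified, well-formed examples and fine-tune on those first, so RL starts from a policy that already produces usable reasoning. The selection criteria below are illustrative assumptions, not the blog's recipe:

```python
def select_cold_start_traces(traces, max_len=2048):
    """Keep only verified-correct traces of reasonable length for the warm-up SFT set."""
    return [t for t in traces if t["correct"] and len(t["text"]) <= max_len]

traces = [
    {"text": "First, compute 3*4 = 12, then add 5 to get 17.", "correct": True},
    {"text": "The answer is probably 9.", "correct": False},
]
print(select_cold_start_traces(traces))
```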