Yihe Deng
@Yihe__Deng
Followers
3K
Following
2K
Media
34
Statuses
191
multimodal @xAI @GoogleDeepMind @GoogleAI @MSFTResearch @AWS @UCLA
Joined November 2021
🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles Our OpenVLThinker-v1.2 is trained through three lightweight SFT → RL cycles, where SFT first “highlights” reasoning behaviors and RL then explores and
2
47
210
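A rough sketch of the iterative SFT → RL cycle described in the tweet above, assuming hypothetical helpers `distill_traces`, `run_sft`, and `run_rl` (the names and stub bodies are illustrative, not the paper's code):

```python
# Toy sketch of an iterative SFT -> RL training loop (illustrative only).
# The helper functions are stubs standing in for real training code.

def distill_traces(model, prompts):
    """Generate reasoning traces with the current model and keep verified ones."""
    return [f"{p} -> trace from {model}" for p in prompts]  # placeholder

def run_sft(model, traces):
    """Supervised fine-tuning on distilled traces: 'highlights' reasoning behaviors."""
    return f"{model}+sft"  # placeholder

def run_rl(model, prompts):
    """RL stage that then explores and reinforces those behaviors."""
    return f"{model}+rl"  # placeholder

model = "base-vlm"
prompts = ["Q1", "Q2", "Q3"]
for cycle in range(3):                 # three lightweight cycles
    traces = distill_traces(model, prompts)
    model = run_sft(model, traces)     # SFT pass
    model = run_rl(model, prompts)     # RL pass
    print(f"cycle {cycle}: {model}")
```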
I haven’t had the chance to share, but I closed a chapter with my internship at @Google and @GoogleDeepMind a few weeks ago. It was a great learning experience, and our previous project is now on arXiv: https://t.co/lxIUjnGApl
26
13
639
Thanks for sharing our work @arankomatsuzaki! Really excited about how reasoning-based memory drives and scales self-evolving agents 💫: 🏬 ReasoningBank stores insights from both successful and failed trajectories; 🛠️ MaTTS builds on this powerful experience learner, and
ReasoningBank: memory for self-evolving LLM agents • Distills strategies from both successes & failures • Enables agents to learn, reuse, and improve over time • Outperforms prior memory methods on web & SWE tasks (+34.2% eff., –16% steps)
5
32
196
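A minimal sketch of the memory idea summarized above: distill short strategy notes from both successful and failed trajectories, store them, and retrieve the most relevant ones for a new task. Class and method names are illustrative assumptions, not the paper's API:

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    task: str
    insight: str      # distilled strategy, from a success or a failure
    success: bool

class ReasoningBankSketch:
    """Toy memory: store distilled insights, retrieve by word overlap with the new task."""
    def __init__(self):
        self.items: list[MemoryItem] = []

    def add(self, task: str, insight: str, success: bool) -> None:
        self.items.append(MemoryItem(task, insight, success))

    def retrieve(self, query: str, k: int = 2) -> list[MemoryItem]:
        q = set(query.lower().split())
        scored = sorted(self.items,
                        key=lambda m: len(q & set(m.task.lower().split())),
                        reverse=True)
        return scored[:k]

bank = ReasoningBankSketch()
bank.add("book a flight on a travel site", "filter by date before sorting by price", True)
bank.add("fix a failing unit test", "re-run the single test, not the whole suite", False)
for item in bank.retrieve("book a cheap flight"):
    print(item.insight)
```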
Accepted to #NeurIPS2025! Looking forward to meeting everyone in San Diego 🎉
🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles Our OpenVLThinker-v1.2 is trained through three lightweight SFT → RL cycles, where SFT first “highlights” reasoning behaviors and RL then explores and
2
5
124
Importantly, we analyze how SFT and RL influence reasoning keywords: one SFT pass surfaces latent cues like “first,” “wait,” and “check,” while a follow-up RL stage boosts performance (+5.2 pts on MathVista) with hardly any new keyword shifts: SFT surfaces the actions, RL polishes
0
2
8
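A small sketch of the kind of keyword analysis mentioned above: count reasoning cue words in model outputs before and after a training stage. The cue list and the sample outputs below are made up for illustration:

```python
from collections import Counter
import re

CUES = {"first", "wait", "check", "therefore", "verify"}

def cue_counts(outputs):
    """Count reasoning cue words across a list of model responses."""
    counts = Counter()
    for text in outputs:
        for tok in re.findall(r"[a-z']+", text.lower()):
            if tok in CUES:
                counts[tok] += 1
    return counts

before = ["The answer is 42."]
after = ["First, read the chart. Wait, the axis is log-scale. Check: 42."]
print("before SFT:", cue_counts(before))
print("after SFT: ", cue_counts(after))
```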
Our paper "Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance" will be presented as a spotlight at ICML! I won't make it to Vancouver, but please say hi to my co-author @linxizhao4 there :) https://t.co/bti5XDE8ga
I’ll be at #ICML2025 in Vancouver this week. Really looking forward to meeting and learning from everyone! I'll be presenting our spotlight paper at 11am on Wed, July 16, in East Exhibition Hall A-B: "Mitigating Object Hallucination in Large Vision-Language Models via
0
2
7
I’ll be at #ICML2025 in Vancouver this week. Really looking forward to meeting and learning from everyone! I'll be presenting our spotlight paper at 11am on Wed, July 16, in East Exhibition Hall A-B: "Mitigating Object Hallucination in Large Vision-Language Models via
0
2
9
🚀Excited to share our latest work: LLMs entangle language and knowledge, making it hard to verify or update facts. We introduce LMLM 🐑🧠 — a new class of models that externalize factual knowledge into a database and learn during pretraining when and how to retrieve facts
1
15
44
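A toy sketch of what externalizing factual knowledge can look like: the model emits a lookup marker, a small external database answers it, and the fact is spliced back into the text. The `[LOOKUP: ...]` syntax and the dictionary database are assumptions for illustration, not LMLM's actual interface:

```python
import re

FACT_DB = {"capital of France": "Paris", "boiling point of water (C)": "100"}

def resolve_lookups(text: str) -> str:
    """Replace [LOOKUP: key] markers with facts pulled from an external database."""
    def _sub(match: re.Match) -> str:
        key = match.group(1).strip()
        return FACT_DB.get(key, "<unknown>")
    return re.sub(r"\[LOOKUP:([^\]]+)\]", _sub, text)

draft = "The capital is [LOOKUP: capital of France], so the museum is there."
print(resolve_lookups(draft))
```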
Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs). Combining masked SFT with a novel policy gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.
8
107
575
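A very rough toy of the reward-weighted update idea behind such policy gradient training: sample a group of answers, score them with a verifier, form a group-relative advantage, and weight the token log-likelihood by it. This is a generic REINFORCE-style stand-in with a dummy model, not d1's actual algorithm for masked diffusion LLMs:

```python
import torch
import torch.nn as nn

vocab, dim = 50, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))  # dummy denoiser
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Pretend we sampled a group of answers for one prompt and scored them with a verifier.
answers = torch.randint(0, vocab, (4, 8))          # 4 sampled answers, 8 tokens each
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])       # verifiable 0/1 rewards
adv = rewards - rewards.mean()                      # group-relative advantage

logits = model(answers)                             # (4, 8, vocab)
logp = torch.log_softmax(logits, dim=-1)
tok_logp = logp.gather(-1, answers.unsqueeze(-1)).squeeze(-1).sum(dim=-1)  # per-answer log-prob
loss = -(adv * tok_logp).mean()                     # reward-weighted likelihood

opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```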
Thanks @_akhaliq for sharing our work!
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
0
2
27
Huge thanks to my collaborators @hbXNov, @FanYin63689862, @VioletNPeng, @WeiWang1973 and @kaiwei_chang! 🙌
0
0
2
🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. By iteratively integrating SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%
3
39
172
🤖 I just updated my repository of RL(HF) summary notes to include a growing exploration of new topics, specifically adding notes on projects related to DeepSeek R1 reasoning. Take a look: https://t.co/cPiLWpOkJw 🚀 I’m hoping these summaries are helpful, and I’d love to hear
😄 I gave a brief intro to RLHF algorithms for our lab's reading group. It was a good learning experience for me, and I want to share the GitHub repo here, which holds the slides as well as a list of interesting papers: https://t.co/TFIcpwUqul Would love to hear about
1
13
99
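As a concrete example of the kind of algorithm such RLHF notes cover, here is a compact DPO loss in PyTorch. The function name and the dummy sequence log-probabilities are just for illustration:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss on per-sequence log-probabilities."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()

# Dummy sequence log-probabilities for a batch of 3 preference pairs.
pc, pr = torch.tensor([-5.0, -6.0, -4.5]), torch.tensor([-7.0, -5.5, -6.0])
rc, rr = torch.tensor([-5.5, -6.2, -5.0]), torch.tensor([-6.8, -5.4, -6.1])
print(dpo_loss(pc, pr, rc, rr))
```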
We’re rolling out Deep Research to Plus users today! Deep Research was the biggest “Feel The AGI” moment I’ve had since ChatGPT. And I’m glad more people will experience their first AGI moment! The team also worked super hard to make more tools including image citations /
We're also sharing the system card, detailing how we built deep research, assessed its capabilities and risks, and improved safety.
26
31
494
Excited to release PrefEval (ICLR '25 Oral), a benchmark for evaluating LLMs’ ability to infer, memorize, and adhere to user preferences in long-context conversations! ⚠️We find that cutting-edge LLMs struggle to follow user preferences—even in short contexts. This isn't just
3
27
133
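A toy version of the evaluation idea behind such a benchmark: state a preference early in a conversation, then check whether a later reply violates it. The preference, banned items, and the naive string check below are made up for illustration, not PrefEval's protocol:

```python
def violates_preference(banned_keywords, response):
    """Naive check: the reply violates the stated preference if it mentions a banned item."""
    text = response.lower()
    return any(k in text for k in banned_keywords)

conversation = {
    "preference": "I'm vegetarian, please never suggest meat dishes.",
    "banned": ["steak", "chicken", "pork"],
    "later_response": "For dinner tonight, I'd recommend a grilled chicken salad.",
}
print("adheres:", not violates_preference(conversation["banned"],
                                           conversation["later_response"]))
```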
[1/n] SuperExcited to announce SuperGPQA!!! We spent more than half a year to finally get it done! SuperGPQA is a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. It also provides the largest human-LLM
5
51
216
🌟 Can better cold start strategies improve RL training for LLMs? 🤖 I’ve written a blog that delves into the challenges of fine-tuning LLMs during the cold-start phase and how the strategies applied there can significantly impact RL performance in complex reasoning tasks that
3
35
168
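One simple way to picture a cold-start stage before RL: filter generated reasoning traces down to a small set of verified, well-formed examples and fine-tune on those first, so RL starts from a policy that already produces usable reasoning. The selection criteria below are illustrative assumptions, not the blog's recipe:

```python
def select_cold_start_traces(traces, max_len=2048):
    """Keep only verified-correct traces of reasonable length for the warm-up SFT set."""
    return [t for t in traces if t["correct"] and len(t["text"]) <= max_len]

traces = [
    {"text": "First, compute 3*4 = 12, then add 5 to get 17.", "correct": True},
    {"text": "The answer is probably 9.", "correct": False},
]
print(select_cold_start_traces(traces))
```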