Yihe Deng

@Yihe__Deng

Followers: 3K · Following: 2K · Media: 34 · Statuses: 191

multimodal @xAI @GoogleDeepMind @GoogleAI @MSFTResearch @AWS @UCLA

Joined November 2021
@Yihe__Deng
Yihe Deng
4 months
🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles. Our OpenVLThinker-v1.2 is trained through three lightweight SFT → RL cycles, where SFT first “highlights” reasoning behaviors and RL then explores and …
2
47
210
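For readers who haven't seen the paper, the loop below is a minimal Python sketch of the iterative SFT → RL recipe the tweet describes. The three phase functions are hypothetical stubs standing in for real training code, not the authors' implementation.

```python
# Minimal sketch of an iterative SFT -> RL training loop (three cycles, as in
# OpenVLThinker-v1.2). All three phase functions are placeholder stubs.

def supervised_finetune(model, traces):
    return model  # stub: fit the model on curated reasoning traces

def rl_explore(model):
    return model  # stub: optimize against a verifiable reward (e.g., answer checking)

def distill_traces(model, tasks):
    return []     # stub: keep the model's best self-generated solutions

def iterative_sft_rl(model, traces, tasks, num_cycles=3):
    """Alternate lightweight SFT (to "highlight" reasoning behaviors) with RL
    (to explore and reinforce them), reseeding SFT with the model's own outputs."""
    for _ in range(num_cycles):
        model = supervised_finetune(model, traces)
        model = rl_explore(model)
        traces = distill_traces(model, tasks)
    return model
```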
@Yihe__Deng
Yihe Deng
12 days
I haven’t had the chance to share, but I closed a chapter with my internship at @Google and @GoogleDeepMind a few weeks ago. It was a great learning experience, and our previous project is now on arXiv: https://t.co/lxIUjnGApl
26
13
639
@Siru_Ouyang
Siru Ouyang
1 month
Thanks for sharing our work @arankomatsuzaki ! Really excited about how reasoning-based memory drives and scales self-evolving agents 💫: 🏬 ReasoningBank stores insights from both successful and failed trajectories; 🛠️ MaTTS builds on this powerful experience learner, and …
@arankomatsuzaki
Aran Komatsuzaki
1 month
ReasoningBank: memory for self-evolving LLM agents • Distills strategies from both successes & failures • Enables agents to learn, reuse, and improve over time • Outperforms prior memory methods on web & SWE tasks (+34.2% eff., –16% steps)
5
32
196
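The core idea, as stated in these tweets, is a memory that keeps distilled insights from both successful and failed trajectories and retrieves them for new tasks. Here is a toy Python sketch of that idea; the class, the word-overlap retrieval, and the example insights are illustrative inventions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    task: str      # short description of the source task
    insight: str   # distilled strategy (from a success) or pitfall (from a failure)
    success: bool  # whether the source trajectory succeeded

@dataclass
class ToyReasoningBank:
    """Store insights from successes AND failures; retrieve by word overlap."""
    items: list = field(default_factory=list)

    def add(self, task, insight, success):
        self.items.append(MemoryItem(task, insight, success))

    def retrieve(self, query, top_k=3):
        q = set(query.lower().split())
        ranked = sorted(self.items,
                        key=lambda m: len(q & set(m.task.lower().split())),
                        reverse=True)
        return ranked[:top_k]

bank = ToyReasoningBank()
bank.add("book a flight", "confirm the date filter before searching", success=True)
bank.add("book a flight", "don't submit payment without a price check", success=False)
for m in bank.retrieve("book a cheap flight"):
    print(("WIN: " if m.success else "FAIL: ") + m.insight)
```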
@Yihe__Deng
Yihe Deng
2 months
Accepted to #NeurIPS2025! Looking forward to meeting everyone in San Diego 🎉
@Yihe__Deng
Yihe Deng
4 months
🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles. Our OpenVLThinker-v1.2 is trained through three lightweight SFT → RL cycles, where SFT first “highlights” reasoning behaviors and RL then explores and …
2
5
124
@Yihe__Deng
Yihe Deng
4 months
Importantly, we analyze how SFT and RL influence reasoning keywords: one SFT pass surfaces latent cues like “first,” “wait,” and “check,” while a follow-up RL pass boosts performance (+5.2 pts on MathVista) with hardly any new keyword shifts. SFT surfaces the actions; RL polishes …
0
2
8
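The keyword analysis itself is easy to replicate at toy scale: count cue-word frequencies in model outputs before and after a training stage and compare. The cue list and example outputs below are made up for the demo.

```python
from collections import Counter
import re

CUE_WORDS = {"first", "wait", "check", "verify", "alternatively"}

def cue_counts(outputs):
    """Count reasoning cue words across a list of generated responses."""
    counts = Counter()
    for text in outputs:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in CUE_WORDS:
                counts[token] += 1
    return counts

before = cue_counts(["The answer is 12."])
after = cue_counts(["First, compute 3*4. Wait, let me check: 3*4 = 12."])
for word in sorted(CUE_WORDS):
    print(f"{word:14s} before={before[word]}  after={after[word]}")
```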
@Yong18850571
Yong Lin
4 months
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B …
9
92
262
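Pass@32 figures like the ones above are conventionally computed with the unbiased pass@k estimator from Chen et al. (2021): given n sampled attempts of which c are correct, it estimates the chance that at least one of k draws succeeds. A self-contained version (with made-up n and c) is below.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 20 correct proofs out of 64 samples, evaluated at k=32.
print(f"pass@32 = {pass_at_k(n=64, c=20, k=32):.3f}")
```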
@Yihe__Deng
Yihe Deng
4 months
Our paper "Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance" will be presented as a spotlight at ICML! I won't make it to Vancouver, but please say hi to my co-author @linxizhao4 there :) https://t.co/bti5XDE8ga
@linxizhao4
Linxi Zhao
4 months
I’ll be at #ICML2025 in Vancouver this week. Really looking forward to meeting and learning from everyone! I'll be presenting our spotlight paper at 11am on Wed, July 16, in East Exhibition Hall A-B: "Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance"
0
2
7
@linxizhao4
Linxi Zhao
6 months
🚀Excited to share our latest work: LLMs entangle language and knowledge, making it hard to verify or update facts. We introduce LMLM 🐑🧠 — a new class of models that externalize factual knowledge into a database and learn during pretraining when and how to retrieve facts
1
15
44
@siyan_zhao
Siyan Zhao
7 months
Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs). Combining masked SFT with a novel policy-gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.
8
107
575
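For intuition on the policy-gradient half of d1, the toy below runs plain REINFORCE on a two-action policy until it prefers the rewarded action. This shows only the general principle; d1's actual algorithm for masked diffusion LLMs is substantially more involved, and every number here is illustrative.

```python
import math
import random

theta = 0.0                 # logit of a Bernoulli policy over two actions
lr = 0.5
rewards = {0: 0.0, 1: 1.0}  # action 1 stands in for "correct reasoning"

for _ in range(200):
    p1 = 1.0 / (1.0 + math.exp(-theta))   # P(action = 1)
    action = 1 if random.random() < p1 else 0
    # REINFORCE: step along reward * grad log pi(action); for a Bernoulli
    # policy, d/dtheta log pi(action) = action - p1.
    theta += lr * rewards[action] * (action - p1)

print(f"P(rewarded action) after training: {1.0 / (1.0 + math.exp(-theta)):.3f}")
```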
@Yihe__Deng
Yihe Deng
8 months
Thanks @_akhaliq for sharing our work!
@_akhaliq
AK
8 months
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
0
2
27
@Yihe__Deng
Yihe Deng
8 months
🗞️ arXiv:
@Yihe__Deng
Yihe Deng
8 months
🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. By iteratively integrating SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2% …
0
11
73
@Yihe__Deng
Yihe Deng
8 months
Huge thanks to my collaborators @hbXNov, @FanYin63689862, @VioletNPeng, @WeiWang1973 and @kaiwei_chang! 🙌
0
0
2
@Yihe__Deng
Yihe Deng
8 months
🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. By iteratively integrating SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2% …
3
39
172
@Yihe__Deng
Yihe Deng
8 months
🤖 I just updated my repository of RL(HF) summary notes with a growing set of new topics, specifically notes on projects related to DeepSeek-R1 reasoning. Take a look: https://t.co/cPiLWpOkJw 🚀 I’m hoping these summaries are helpful, and I’d love to hear …
@Yihe__Deng
Yihe Deng
1 year
😄I did a brief intro of RLHF algorithms for our lab's reading-group presentation. It was a good learning experience for me, and I want to share the GitHub repo here, which holds the slides as well as the list of interesting papers: https://t.co/TFIcpwUqul Would love to hear about …
1
13
99
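To make "RLHF algorithms" concrete, one staple such notes typically cover is Direct Preference Optimization (Rafailov et al., 2023). Its per-pair loss fits in a few lines; the log-probabilities in the demo are invented numbers.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((policy margin) - (reference margin)))."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Loss is low when the policy prefers the chosen response more strongly than
# the reference model does, and high in the opposite case.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # ~0.598 (policy favors chosen)
print(dpo_loss(-12.0, -10.0, -11.0, -11.0))  # ~0.798 (policy favors rejected)
```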
@EdwardSun0909
Zhiqing Sun
9 months
We’re rolling out Deep Research to Plus users today! Deep Research was the biggest “Feel The AGI” moment I’ve had since ChatGPT, and I’m glad more people will experience their first AGI moment! The team also worked super hard to make more tools, including image citations …
@OpenAI
OpenAI
9 months
We're also sharing the system card, detailing how we built deep research, assessed its capabilities and risks, and improved safety.
26
31
494
@siyan_zhao
Siyan Zhao
9 months
Excited to release PrefEval (ICLR '25 Oral), a benchmark for evaluating LLMs’ ability to infer, memorize, and adhere to user preferences in long-context conversations! ⚠️We find that cutting-edge LLMs struggle to follow user preferences—even in short contexts. This isn't just …
3
27
133
@GeZhang86038849
Ge Zhang
9 months
[1/n] SuperExcited to announce SuperGPQA!!! We spent more than half a year to finally get it done! SuperGPQA is a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. It also provides the largest human-LLM …
5
51
216
@ZiniuLi
Ziniu Li
9 months
🌟 Can better cold-start strategies improve RL training for LLMs? 🤖 I’ve written a blog post that delves into the challenges of fine-tuning LLMs during the cold-start phase and how the strategies applied there can significantly impact RL performance on complex reasoning tasks that …
3
35
168