
Zhiding Yu (@ZhidingYu)
8K Followers · 363 Following · 32 Media · 179 Statuses
Working to make machines understand the world like human beings. Words are my own.
Santa Clara · Joined July 2020
Thank you AK! Excited to introduce Eagle 2.5, NVIDIA’s latest vision-language model that brings strong long-context capabilities across both image and video understanding — all with just 8B parameters. Most existing VLMs struggle with high-res inputs and long video contexts.
NVIDIA presents Eagle 2.5!
- A family of frontier VLMs for long-context multimodal learning.
- Eagle 2.5-8B matches the results of GPT-4o and Qwen2.5-VL-72B on long-video understanding.
RT @wonmin_byeon: 🚀 New paper: STORM — Efficient VLM for Long Video Understanding. STORM cuts compute costs by up to 8× and reduces decoding…
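For intuition, savings like these come from shrinking the visual token stream before it reaches the LLM decoder. Below is a minimal sketch of that general token-reduction idea; it is illustrative only and does not reproduce STORM's actual design, and the shapes and stride are made up:

```python
# Illustrative sketch (not STORM's architecture) of why temporal token
# reduction cuts VLM compute: pooling video tokens across time shrinks
# the sequence the language model must attend over and decode against.
import torch

def temporal_pool(video_tokens: torch.Tensor, stride: int = 8) -> torch.Tensor:
    """Average-pool frame tokens over time.

    video_tokens: (T, N, D) = frames x tokens-per-frame x hidden dim.
    Returns (T // stride, N, D): stride-fold fewer tokens for the LLM.
    """
    T, N, D = video_tokens.shape
    T_trim = (T // stride) * stride            # drop any ragged tail
    x = video_tokens[:T_trim].view(T_trim // stride, stride, N, D)
    return x.mean(dim=1)

tokens = torch.randn(256, 64, 1024)            # 256 frames, 64 tokens each
pooled = temporal_pool(tokens)                 # -> (32, 64, 1024)
print(tokens.shape[0] * tokens.shape[1], "->", pooled.shape[0] * pooled.shape[1])
```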
RT @cihangxie: 🚀 Excited to share GPT-Image-Edit-1.5M — our new large-scale, high-quality, fully open image editing dataset for the research…
RT @shizhediao: New tech report out! 🚀 Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training. An expanded version of our…
arxiv.org
Recent advancements in reasoning-focused language models such as OpenAI's O1 and DeepSeek-R1 have shown that scaling test-time computation, through chain-of-thought reasoning and iterative...
RT @FuEnYang1: 🤖 How can we teach embodied agents to think before they act? 🚀 Introducing ThinkAct — a hierarchical Reasoning VLA framework…
And today we have just open-sourced the Eagle 2.5 model. You are welcome to download it and give it a try! We will also open-source the fine-tuning code for Eagle 2/2.5 soon. Stay tuned.
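For anyone who wants to give it a try, here is a minimal sketch of loading an open checkpoint with Hugging Face transformers. The repo id `nvidia/Eagle2.5-8B`, the chat message format, and the processor behavior are all assumptions patterned after common open VLM releases, not confirmed details of this release; check the model card for the actual usage.

```python
# Minimal sketch of trying the checkpoint with Hugging Face transformers.
# The repo id, chat format, and processor behavior are assumptions, not
# confirmed details of this release; see the model card for real usage.
import torch
from transformers import AutoModel, AutoProcessor

model_id = "nvidia/Eagle2.5-8B"  # assumed repo id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
).eval()

# One image plus a question, phrased as a chat turn (format assumed).
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "example.jpg"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```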
I did not notice this until just now. Thank you @andimarafioti for the recommendation! Very glad that even though Eagle 2 is not our latest work, people still find it very useful.
Come to the T4V Workshop this Thursday (June 12th) and check out the latest developments in Transformers!
@CVPR is around the corner! Join us at the T4V Workshop at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yang4ai, @WenhuChen, @roeiherzig, Yuheng Li, Kristen Grauman) covering diverse topics! Website: #CVPR #Transformer #Vision #T4V2025 #T4V
Document and Enterprise Intelligence is arguably one of the most important applications of VLMs and cloud services. NVIDIA VLM technologies help build commercial-grade models excelling in this area. The Eagle VLM team, together with other colleagues at NVIDIA, is proud to be…
🥇Our NVIDIA Llama Nemotron Nano VL model is #1 on the OCRBench V2 leaderboard. Designed for advanced intelligent document processing and understanding, this model extracts diverse info from complex documents with precision, all on a single GPU. 📗 Get the technical details
RT @shizhediao: Does RL truly expand a model’s reasoning 🧠 capabilities? Contrary to recent claims, the answer is yes—if you push RL training…
RT @rohanpaul_ai: Cool paper from @nvidia. Prior methods for training LLMs for tool use rely on imitation or distilled reasoning, limiting…
Check out this super cool work done by our intern @ShaokunZhang1 - RL + tool use is the future of LLM agents! Before joining NVIDIA, Shaokun was a contributor to the famous multi-agent workflow framework #AutoGen. Now the age of agent learning is coming, moving beyond workflow control!
Tool-using LLMs can learn to reason—without reasoning traces. 🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation. 📄 Paper: 💻
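The core trick is that the reward needs no reasoning traces at all: a simple rule checks that the output is well-formed and that the emitted tool call matches the reference. Here is a hedged sketch of that kind of rule-based binary reward; the tag names and exact matching rule are my assumptions, not the paper's verbatim specification.

```python
# Sketch of a rule-based reward for tool-use RL: reward 1.0 only when
# the response is well-formed AND the predicted tool call matches the
# gold call. Tag names and the matching rule are assumptions.
import json
import re

def tool_call_reward(response: str, gold_call: dict) -> float:
    """Binary reward with no reasoning-trace supervision."""
    # Format check: reasoning wrapped in <think>...</think> and the
    # call in <tool_call>...</tool_call> (assumed tag conventions).
    m = re.search(r"<tool_call>(.*?)</tool_call>", response, re.DOTALL)
    if m is None or "<think>" not in response:
        return 0.0
    try:
        pred_call = json.loads(m.group(1))
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(pred_call, dict):
        return 0.0
    # Correctness check: same tool name and identical arguments.
    name_ok = pred_call.get("name") == gold_call.get("name")
    args_ok = pred_call.get("arguments") == gold_call.get("arguments")
    return 1.0 if (name_ok and args_ok) else 0.0

# Example: a well-formed, correct response earns reward 1.0.
resp = ('<think>need weather</think>'
        '<tool_call>{"name": "get_weather", '
        '"arguments": {"city": "Santa Clara"}}</tool_call>')
gold = {"name": "get_weather", "arguments": {"city": "Santa Clara"}}
print(tool_call_reward(resp, gold))  # 1.0
```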
RT @CMHungSteven: The 4th Workshop on Transformers for Vision (T4V) at CVPR 2025 is soliciting self-nominations for reviewers. If you're in…
Congrats @angli_ai and team!
The Simular team is proud to share: 🎉 𝗔𝗴𝗲𝗻𝘁 𝗦 has won the 𝗕𝗲𝘀𝘁 𝗣𝗮𝗽𝗲𝗿 𝗔𝘄𝗮𝗿𝗱 at the Agentic AI for Science Workshop at #ICLR2025 @iclr_conf! 🎉 It’s the first open-source computer-use agent, and the first to surpass 20% on OSWorld at the time of its…
If you are interested, do not hesitate to DM us or come to our poster!
[9/9] Strong image task performance. Eagle 2.5 shows consistent improvement over Eagle 2 thanks to the better vision encoder and mixed image-video training. An interesting observation is that joint training with video also helps image understanding, which echoes the need to…
[8/9] Excellent long-context scaling. While certain public models show diminishing gains or even degraded results on longer inputs, Eagle 2.5 benefits from increased input length, leading to consistent improvement.