
Zhiding Yu (@ZhidingYu)
8K Followers · 363 Following · 32 Media · 179 Statuses
Working to make machines understand the world like human beings. Words are my own.
Santa Clara · Joined July 2020
Thank you AK! Excited to introduce Eagle 2.5, NVIDIA’s latest vision-language model that brings strong long-context capabilities across both image and video understanding — all with just 8B parameters. Most existing VLMs struggle with high-res inputs and long video contexts.
NVIDIA presents Eagle 2.5!
- A family of frontier VLMs for long-context multimodal learning.
- Eagle 2.5-8B matches the results of GPT-4o and Qwen2.5-VL-72B on long-video understanding.
RT @wonmin_byeon: 🚀 New paper: STORM — Efficient VLM for Long Video Understanding. STORM cuts compute costs by up to 8× and reduces decoding…
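For intuition, savings like these come from shrinking the visual token stream before it reaches the LLM decoder. Below is a minimal sketch of that general token-reduction idea; it is illustrative only and does not reproduce STORM's actual design, and the shapes and stride are made up:

```python
# Illustrative sketch (not STORM's architecture) of why temporal token
# reduction cuts VLM compute: pooling video tokens across time shrinks
# the sequence the language model must attend over and decode against.
import torch

def temporal_pool(video_tokens: torch.Tensor, stride: int = 8) -> torch.Tensor:
    """Average-pool frame tokens over time.

    video_tokens: (T, N, D) = frames x tokens-per-frame x hidden dim.
    Returns (T // stride, N, D): stride-fold fewer tokens for the LLM.
    """
    T, N, D = video_tokens.shape
    T_trim = (T // stride) * stride            # drop any ragged tail
    x = video_tokens[:T_trim].view(T_trim // stride, stride, N, D)
    return x.mean(dim=1)

tokens = torch.randn(256, 64, 1024)            # 256 frames, 64 tokens each
pooled = temporal_pool(tokens)                 # -> (32, 64, 1024)
print(tokens.shape[0] * tokens.shape[1], "->", pooled.shape[0] * pooled.shape[1])
```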
RT @cihangxie: 🚀 Excited to share GPT-Image-Edit-1.5M — our new large-scale, high-quality, fully open image editing dataset for the research…
RT @shizhediao: New tech report out! 🚀 Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training. An expanded version of our…
arxiv.org
Recent advancements in reasoning-focused language models such as OpenAI's O1 and DeepSeek-R1 have shown that scaling test-time computation, through chain-of-thought reasoning and iterative...
RT @FuEnYang1: 🤖 How can we teach embodied agents to think before they act? 🚀 Introducing ThinkAct — a hierarchical Reasoning VLA framework…
And today we have just open-sourced the Eagle 2.5 model. You are welcome to download it and give it a try! We will also open-source the fine-tuning code for Eagle 2/2.5 soon. Stay tuned.
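For anyone who wants to give it a try, here is a minimal sketch of loading an open checkpoint with Hugging Face transformers. The repo id `nvidia/Eagle2.5-8B`, the chat message format, and the processor behavior are all assumptions patterned after common open VLM releases, not confirmed details of this release; check the model card for the actual usage.

```python
# Minimal sketch of trying the checkpoint with Hugging Face transformers.
# The repo id, chat format, and processor behavior are assumptions, not
# confirmed details of this release; see the model card for real usage.
import torch
from transformers import AutoModel, AutoProcessor

model_id = "nvidia/Eagle2.5-8B"  # assumed repo id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
).eval()

# One image plus a question, phrased as a chat turn (format assumed).
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "example.jpg"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```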
I did not notice this until just now. Thank you @andimarafioti for the recommendation! Very glad that even though Eagle 2 is not our latest work, people still find it very useful.
Come to the T4V Workshop this Thursday (June 12th) and check out the latest developments in Transformers!
@CVPR is around the corner! Join us at the T4V Workshop at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yang4ai, @WenhuChen, @roeiherzig, Yuheng Li, Kristen Grauman) covering diverse topics! Website: #CVPR #Transformer #Vision #T4V2025 #T4V
Document and Enterprise Intelligence is arguably one of the most important applications of VLMs and cloud services. NVIDIA VLM technologies help build commercial-grade models excelling in this area. The Eagle VLM team, together with other colleagues at NVIDIA, is proud to be…
🥇Our NVIDIA Llama Nemotron Nano VL model is #1 on the OCRBench V2 leaderboard. Designed for advanced intelligent document processing and understanding, this model extracts diverse info from complex documents with precision, all on a single GPU. 📗 Get the technical details
RT @shizhediao: Does RL truly expand a model’s reasoning 🧠 capabilities? Contrary to recent claims, the answer is yes—if you push RL training…
RT @rohanpaul_ai: Cool paper from @nvidia. Prior methods for training LLMs for tool use rely on imitation or distilled reasoning, limiting…
Check out this super cool work done by our intern @ShaokunZhang1 - RL + tool use is the future of LLM agents! Before joining NVIDIA, Shaokun was a contributor to the famous multi-agent workflow framework #AutoGen. Now the age of agent learning is coming, moving beyond workflow control!
Tool-using LLMs can learn to reason—without reasoning traces. 🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation. 📄 Paper: 💻
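The core trick is that the reward needs no reasoning traces at all: a simple rule checks that the output is well-formed and that the emitted tool call matches the reference. Here is a hedged sketch of that kind of rule-based binary reward; the tag names and exact matching rule are my assumptions, not the paper's verbatim specification.

```python
# Sketch of a rule-based reward for tool-use RL: reward 1.0 only when
# the response is well-formed AND the predicted tool call matches the
# gold call. Tag names and the matching rule are assumptions.
import json
import re

def tool_call_reward(response: str, gold_call: dict) -> float:
    """Binary reward with no reasoning-trace supervision."""
    # Format check: reasoning wrapped in <think>...</think> and the
    # call in <tool_call>...</tool_call> (assumed tag conventions).
    m = re.search(r"<tool_call>(.*?)</tool_call>", response, re.DOTALL)
    if m is None or "<think>" not in response:
        return 0.0
    try:
        pred_call = json.loads(m.group(1))
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(pred_call, dict):
        return 0.0
    # Correctness check: same tool name and identical arguments.
    name_ok = pred_call.get("name") == gold_call.get("name")
    args_ok = pred_call.get("arguments") == gold_call.get("arguments")
    return 1.0 if (name_ok and args_ok) else 0.0

# Example: a well-formed, correct response earns reward 1.0.
resp = ('<think>need weather</think>'
        '<tool_call>{"name": "get_weather", '
        '"arguments": {"city": "Santa Clara"}}</tool_call>')
gold = {"name": "get_weather", "arguments": {"city": "Santa Clara"}}
print(tool_call_reward(resp, gold))  # 1.0
```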
RT @CMHungSteven: The 4th Workshop on Transformers for Vision (T4V) at CVPR 2025 is soliciting self-nominations for reviewers. If you're in…
Congrats @angli_ai and team!
The Simular team is proud to share: 🎉 𝗔𝗴𝗲𝗻𝘁 𝗦 has won the 𝗕𝗲𝘀𝘁 𝗣𝗮𝗽𝗲𝗿 𝗔𝘄𝗮𝗿𝗱 at the Agentic AI for Science Workshop at #ICLR2025 @iclr_conf! 🎉 It’s the first open-source computer-use agent, and the first to surpass 20% on OSWorld at the time of its…
If you are interested, do not hesitate to DM us or come to our poster!
[9/9] Strong image task performance. Eagle 2.5 shows consistent improvement over Eagle 2 thanks to the better vision encoder and mixed image-video training. An interesting observation is that joint training with video also helps image understanding, which echoes the need to…
[8/9] Excellent long-context scaling. While certain public models show diminishing gains or even degraded results on longer inputs, Eagle 2.5 benefits from increased input length, leading to consistent improvement.