
Yao Lu
@Yao__Lu
Followers 229 · Following 37 · Media 0 · Statuses 32
Distinguished Research Scientist @Nvidia | Ex-Research Manager @GoogleDeepMind
Joined October 2022
RT @Guangxuan_Xiao: I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our…
RT @yukangchen_: 🚀 We open-sourced the LongVILA-R1 model, Long-RL codebase, and an online demo! 🔍 LongVILA-R1-7B supports both multi-cho…
RT @LigengZhu: Empowered by SGLang, NVILA serving now has 4.4x throughput and 2.2x faster response 🚀🚀🚀. Awesome work made by @AndyZijianZh…
RT @yukangchen_: Video understanding isn't just recognizing — it demands reasoning across thousands of frames. Meet Long-RL 🚀 Highlights: 🧠…
RT @zhuoyang_zhang: 🚀 Check out #LPD - our latest work to accelerate autoregressive image generation. LPD stands for Locality-aware Parallel…
RT @jxwuyi: We release fully async RL system AReaL-boba² for LLM & SOTA code RL w. Qwen3-14B! @Alibaba_Qwen #opensource. 🚀 system & algorithm…
RT @HaochengXiUCB: 🚀 COAT: Memory Efficient FP8 Training @ICLR 2025. 📍 Hall 3 + Hall 2B Poster #566. 🗓 Sat, Apr 26 | 3:00–5:30 PM Singapore…
RT @qingqing_zhao_: Introduce CoT-VLA – Visual Chain-of-Thought reasoning for Robot Foundation Models! 🤖 By leveraging next-frame predicti…
RT @jxwuyi: 🎉 Milestone Release! AReaL-boba, our latest #RL system! #AI • data/code/model ALL 🔥 #OPENSOURCE • Full #…
RT @GoogleDeepMind: Meet Gemini Robotics: our latest AI models designed for a new generation of helpful robots. 🤖 Based on Gemini 2.0, the…
RT @yukangchen_: 🚀 LongVILA is open-sourced: our comprehensive solution for scaling long-context Visual-Language Models (VLMs) to tackle th…
RT @physical_int: At Physical Intelligence (π) our mission is to bring general-purpose AI into the physical world. We're excited to show…
RT @zhuoyang_zhang: 🥳 We're thrilled to announce that VILA-U is now open source! VILA-U is a Unified foundation model that integrates Vi…
RT @hancai_hm: 🥳 We are excited to introduce Deep Compression Autoencoder. It dramatically reduces the token number of the latent space, de…
RT @xieenze_jr: 🥳🚀 We are excited to introduce SANA ⚡, an efficient linear DiT that can generate images up to 4096 × 4096 🌆🎨. SANA delivers:…
RT @haotiant1998: 🚀 We're thrilled to introduce HART, an efficient AR model that generates stunning 1024x1024 images! 🎨✨ HART delivers: ⚡️…
RT @LigengZhu: VLMs can improve themselves! 🚀🚀🚀 Introducing VILA^2: VILA Augmented VILA, our new technology to achieve SOTA by augmenting p…
VILA-1.5 is released! Fully open-sourced (with training code and training data)! Superior image and video understanding capability, the strongest open-source video captioning model, and a small 3B variant highly optimized for edge/real-time applications.
We release VILA-1.5, an efficient visual language model (VLM) that can understand not only images but also videos. VILA-1.5 achieves state-of-the-art accuracy among open-source VLMs on the MMMU dataset. CVPR'24 paper: Code: