Jianwei Yang
@jw2yang4ai
Followers
4K
Following
2K
Media
76
Statuses
441
RS @Meta SuperIntelligence Lab; ex-MSR; Core contributor of Project Florence, Phi-3V, Omniparser; (Co-)Inventor of FocalNet, SEEM, SoM, DeepStack and Magma.
Redmond, WA
Joined July 2016
Life Update: Now that I have finished presenting my last @MSFTResearch project, Magma, at @CVPR, I am excited to share that I have joined @AIatMeta as a research scientist to further push forward the boundary of multimodal foundation models! I have always been passionate
52
6
384
If you have been impacted by today's layoffs at Meta's AI teams, please know that Microsoft Zurich is hiring for positions in multimodal foundation models & robot learning. You and your teams have given so much to the AI community; I hope we can all give back and support you now
0
3
100
🚀Excited to see Qwen3-VL released as the new SOTA open-source vision-language model! What makes it extra special is that it’s powered by DeepStack, a technique I co-developed with Lingchen, who is now a core contributor to Qwen3-VL. When Lingchen and I developed this technique
🚀 We're thrilled to unveil Qwen3-VL — the most powerful vision-language model in the Qwen series yet! 🔥 The flagship model Qwen3-VL-235B-A22B is now open-sourced and available in both Instruct and Thinking versions: ✅ Instruct outperforms Gemini 2.5 Pro on key vision
4
21
284
Great work! Building a coherent representation for the complicated world - visual and semantic, 2D and 3D, spatial and temporal - is challenging but critical. Having a single tokenizer for all is definitely a great stepping stone toward the next generation of multimodal models!
Vision tokenizers are stuck in 2020🤔while language models revolutionized AI🚀 Language: One tokenizer for everything Vision: Fragmented across modalities & tasks Introducing AToken: The first unified visual tokenizer for images, videos & 3D that does BOTH reconstruction AND
0
1
9
🎉 Excited to share RecA: Reconstruction Alignment Improves Unified Multimodal Models 🔥 Post-train w/ RecA: 8k images & 4 hours (8 GPUs) → SOTA UMMs: GenEval 0.73→0.90 | DPGBench 80.93→88.15 | ImgEdit 3.38→3.75 Code: https://t.co/yFEvJ0Algw 1/n
6
30
80
VLMs struggle badly to interpret 3D from 2D observations, but what if they have a good mental model of the world? Check out our MindJourney - a test-time scaling approach for spatial reasoning in the 3D world. Without any specific training, MindJourney imagines (acts mentally) step-by-step
Test-time scaling nailed code & math—next stop: the real 3D world. 🌍 MindJourney pairs any VLM with a video-diffusion World Model, letting it explore an imagined scene before answering. One frame becomes a tour—and the tour leads to new SOTA in spatial reasoning. 🚀 🧵1/
0
3
31
VLMs often struggle with physical reasoning tasks such as spatial reasoning. Excited to share how we can use world models + test-time search to zero-shot improve spatial reasoning in VLMs!
3
24
189
Wow, this is so cool! I have been dreaming of building agents that can interact with humans via language communication, and with the world via physical interaction (locomotion, manipulation, etc.). Definitely a great stepping stone and playground!
World Simulator, reimagined — now alive with humans, robots, and their vibrant society unfolding in 3D real-world geospatial scenes across the globe! 🚀 One day soon, humans and robots will co-exist in the same world. To prepare, we must address: 1️⃣ How can robots cooperate or
0
3
12
📢 Join us tomorrow morning at our CVPR 2025 poster session (#340, ExHall D, 10:30am–12:30pm) to chat about Project Magma 👉 https://t.co/Jtx3RaQhcs This is a big team effort to build a multimodal agentic model capable of understanding and acting in both digital and physical
4
16
80
Check out our poster at #240 in Exhibition Hall D at 10:30 today!
(1/10) 🔥Thrilled to introduce OneDiffusion—our latest work in unified diffusion modeling! 🚀 This model bridges the gap between image synthesis and understanding, excelling in a wide range of tasks: T2I, conditional generation, image understanding, identity preservation,
0
2
18
Our afternoon session with Prof. @RanjayKrishna is about to start in Room 101B!
🔥@CVPR2025 CVinW 2025 is about to take place very soon!! We have plenty of great talks and spotlight talks coming up (@BoqingGo, @RanjayKrishna @furongh @YunzhuLiYZ @sainingxie @CordeliaSchmid, Shizhe Chen). Looking forward to seeing you all at 101B from 9am-5pm, June 11th!
0
1
10
🔥@CVPR2025 CVinW 2025 is about to take place very soon!! We have plenty of great talks and spotlight talks coming up (@BoqingGo, @RanjayKrishna @furongh @YunzhuLiYZ @sainingxie @CordeliaSchmid, Shizhe Chen). Looking forward to seeing you all at 101B from 9am-5pm, June 11th!
🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 https://t.co/Z5r48oh6iv ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof.
0
9
39
Our community-led Computer Vision group is thrilled to host @jw2yang4ai, Principal Researcher at Microsoft Research, for a session on "Magma: A Foundation Model for Multimodal AI Agents". Thanks to @cataluna84 and @Arkhymadhe for organizing this speaker session 👏
2
2
24
Hope you all had a great #NeurIPS2025 submission push and can get some good rest! We are still open to submissions for our CVinW workshop at @CVPR! You're welcome to share your work at our workshop with just a few clicks! 👉Submit Portal:
openreview.net
Welcome to the OpenReview homepage for CVPR 2025 Workshop CVinW
🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 https://t.co/Z5r48oh6iv ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof.
0
4
54
The latest episode of the Derby Mill Podcast is just out, focused on the "Era of Experience" paper by David Silver and me. Substack: https://t.co/7jw3PcX2bL Spotify: https://t.co/2X4jsKKKAa Apple: https://t.co/RwdGlENz3f YouTube: https://t.co/fJxrnFYFg6
1
46
206
Introducing Phi-4-reasoning, adding reasoning models to the Phi family of SLMs. The model is trained with both supervised finetuning (using a carefully curated dataset of reasoning demonstrations) and reinforcement learning. 📌Competitive results on reasoning benchmarks with
4
34
140
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍RLVR with one training example can boost: - Qwen2.5-Math-1.5B: 36.0% → 73.6% - Qwen2.5-Math-7B: 51.0% → 79.2% on MATH500. 📄 Paper: https://t.co/D65XR9mMs2
14
89
424
We have a large group of organizers for our workshop and challenges! 🙌 Feel free to reach out to any of them to get more information about our workshop! ✉️ Kudos to the whole team! 🎉
0
0
4