Jianwei Yang Profile
Jianwei Yang

@jw2yang4ai

Followers: 4K · Following: 2K · Media: 76 · Statuses: 441

Research Scientist @Meta SuperIntelligence Lab; ex-MSR; core contributor of Project Florence, Phi-3V, and OmniParser; (co-)inventor of FocalNet, SEEM, SoM, DeepStack, and Magma.

Redmond, WA
Joined July 2016
@jw2yang4ai
Jianwei Yang
5 months
Life Update: Now that I have finished the presentation of my last @MSFTResearch project, Magma, at @CVPR, I am excited to share that I have joined @AIatMeta as a research scientist to further push forward the boundary of multimodal foundation models! I have always been passionate
@oier_mees
Oier Mees
14 days
If you have been impacted by today's layoffs at Meta's AI teams, please know that Microsoft Zurich is hiring for positions in multimodal foundation models & robot learning. You and your teams have given so much to the AI community; I hope we can all give back and support you now
@jw2yang4ai
Jianwei Yang
1 month
🚀Excited to see Qwen3-VL released as the new SOTA open-source vision-language model! What makes it extra special is that it’s powered by DeepStack, a technique I co-developed with Lingchen, who is now a core contributor of Qwen3-VL. When Lingchen and I developed this technique
@Alibaba_Qwen
Qwen
1 month
🚀 We're thrilled to unveil Qwen3-VL — the most powerful vision-language model in the Qwen series yet! 🔥 The flagship model Qwen3-VL-235B-A22B is now open-sourced and available in both Instruct and Thinking versions: ✅ Instruct outperforms Gemini 2.5 Pro on key vision
@jw2yang4ai
Jianwei Yang
2 months
Great work! Building a coherent representation of the complicated world - visual and semantic, 2D and 3D, spatial and temporal - is challenging but critical. Having a single tokenizer for all is definitely a great stepping stone to the next generation of multimodal models!
@jiasenlu
Jiasen Lu
2 months
Vision tokenizers are stuck in 2020 🤔 while language models revolutionized AI 🚀
Language: One tokenizer for everything
Vision: Fragmented across modalities & tasks
Introducing AToken: The first unified visual tokenizer for images, videos & 3D that does BOTH reconstruction AND
@XDWang101
XuDong Wang
2 months
🎉 Excited to share RecA: Reconstruction Alignment Improves Unified Multimodal Models
🔥 Post-train w/ RecA: 8k images & 4 hours (8 GPUs) → SOTA UMMs:
GenEval 0.73→0.90 | DPGBench 80.93→88.15 | ImgEdit 3.38→3.75
Code: https://t.co/yFEvJ0Algw
1/n
@jw2yang4ai
Jianwei Yang
4 months
VLMs struggle badly to interpret 3D from 2D observations, but what if they have a good mental model of the world? Check out our MindJourney - a test-time scaling approach for spatial reasoning in the 3D world. Without any specific training, MindJourney imagines (acts mentally) step-by-step
@YuncongYY
Yuncong Yang
4 months
Test-time scaling nailed code & math—next stop: the real 3D world. 🌍 MindJourney pairs any VLM with a video-diffusion World Model, letting it explore an imagined scene before answering. One frame becomes a tour—and the tour leads to new SOTA in spatial reasoning. 🚀 🧵1/
@du_yilun
Yilun Du
4 months
VLMs often struggle with physical reasoning tasks such as spatial reasoning. Excited to share how we can use world models + test-time search to zero-shot improve spatial reasoning in VLMs!
@_akhaliq
AK
4 months
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
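For readers unfamiliar with the approach, here is a rough, hypothetical sketch of the explore-before-answering loop these posts describe: a world model imagines candidate viewpoints and a VLM picks the most helpful ones before answering. All class and method names below are illustrative stand-ins, not the actual MindJourney code.

```python
import random

# Hypothetical, minimal stand-ins for the two components described above; the
# real MindJourney pairs a learned video-diffusion world model with a real VLM.
class WorldModelStub:
    ACTIONS = ["move_forward", "turn_left", "turn_right"]

    def sample_actions(self, k):
        # Propose k candidate camera moves from the current viewpoint.
        return random.sample(self.ACTIONS, k)

    def imagine(self, view, action):
        # "Render" the imagined next view; a diffusion model would generate frames.
        return f"{view} -> {action}"


class VLMStub:
    def score_helpfulness(self, view, question):
        # Stand-in for the VLM judging how useful an imagined view is.
        return random.random()

    def answer(self, views, question):
        return f"answer derived from a tour of {len(views)} views"


def mind_journey(vlm, world_model, frame, question, steps=4, k=3):
    """Test-time search: explore an imagined scene before answering."""
    tour = [frame]  # the tour starts from the single observed frame
    for _ in range(steps):
        proposals = [world_model.imagine(tour[-1], a)
                     for a in world_model.sample_actions(k)]
        scores = [vlm.score_helpfulness(v, question) for v in proposals]
        tour.append(proposals[scores.index(max(scores))])  # keep the best view
    return vlm.answer(tour, question)


print(mind_journey(VLMStub(), WorldModelStub(), "frame0",
                   "Is the sofa to the left of the door?"))
```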
@jw2yang4ai
Jianwei Yang
5 months
Wow, this is so cool! I have been dreaming of building agents that can interact with humans via language and with the world via physical interaction (locomotion, manipulation, etc.). Definitely a great stepping stone and playground!
@gan_chuang
Chuang Gan
5 months
World Simulator, reimagined — now alive with humans, robots, and their vibrant society unfolding in 3D real-world geospatial scenes across the globe! 🚀 One day soon, humans and robots will co-exist in the same world. To prepare, we must address: 1️⃣ How can robots cooperate or
@jw2yang4ai
Jianwei Yang
5 months
📢 Join us tomorrow morning at our CVPR 2025 poster session (#340, ExHall D, 10:30am–12:30pm) to chat about Project Magma 👉 https://t.co/Jtx3RaQhcs This is a big team effort to build a multimodal agentic model capable of understanding and acting in both digital and physical
@jiasenlu
Jiasen Lu
5 months
Check out our poster at #240 in Exhibition Hall D at 10:30 today!
@jiasenlu
Jiasen Lu
11 months
(1/10) 🔥Thrilled to introduce OneDiffusion—our latest work in unified diffusion modeling! 🚀 This model bridges the gap between image synthesis and understanding, excelling in a wide range of tasks: T2I, conditional generation, image understanding, identity preservation,
@jw2yang4ai
Jianwei Yang
5 months
Our afternoon session with Prof. @RanjayKrishna is about to start in Room 101B!
@jw2yang4ai
Jianwei Yang
5 months
🔥@CVPR2025 CVinW 2025 is about to take place!! We have plenty of great talks and spotlight talks coming up (@BoqingGo, @RanjayKrishna, @furongh, @YunzhuLiYZ, @sainingxie, @CordeliaSchmid, Shizhe Chen). Look forward to seeing you all in Room 101B from 9am-5pm on June 11th!
@jw2yang4ai
Jianwei Yang
5 months
🔥@CVPR2025 CVinW 2025 is about to take place!! We have plenty of great talks and spotlight talks coming up (@BoqingGo, @RanjayKrishna, @furongh, @YunzhuLiYZ, @sainingxie, @CordeliaSchmid, Shizhe Chen). Look forward to seeing you all in Room 101B from 9am-5pm on June 11th!
@jw2yang4ai
Jianwei Yang
6 months
🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 https://t.co/Z5r48oh6iv ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof.
@furongh
Furong Huang
5 months
Excited to speak at the Workshop on Computer Vision in the Wild @CVPR 2025! 🎥🌍 🗓️ June 11 | 📍 Room 101 B, Music City Center, Nashville, TN 🎸 🧠 Talk: From Perception to Action: Building World Models for Generalist Agents Let’s connect if you're around! #CVPR2025 #robotics
@Cohere_Labs
Cohere Labs
6 months
Our community-led Computer Vision group is thrilled to host @jw2yang4ai, Principal Researcher at Microsoft Research, for a session on "Magma: A Foundation Model for Multimodal AI Agents" Thanks to @cataluna84 and @Arkhymadhe for organizing this speaker session 👏
@jw2yang4ai
Jianwei Yang
6 months
Hope you all had great #NeurIPS2025 submissions and are getting some good rest! We are still open to submissions to our CVinW workshop at @CVPR! Feel free to share your work with our workshop with a few clicks! 👉Submit Portal:
openreview.net
Welcome to the OpenReview homepage for CVPR 2025 Workshop CVinW
@jw2yang4ai
Jianwei Yang
6 months
🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 https://t.co/Z5r48oh6iv ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof.
@RichardSSutton
Richard Sutton
6 months
The latest episode of the Derby Mill Podcast is just out and focused on the "Era of Experience" paper by David Silver and myself.
Substack: https://t.co/7jw3PcX2bL
Spotify: https://t.co/2X4jsKKKAa
Apple: https://t.co/RwdGlENz3f
YouTube: https://t.co/fJxrnFYFg6
@AhmedHAwadallah
Ahmed Awadallah
6 months
Introducing Phi-4-reasoning, adding reasoning models to the Phi family of SLMs. The model is trained with both supervised finetuning (using a carefully curated dataset of reasoning demonstrations) and Reinforcement Learning. 📌Competitive results on reasoning benchmarks with
@ypwang61
Yiping Wang
6 months
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks!
📍RLVR with one training example can boost:
- Qwen2.5-Math-1.5B: 36.0% → 73.6%
- Qwen2.5-Math-7B: 51.0% → 79.2%
on MATH500.
📄 Paper: https://t.co/D65XR9mMs2
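As background on the term: RLVR (reinforcement learning with verifiable rewards) scores each sampled solution with a programmatic check rather than a learned reward model. Below is a minimal, hypothetical sketch of such a reward function; the \boxed{} answer convention and the function name are assumptions for illustration, not code from the paper.

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the final \\boxed{...} answer in the
    model's completion matches the reference answer, else 0.0."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == gold_answer.strip() else 0.0

# One-example RLVR then repeatedly samples solutions to a single training
# problem and reinforces the completions that earn reward 1.0.
assert verifiable_reward(r"... so the answer is \boxed{42}", "42") == 1.0
assert verifiable_reward("no boxed answer here", "42") == 0.0
```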
@jw2yang4ai
Jianwei Yang
6 months
We have a large group of organizers for our workshop and challenges! 🙌 Feel free to reach out to any of them to get more information about our workshop! ✉️ Kudos to the whole team! 🎉