Weidi Xie
@WeidiXie
Followers: 3K
Following: 979
Media: 119
Statuses: 542
Computer Vision Researcher. Associate Professor at SJTU, previously @Oxford_VGG. Chinese name: 谢伟迪. Personal webpage: https://t.co/sZoZ0AfKrX
Oxford, England
Joined May 2018
Animated movies are effortlessly understood by young minds, yet they appear challenging for video-language models. Why? The key problem is the huge diversity of animated characters: their appearance ranges from human-like faces to cars, fish, blobs, etc.
1
3
12
If you are at ACM MM, don't miss it. ⚽️⚽️⚽️
“Multi-agent System for Comprehensive Soccer Understanding”, done with @WeidiXie, will be presented as an oral at #ACMMM2025 in Dublin at GoldSmith3, RDCC, from around 14:20 tomorrow (10.30). Come and chat if you are interested! ⚽️🏆 Also, join our challenge at CVPR 2026 based on this work!
0
0
4
Join our challenge @CVPR 2026! ⚽️⚽️🏆🏆 A $1,000 prize for the Soccer VQA task!
🚀🌟We're excited to kick off the third official task of #SoccerNet Challenges 2026! ⚽️3⃣ VQA: Visual Question Answering. Details of this task are available at https://t.co/rEUmLAQKZQ
0
2
4
🚀 Excited to share that RadGenome-Chest CT is now published in @ScientificData! It is a comprehensive, large-scale, and fine-grained visual-language dataset based on 3D CT-RATE, including: - Organ-level segmentation for 197 categories; - 665K multi-granularity grounded reports;
0
1
4
Some updates on point tracking: we present Track-On2, with - a unified memory module, - multi-scale features, and - training on longer videos. Despite being online and training only on synthetic data, Track-On2 achieves SoTA on multiple benchmarks 🚀 perfect for real-time
0
0
8
Again, after almost two years, SAT is finally published @npjDigitalMed. Thanks for all the work from reviewers and editors. 🥳 Please check the paper, which has an updated title and more experiments! Large-vocabulary segmentation for medical images with text prompts.
We have made the first release of SAT-Nano, with model and inference code. SAT-Ultra will be released soon as well. Stay tuned! Webpage: https://t.co/0U5kE6h4t4 Code & Model: https://t.co/JuauABuc8Y
1
4
27
The work towards building large-vocabulary text-prompted segmentation models for radiology is now online at npj Digital Medicine.
Recent advancements in medical image analysis have been marked by AI models trained for specific tasks. But one drawback is their narrow specialization. https://t.co/s6bTktktb5
0
1
2
🚀 Excited to introduce our new work, Deep-DxSearch: End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning (https://t.co/hHKxBl3RLT) 🧐 Objective: the goal is to transform human-prior guided LLM-powered medical RAG into end-to-end data-driven optimized agentic
0
0
10
After almost two years, RadFM is finally published @NatureComms. Thanks for all the work from reviewers and editors. 🥳🥳 https://t.co/9PtuzmWPPK Please check the paper, which has an updated title: Towards generalist foundation model for radiology by leveraging web-scale 2D&3D
We have updated the manuscript: - more recent models as baselines, - better model performance, - more comprehensive evaluation, including both machine and human scoring. Arxiv: https://t.co/KiEw4DI579 Website: https://t.co/pCFUbnDmDr
2
5
56
🚀 Excited to announce that StreamFormer is accepted to #ICCV2025 as an Oral paper! Congrats to my excellent coauthors @Anxiou51 @WeidiXie "Learning Streaming Video Representation via Multitask Training"! 📄Paper: https://t.co/h8OUpPaIVq 🌐Web: https://t.co/z2geDGRNXr
2
2
4
Oral @ #ICCV2025! This is only a starting point for streaming video representation learning. Stay tuned, a lot more interesting work to come!
🎉Oral@#ICCV2025! We present "Learning Streaming Video Representation via Multitask Training"! 🚀TL;DR: Our StreamFormer enables efficient streaming video representation learning through multitask training. 🌐Project Page: https://t.co/JtOMAKrqpf 📄Paper: https://t.co/Yp5zbjjX0x
0
2
47
Thanks for sharing! 😊
An Agentic System for Rare Disease Diagnosis. Great example of the power of specialized complex agentic systems. Diagnostic performance: among 2,919 diseases, it achieves 100% accuracy for 1,013 diseases. Uses 40 specialized tools and data sources via MCP. Here are my notes:
0
1
2
We’re hosting the 1st SLoMO Workshop at #ICCV2025 to discuss Story-Level Movie Understanding & Audio Descriptions!
Movies are more than just video clips, they are stories! 🎬 We’re hosting the 1st SLoMO Workshop at #ICCV2025 to discuss Story-Level Movie Understanding & Audio Descriptions! Website: https://t.co/k1hDRCFjjd Competition: https://t.co/JseLilr6oc
0
2
11
Fun work on AI4Soccer! ⚽️⚽️⚽️⚽️
In ACM MM 2025, we introduced "Multi-Agent System for Comprehensive Soccer Understanding" with @WeidiXie ⚽️AI with external knowledge to analyze on/off-field dynamics! 🧐 📄: https://t.co/SWY6AIgMGe 🌐: https://t.co/upZTE3mjKB See details below! #AI4Sports #MultiAgent #ACMMM25
0
1
5
For the soccer fans, don't miss this!!!! ⚽️⚽️⚽️⚽️⚽️
If you are a soccer fan at #CVPR25, don't miss our work "⚽️Towards Universal Soccer Video Understanding🥇" with @WeidiXie and @HaoningWu_!!! Poster session 2, June 13 from 4-6 p.m., ExHall D, No.288. 😆😆⚽️⚽️👏👏
0
3
10