Xiuye Gu (@laoreja001)
371 Followers · 72 Following · 0 Media · 21 Statuses
@ Google DeepMind | Stanford alumni
Joined April 2021
Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, CVPR. Email me at af_hiring@google.com @CordeliaSchmid
4 · 47 · 380
Our VideoPoet paper won the best paper award at ICML 2024! Huge thanks to the VFFM team! Sadly I wasn’t able to attend this ICML 🥹
Congratulations to the authors of "VideoPoet: A Large Language Model for Zero-Shot Video Generation" for winning one of this year's @icmlconf Best Paper Awards! #ICML2024 Paper: https://t.co/JinpikSveV Blog post: https://t.co/jdqehGqWW6
3 · 1 · 78
Some of our previous works in this field include VideoPoet ( https://t.co/WCWeiMvWZm), WALT ( https://t.co/li5sKSTk6b), and Language Model Beats Diffusion: Tokenizer is key to visual generation ( https://t.co/wuMOEEymN6)
0 · 3 · 12
Introducing VideoPoet, a large language model for zero-shot video generation that produces a range of large & smooth motions while preserving objects’ appearance over multiple seconds. Learn more and check out a range of example generated videos → https://t.co/jdqehGruLE
55 · 280 · 975
Our team at Google Research is hiring a research intern working on video generation. Please email xiuyegu@google.com if you are interested.
5 · 8 · 101
Check out our work on adding visual localization abilities to language models!
Check out our project page: https://t.co/5yDui0LUKW arxiv: https://t.co/IGVZo758Fd This work was done while I was interning with @zhouxy2017 at @Google Research in Seattle, with collaborators @shenyyann, @laoreja001, @anuragarnab, @jesu9, @xiaolonw, @CordeliaSchmid (5/n
0 · 1 · 13
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model DaTaSeg improves performance on all datasets, especially small-scale datasets, achieving 54.0 mIoU on ADE semantic and 53.5 PQ on COCO panoptic. https://t.co/qlUbApuQQc
0 · 7 · 31
Check out our work on a training-free method for open-vocabulary segmentation.
Discover our training-free model for open vocabulary image segmentation! Efficient with just 3.6G memory, it segments countless visual concepts with ease. Surpasses models fine-tuned on millions of samples. Zero training, maximum performance!🚀 paper: https://t.co/iKdRsjmKkp
0 · 0 · 2
We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇
49 · 248 · 1K
We are super excited to announce that Dr. @AndrewYNg will give a keynote talk at our 2nd CVinW Workshop! Dr. Ng is a pioneer of deep learning and the founder of Coursera & LandingAI. His recent focus on data-centric AI for vision aligns perfectly with our workshop! Join us on June 19th!
We announce the 2nd Computer Vision in the Wild (CVinW) Workshop @CVPR to further promote research on open-world vision that can easily adapt to new concepts & domains! A great lineup of experts will discuss challenges and solutions from their angles! ➡️ https://t.co/Wcr1EMHsnf
5 · 18 · 133
Our 2nd Workshop on Computer Vision in the Wild is happening tomorrow (Jun 19), 8:45 am - 5:30 pm PT, in East Ballroom B. Featuring 1 keynote, 7 invited talks, 2 challenges, and a panel discussion! See you tomorrow! https://t.co/kZPl9Cnrqv
#CVPR2023 @CVPR
computer-vision-in-the-wild.github.io
CVPR 2023
2 · 8 · 38
Learn how REVEAL, an end-to-end retrieval-augmented visual-language model that learns to use multi-source multi-modal data to answer knowledge-intensive queries, achieves state-of-the-art results on visual question answering and image captioning tasks. https://t.co/NXfVeLSD2e
14 · 84 · 271
For the expected graduation date, it’s not a hard requirement. We try to prioritize candidates who are in the final year of their degree program and would be eligible for conversion opportunities after an internship (graduating by December 2024).
0 · 0 · 0
The research internship is generally intended for students who are expected to graduate within one year. You will work with great collaborators at Google Research, do cutting-edge research, and publish in top-tier venues like CVPR, ICCV & NeurIPS.
1 · 0 · 5
(1/N) **HIRING ALERT** Our team at @GoogleAI, led by @CordeliaSchmid, is hiring a full-time Research Scientist, as well as PhD interns, to be based in Grenoble. The mission of our team is to learn high-level visual representations for video understanding ...
7 · 68 · 473
Can we directly build upon a frozen vision and language model (VLM) to detect objects described by text? Yes! Our open-vocabulary detector F-VLM is simpler to train than closed-vocabulary counterparts and achieves SoTA performance on LVIS. https://t.co/i7u7H1UjzX
1 · 1 · 7
Our OpenSeg paper is accepted to #ECCV2022! We updated the camera-ready version on arXiv, adding new results and discussing recent concurrent work. We plan to release our code and model. Feel free to contact us if you have any questions!
Open-Vocabulary Image Segmentation abs: https://t.co/LX6LmrkQzl OpenSeg outperforms baselines by 3.4 mIoU on PASCAL-Context (459 classes) and 2.7 mIoU on ADE-20k (847 classes)
7 · 11 · 108
Our work on open-vocabulary detection is accepted by ICLR 2022! With Xiuye, Weicheng, and @YinCuiCV. Have fun with our demo:
Can we use free-form text to detect any object, especially long-tailed objects? Yes! We train Mask R-CNN by distilling from CLIP to enable zero-shot detection. The model achieves higher AP compared to its supervised counterpart on rare classes. https://t.co/ZAE7UtLcv5
2 · 30 · 162
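The open-vocabulary detection recipe in the tweet above (distill a detector's region embeddings toward CLIP, then classify each region by similarity to text embeddings of class names) can be sketched with toy vectors. This is a hand-made illustration of the matching step only, not the actual ViLD code: the embeddings below are made-up stand-ins for real CLIP outputs.

```python
import math

def normalize(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    # Both inputs are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Stand-ins: in a ViLD-style detector these would be CLIP text embeddings
# of the class names, and a region embedding from a Mask R-CNN head
# distilled toward CLIP's image encoder.
class_names = ["cat", "dog", "toaster"]
text_embeddings = [normalize(v) for v in [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
]]

# A region embedding that lies close to the "dog" text embedding.
region_embedding = normalize([0.1, 1.0, 0.05, 0.0])

# Zero-shot classification: score the region against every class prompt.
# New (even long-tailed) classes are added just by embedding their names.
scores = [cosine(t, region_embedding) for t in text_embeddings]
predicted = class_names[scores.index(max(scores))]
print(predicted)  # dog
```

Because the class set lives entirely in the text embeddings, swapping in free-form queries requires no retraining of the detector.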