Xiuye Gu

@laoreja001

Followers 371 · Following 72 · Media 0 · Statuses 21

@ Google DeepMind | Stanford alumni

Joined April 2021
@KaiwenZha
Kaiwen Zha
9 months
Excited to present our #CVPR2025 oral paper #TexTok @CVPR! #TexTok is an image tokenizer that uses text during tokenization, achieving both high recon/gen quality and low cost. 🎙️Oral: Sat, June 14, 1:15-1:30PM CDT (Session 4A) 📌Poster: Sat, June 14, 5-7PM CDT (ExHall D #252)
1
3
8
@alirezafathi
Alireza Fathi
1 year
Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, CVPR. Email me at af_hiring@google.com @CordeliaSchmid
4
47
380
@laoreja001
Xiuye Gu
2 years
Our VideoPoet paper won the best paper award at ICML 2024! Huge thanks to the VFFM team! Sadly I wasn’t able to attend this ICML 🥹
@GoogleAI
Google AI
2 years
Congratulations to the authors of "VideoPoet: A Large Language Model for Zero-Shot Video Generation" for winning one of this year's @icmlconf Best Paper Awards! #ICML2024 Paper: https://t.co/JinpikSveV Blog post: https://t.co/jdqehGqWW6
3
1
78
@laoreja001
Xiuye Gu
2 years
Some of our previous works in this field include VideoPoet ( https://t.co/WCWeiMvWZm), WALT ( https://t.co/li5sKSTk6b), and Language Model Beats Diffusion: Tokenizer is key to visual generation ( https://t.co/wuMOEEymN6)
0
3
12
@GoogleAI
Google AI
2 years
Introducing VideoPoet, a large language model for zero-shot video generation that produces a range of large & smooth motions while preserving objects’ appearance over multiple seconds. Learn more and check out a range of example generated videos → https://t.co/jdqehGruLE
55
280
975
@laoreja001
Xiuye Gu
2 years
Our team at Google Research is hiring a research intern working on video generation, please email xiuyegu@google.com if you are interested.
5
8
101
@laoreja001
Xiuye Gu
2 years
Check out our work on adding visual localization ability to language models!
@Jerry_XU_Jiarui
Jiarui Xu
2 years
Check out our project page: https://t.co/5yDui0LUKW arxiv: https://t.co/IGVZo758Fd This work was done while I was interning with @zhouxy2017 at @Google Research in Seattle, with collaborators @shenyyann, @laoreja001, @anuragarnab, @jesu9, @xiaolonw, @CordeliaSchmid (5/n
0
1
13
@arankomatsuzaki
Aran Komatsuzaki
3 years
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model DaTaSeg improves performance on all datasets, especially small-scale datasets, achieving 54.0 mIoU on ADE semantic and 53.5 PQ on COCO panoptic. https://t.co/qlUbApuQQc
0
7
31
@laoreja001
Xiuye Gu
2 years
Check out our work on a training-free method for open-vocabulary segmentation.
@Kevin_SSY
Shuyang (Kevin) Sun
2 years
Discover our training-free model for open vocabulary image segmentation! Efficient with just 3.6G memory, it segments countless visual concepts with ease. Surpasses models fine-tuned on millions of samples. Zero training, maximum performance!🚀 paper: https://t.co/iKdRsjmKkp
0
0
2
@agrimgupta92
Agrim Gupta
2 years
We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇
49
248
1K
@jw2yang4ai
Jianwei Yang
3 years
We are super excited to announce Dr. @AndrewYNg will give a keynote talk at our 2nd CVinW Workshop! Dr. Ng is a pioneer of deep learning and the founder of Coursera & LandingAI. His recent focus on data-centric AI for vision is perfectly aligned with our workshop! Join us on June 19th!
@jw2yang4ai
Jianwei Yang
3 years
We announce the 2nd Computer Vision in the Wild (CVinW) Workshop @CVPR to further promote research on open-world vision that can easily adapt to new concepts & domains! A great lineup of experts will discuss challenges and solutions from their angles! ➡️ https://t.co/Wcr1EMHsnf
5
18
133
@laoreja001
Xiuye Gu
3 years
Our 2nd Workshop on Computer Vision in the Wild is happening tomorrow (Jun 19) 8:45 am - 5:30 pm, PT in East Ballroom B. Featuring 1 keynote, 7 invited talks, 2 challenges, and a panel discussion! See you tomorrow! https://t.co/kZPl9Cnrqv. #CVPR2023 @CVPR
computer-vision-in-the-wild.github.io
CVPR 2023
2
8
38
@GoogleAI
Google AI
3 years
Learn how REVEAL, an end-to-end retrieval-augmented visual-language model that learns to use multi-source multi-modal data to answer knowledge-intensive queries, achieves state-of-the-art results on visual question answering and image caption tasks. https://t.co/NXfVeLSD2e
14
84
271
@laoreja001
Xiuye Gu
3 years
For the expected graduation date, it’s not a hard requirement. We try to prioritize candidates who are in the final year of their degree program and would be eligible for conversion opportunities after an internship (graduating by December 2024).
0
0
0
@laoreja001
Xiuye Gu
3 years
The research intern role is generally intended for students who are expected to graduate within 1 year. You will work with great collaborators in Google Research and do cutting-edge research & publish in top-tier venues like CVPR, ICCV & NeurIPS.
1
0
5
@laoreja001
Xiuye Gu
3 years
Looking for a research intern! The topic will be video segmentation with lots of flexibility, e.g., open-vocabulary, weakly-supervised learning, etc. If you are interested, please contact us at xiuyegu@google.com and siyang@google.com and apply via
4
4
25
@NagraniArsha
Arsha Nagrani
3 years
(1/N) **HIRING ALERT** Our team at @GoogleAI, led by @CordeliaSchmid, is hiring a full time Research Scientist, as well as PhD interns, to be based in Grenoble. The mission of our team is to learn high-level visual representations for video understanding ...
7
68
473
@weichengkuo
Wei-Cheng Kuo
3 years
Can we directly build upon a frozen vision and language model (VLM) to detect objects described by texts? Yes! Our open-vocabulary detector F-VLM is simpler to train than closed-vocabulary counterparts, and achieves SoTA performance on LVIS. https://t.co/i7u7H1UjzX
1
1
7
@YinCuiCV
Yin Cui
4 years
Our OpenSeg paper is accepted to #ECCV2022! We updated the camera ready version on arXiv, including new results and reflecting recent concurrent works. We plan to release our code and model. Feel free to contact us if you have any questions!
@_akhaliq
AK
4 years
Open-Vocabulary Image Segmentation abs: https://t.co/LX6LmrkQzl OpenSeg outperforms baselines by 3.4 mIoU on PASCAL-Context (459 classes) and 2.7 mIoU on ADE-20k (847 classes)
7
11
108
@TsungYiLinCV
Tsung-Yi Lin
4 years
Our work on open-vocabulary detection is accepted by ICLR 2022! with Xiuye, Weicheng, and @YinCuiCV Have fun with our demo:
colab.research.google.com
@YinCuiCV
Yin Cui
5 years
Can we use free-form text to detect any object, especially long-tailed objects? Yes! We train Mask R-CNN by distilling from CLIP to enable zero-shot detection. The model achieves higher AP compared to its supervised counterpart on rare classes. https://t.co/ZAE7UtLcv5
2
30
162
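The zero-shot detection recipe in the tweet above (classify each detected region's feature against text embeddings of class names, as in work that distills Mask R-CNN from CLIP) can be sketched roughly as follows. This is a minimal illustration only: the embeddings below are hand-made stand-ins, and a real system would obtain them from CLIP's image and text towers.

```python
import numpy as np

def classify_regions(region_embs, text_embs, class_names):
    """Assign each region the class whose text embedding is most
    similar under cosine similarity, the core scoring step in
    open-vocabulary detection heads distilled from a VLM."""
    # L2-normalize both sets of embeddings so the dot product
    # equals cosine similarity.
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = r @ t.T  # shape: (num_regions, num_classes)
    return [class_names[i] for i in sims.argmax(axis=1)]

# Toy 4-d stand-in embeddings (illustrative, not real CLIP outputs).
class_names = ["cat", "unicycle", "toaster"]
text_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
# Two "region" features, each a noisy copy of one class embedding.
region_embs = np.array([[0.1, 0.9, 0.0, 0.1],
                        [0.0, 0.1, 0.8, 0.2]])
print(classify_regions(region_embs, text_embs, class_names))
# → ['unicycle', 'toaster']
```

Because the class set is just a list of strings embedded at inference time, swapping in new or long-tailed categories requires no retraining, which is what makes the detector "open-vocabulary".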