Xiuye Gu (@laoreja001)
371 Followers · 72 Following · 0 Media · 21 Statuses
@ Google DeepMind | Stanford alumni
Joined April 2021
Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, CVPR. Email me at af_hiring@google.com @CordeliaSchmid
4 · 47 · 380
Our VideoPoet paper won the best paper award at ICML 2024! Huge thanks to the VFFM team! Sadly I wasn’t able to attend this ICML 🥹
Congratulations to the authors of "VideoPoet: A Large Language Model for Zero-Shot Video Generation" for winning one of this year's @icmlconf Best Paper Awards! #ICML2024 Paper: https://t.co/JinpikSveV Blog post: https://t.co/jdqehGqWW6
3 · 1 · 78
Some of our previous works in this field include VideoPoet ( https://t.co/WCWeiMvWZm), WALT ( https://t.co/li5sKSTk6b), and Language Model Beats Diffusion: Tokenizer is key to visual generation ( https://t.co/wuMOEEymN6)
0 · 3 · 12
Introducing VideoPoet, a large language model for zero-shot video generation that produces a range of large & smooth motions while preserving objects’ appearance over multiple seconds. Learn more and check out a range of example generated videos → https://t.co/jdqehGruLE
55 · 280 · 975
Our team at Google Research is hiring a research intern working on video generation. Please email xiuyegu@google.com if you are interested.
5 · 8 · 101
Check out our work on adding visual localization abilities to language models!
Check out our project page: https://t.co/5yDui0LUKW arxiv: https://t.co/IGVZo758Fd This work was done while I was interning with @zhouxy2017 at @Google Research in Seattle, with collaborators @shenyyann, @laoreja001, @anuragarnab, @jesu9, @xiaolonw, @CordeliaSchmid (5/n
0 · 1 · 13
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model DaTaSeg improves performance on all datasets, especially small-scale datasets, achieving 54.0 mIoU on ADE semantic and 53.5 PQ on COCO panoptic. https://t.co/qlUbApuQQc
0 · 7 · 31
Check out our work on a training-free method for open-vocabulary segmentation.
Discover our training-free model for open vocabulary image segmentation! Efficient with just 3.6G memory, it segments countless visual concepts with ease. Surpasses models fine-tuned on millions of samples. Zero training, maximum performance!🚀 paper: https://t.co/iKdRsjmKkp
0 · 0 · 2
We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇
49 · 248 · 1K
We are super excited to announce that Dr. @AndrewYNg will give a keynote talk at our 2nd CVinW Workshop! Dr. Ng is a pioneer of deep learning and the founder of Coursera & LandingAI. His recent focus on data-centric AI for vision aligns perfectly with our workshop! Join us on June 19th!
We announce the 2nd Computer Vision in the Wild (CVinW) Workshop @CVPR to further promote research on open-world vision that can easily adapt to new concepts & domains! A great lineup of experts will discuss challenges and solutions from their angles! ➡️ https://t.co/Wcr1EMHsnf
5 · 18 · 133
Our 2nd Workshop on Computer Vision in the Wild is happening tomorrow (Jun 19), 8:45 am - 5:30 pm PT, in East Ballroom B. Featuring 1 keynote, 7 invited talks, 2 challenges, and a panel discussion! See you tomorrow! https://t.co/kZPl9Cnrqv
#CVPR2023 @CVPR
computer-vision-in-the-wild.github.io
CVPR 2023
2 · 8 · 38
Learn how REVEAL, an end-to-end retrieval-augmented visual-language model that learns to use multi-source multi-modal data to answer knowledge-intensive queries, achieves state-of-the-art results on visual question answering and image captioning tasks. https://t.co/NXfVeLSD2e
14 · 84 · 271
For the expected graduation date, it’s not a hard requirement. We try to prioritize candidates who are in the final year of their degree program and would be eligible for conversion opportunities after an internship (graduating by December 2024).
0 · 0 · 0
The research internship is generally intended for students who are expected to graduate within one year. You will work with great collaborators at Google Research, do cutting-edge research, and publish in top-tier venues like CVPR, ICCV & NeurIPS.
1 · 0 · 5
(1/N) **HIRING ALERT** Our team at @GoogleAI, led by @CordeliaSchmid, is hiring a full-time Research Scientist, as well as PhD interns, to be based in Grenoble. The mission of our team is to learn high-level visual representations for video understanding ...
7 · 68 · 473
Can we directly build upon a frozen vision and language model (VLM) to detect objects described by text? Yes! Our open-vocabulary detector F-VLM is simpler to train than closed-vocabulary counterparts and achieves SoTA performance on LVIS. https://t.co/i7u7H1UjzX
1 · 1 · 7
Our OpenSeg paper is accepted to #ECCV2022! We updated the camera-ready version on arXiv, adding new results and discussing recent concurrent work. We plan to release our code and model. Feel free to contact us if you have any questions!
Open-Vocabulary Image Segmentation abs: https://t.co/LX6LmrkQzl OpenSeg outperforms baselines by 3.4 mIoU on PASCAL-Context (459 classes) and 2.7 mIoU on ADE-20k (847 classes)
7 · 11 · 108
Our work on open-vocabulary detection is accepted by ICLR 2022! With Xiuye, Weicheng, and @YinCuiCV. Have fun with our demo:
Can we use free-form text to detect any object, especially long-tailed objects? Yes! We train Mask R-CNN by distilling from CLIP to enable zero-shot detection. The model achieves higher AP compared to its supervised counterpart on rare classes. https://t.co/ZAE7UtLcv5
2 · 30 · 162
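The open-vocabulary detection recipe in the tweet above (distill a detector's region embeddings toward CLIP, then classify each region by similarity to text embeddings of class names) can be sketched with toy vectors. This is a hand-made illustration of the matching step only, not the actual ViLD code: the embeddings below are made-up stand-ins for real CLIP outputs.

```python
import math

def normalize(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    # Both inputs are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Stand-ins: in a ViLD-style detector these would be CLIP text embeddings
# of the class names, and a region embedding from a Mask R-CNN head
# distilled toward CLIP's image encoder.
class_names = ["cat", "dog", "toaster"]
text_embeddings = [normalize(v) for v in [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
]]

# A region embedding that lies close to the "dog" text embedding.
region_embedding = normalize([0.1, 1.0, 0.05, 0.0])

# Zero-shot classification: score the region against every class prompt.
# New (even long-tailed) classes are added just by embedding their names.
scores = [cosine(t, region_embedding) for t in text_embeddings]
predicted = class_names[scores.index(max(scores))]
print(predicted)  # dog
```

Because the class set lives entirely in the text embeddings, swapping in free-form queries requires no retraining of the detector.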