Wei-Cheng Kuo
@weichengkuo
Followers: 27 · Following: 20 · Media: 7 · Statuses: 12
Our paper RECLIP has been accepted by TMLR. We introduce a simple method to make CLIP more affordable and reproducible for the community. Authors: @runzeli047, Dahun Kim, @weichengkuo, Bir Bhanu. @GoogleDeepMind
Our paper RECLIP just appeared on arXiv: your CLIP, but faster! The key is to use small images for contrastive learning; it's very fast and effective. Check it out:
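The idea in the tweet can be sketched as standard CLIP-style contrastive training run on heavily downsized images. Below is a minimal NumPy sketch of the symmetric InfoNCE objective; the random stand-in embeddings (where real image/text encoders would go) and the batch sizes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch: matched image-text pairs
    sit on the diagonal of the (B, B) similarity matrix."""
    i = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = i @ t.T / temperature

    def xent_diag(l):
        # cross-entropy with diagonal (matched-pair) targets, numerically stable
        m = l.max(axis=-1, keepdims=True)
        log_probs = l - m - np.log(np.exp(l - m).sum(axis=-1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average of image->text and text->image directions
    return (xent_diag(logits) + xent_diag(logits.T)) / 2

rng = np.random.default_rng(0)
B, D = 8, 512
# RECLIP's recipe (per the tweet): encode heavily downsized images during
# contrastive pretraining to cut compute; random stand-ins are used here.
loss = clip_contrastive_loss(rng.normal(size=(B, D)), rng.normal(size=(B, D)))
```

With random embeddings the loss hovers near log(B), since no pair is preferred; training pushes matched pairs onto the diagonal.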
F-VLM works well on novel categories, cross-dataset object detection transfer, and even ego-centric videos using free-form text queries.
F-VLM outperforms the best existing approach by 6.5 mask AP for novel categories on the LVIS open-vocabulary detection benchmark while being much simpler and faster to train.
At test time, we use the region proposals to crop out the top-level features of the VLM vision encoder and compute the VLM score per region. We combine the detection score and the VLM score for open-vocabulary detection of unseen classes.
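The test-time scoring described above can be sketched in two steps: score each pooled region feature against class text embeddings, then fuse with the detection score. A common fusion in this family of detectors is a geometric mean; the alpha weight, temperature, and function names below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vlm_region_scores(region_feats, text_embs, temperature=0.01):
    """Per-region open-vocabulary scores: cosine similarity of pooled
    region features against class text embeddings, then softmax."""
    r = region_feats / np.linalg.norm(region_feats, axis=-1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    return softmax(r @ t.T / temperature)

def fuse_scores(det_scores, vlm_scores, alpha=0.65):
    """Geometric-mean fusion of detector and VLM scores; alpha is an
    illustrative weight, not a value from the paper."""
    return det_scores ** (1 - alpha) * vlm_scores ** alpha

rng = np.random.default_rng(0)
# 5 region proposals, 10 candidate classes, 256-d embedding space (all illustrative)
vlm = vlm_region_scores(rng.normal(size=(5, 256)), rng.normal(size=(10, 256)))
det = rng.uniform(size=(5, 10))
final = fuse_scores(det, vlm)
```

The geometric mean lets the frozen VLM dominate on classes the detector never saw, while the detector score still anchors localization quality.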
During training, F-VLM is simply a detector with the last classification layer replaced by base-category text embeddings. We train only the detector head and keep the pretrained VLM's image and text encoders frozen.
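The swap described above amounts to using a fixed text-embedding matrix in place of a learned classification layer. A minimal sketch, with cosine logits and illustrative shapes (not the authors' code):

```python
import numpy as np

def base_class_logits(head_feats, base_text_embs):
    """Classification logits for base categories: the frozen text-embedding
    matrix stands in for the detector's last linear layer, so only the
    layers producing head_feats need gradients."""
    f = head_feats / np.linalg.norm(head_feats, axis=-1, keepdims=True)
    w = base_text_embs / np.linalg.norm(base_text_embs, axis=-1, keepdims=True)
    return f @ w.T  # (num_regions, num_base_classes) cosine logits

rng = np.random.default_rng(0)
# 4 regions, 20 base categories, 256-d embedding space (illustrative)
logits = base_class_logits(rng.normal(size=(4, 256)), rng.normal(size=(20, 256)))
```

Because the class "weights" are text embeddings, adding a new category at test time is just appending a row, which is what makes the detector open-vocabulary.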
Can we directly build upon a frozen vision and language model (VLM) to detect objects described by text? Yes! Our open-vocabulary detector F-VLM is simpler to train than its closed-vocabulary counterparts, and achieves SoTA performance on LVIS. https://t.co/i7u7H1UjzX
Our work on open-vocabulary detection has been accepted by ICLR 2022, with Xiuye, Weicheng, and @YinCuiCV. Have fun with our demo:
colab.research.google.com
Can we use free-form text to detect any object, especially long-tailed objects? Yes! We train Mask R-CNN by distilling from CLIP to enable zero-shot detection. The model achieves higher AP compared to its supervised counterpart on rare classes. https://t.co/ZAE7UtLcv5
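The distillation step in the tweet, pulling detector region embeddings toward CLIP's embeddings of the matching image crops, can be sketched as a simple per-region loss. The L1 form and function name below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def region_distill_loss(region_embs, clip_crop_embs):
    """L1 loss between unit-normalized detector region embeddings and CLIP's
    embeddings of the corresponding image crops: the distillation signal that
    lets the detector score classes never seen with box labels."""
    r = region_embs / np.linalg.norm(region_embs, axis=-1, keepdims=True)
    c = clip_crop_embs / np.linalg.norm(clip_crop_embs, axis=-1, keepdims=True)
    return np.abs(r - c).sum(axis=-1).mean()

rng = np.random.default_rng(0)
regions = rng.normal(size=(6, 512))   # 6 proposals, 512-d (illustrative)
crops = rng.normal(size=(6, 512))     # CLIP embeddings of the same 6 crops
loss = region_distill_loss(regions, crops)
```

Once region embeddings live in CLIP's space, classification reduces to comparing them against text embeddings of arbitrary category names, which is what enables zero-shot detection.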