Wei-Cheng Kuo Profile
Wei-Cheng Kuo

@weichengkuo

Followers: 27 · Following: 20 · Media: 7 · Statuses: 12

Computer Vision / AI researcher.

Joined January 2022
@runzeli047
runzeli
2 years
Our paper RECLIP has been accepted by TMLR. We introduce a simple method designed to make CLIP more affordable and reproducible for the community. Authors: @runzeli047, Dahun Kim, @weichengkuo, Bir Bhanu. @GoogleDeepMind
@weichengkuo
Wei-Cheng Kuo
3 years
Our paper RECLIP just appeared on arXiv - your CLIP, but faster! The key is to use small images for contrastive learning - it's very fast and effective. Check it out:
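The intuition in the tweet above - shrink the images during contrastive pretraining to cut cost - can be illustrated with a quick back-of-the-envelope sketch. This is not RECLIP's actual recipe; the patch size and resolutions below are illustrative assumptions for a ViT-style image encoder.

```python
# Illustrative sketch: why small-image contrastive training is cheap.
# Assumes a ViT-style encoder with 16x16 patches; resolutions are made up.

def num_patches(image_size: int, patch_size: int = 16) -> int:
    """Number of patch tokens for a square image."""
    return (image_size // patch_size) ** 2

def attention_cost(image_size: int, patch_size: int = 16) -> int:
    """Self-attention cost grows quadratically with the token count."""
    return num_patches(image_size, patch_size) ** 2

full = attention_cost(224)   # 196 tokens
small = attention_cost(64)   # 16 tokens
print(f"attention cost ratio: {full / small:.0f}x")  # roughly 150x cheaper per image
```

Fewer tokens per image means larger batches or more steps for the same compute, which is what makes the approach attractive for reproducing CLIP-style training on a budget.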
@weichengkuo
Wei-Cheng Kuo
3 years
Authors: @weichengkuo, @YinCuiCV, @laoreja001, AJ Piergiovanni, Anelia Angelova
@weichengkuo
Wei-Cheng Kuo
3 years
F-VLM works well on novel categories, cross-dataset object detection transfer, and even egocentric videos with free-form text queries.
@weichengkuo
Wei-Cheng Kuo
3 years
F-VLM outperforms the best existing approach by 6.5 mask AP for novel categories on the LVIS open-vocabulary detection benchmark while being much simpler and faster to train.
@weichengkuo
Wei-Cheng Kuo
3 years
At test time, we use the region proposals to crop out the top-level features of the VLM vision encoder and compute the VLM score per region. We combine the detection score and the VLM score for open-vocabulary detection of unseen classes.
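The score fusion described above can be sketched as a simple weighted geometric mean of the detector score and the per-region VLM score. The weighting `alpha` below is an illustrative choice, not necessarily the paper's exact formula.

```python
import numpy as np

# Hedged sketch: combine a detector score with a VLM region score.
# `alpha` controls how much we trust the VLM; its value here is made up.

def combined_score(det_score, vlm_score, alpha=0.35):
    det = np.asarray(det_score, dtype=float)
    vlm = np.asarray(vlm_score, dtype=float)
    # Geometric interpolation: leans on the VLM score for unseen classes
    # while keeping the detector's score for classes seen in training.
    return det ** (1 - alpha) * vlm ** alpha

print(combined_score(0.9, 0.6))
```

A geometric (rather than arithmetic) mean keeps the combined score at zero whenever either source assigns zero probability, which is a common choice for fusing multiplicative confidence signals.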
@weichengkuo
Wei-Cheng Kuo
3 years
During training, F-VLM is simply a detector with the last classification layer replaced by base-category text embeddings. We only train the detector head and keep the pretrained VLM’s image and text encoder frozen.
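The training setup in the tweet above - a detector whose final classification layer is replaced by fixed text embeddings - can be sketched as cosine-similarity classification against frozen category embeddings. The dimensions, temperature, and random stand-in features below are illustrative assumptions, not F-VLM's actual configuration.

```python
import numpy as np

# Hedged sketch: classify a region feature against frozen text embeddings
# instead of a learned linear classifier. Everything random is a stand-in.

rng = np.random.default_rng(0)
num_base_classes, dim = 5, 8

# Frozen text embeddings for the base-category names (L2-normalized).
text_emb = rng.normal(size=(num_base_classes, dim))
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

def classify_region(region_feat: np.ndarray, temperature: float = 0.01) -> np.ndarray:
    """Softmax over cosine similarities between the region and each class name."""
    feat = region_feat / np.linalg.norm(region_feat)
    logits = text_emb @ feat / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()

probs = classify_region(rng.normal(size=dim))
print(probs.round(3))
```

Because the classifier weights are just text embeddings, swapping in embeddings for new category names at test time extends the detector to an open vocabulary without retraining.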
@weichengkuo
Wei-Cheng Kuo
3 years
Can we directly build upon a frozen vision and language model (VLM) to detect objects described by texts? Yes! Our open-vocabulary detector F-VLM trains simpler than closed-vocabulary counterparts, and achieves SoTA performance on LVIS. https://t.co/i7u7H1UjzX
@TsungYiLinCV
Tsung-Yi Lin
4 years
Our work on open-vocabulary detection is accepted by ICLR 2022! With Xiuye, Weicheng, and @YinCuiCV. Have fun with our demo:
colab.research.google.com
@YinCuiCV
Yin Cui
5 years
Can we use free-form text to detect any object, especially long-tailed objects? Yes! We train Mask R-CNN by distilling from CLIP to enable zero-shot detection. The model achieves higher AP compared to its supervised counterpart on rare classes. https://t.co/ZAE7UtLcv5
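The distillation idea in the tweet above - training a detector's region embeddings to match CLIP's embeddings of the corresponding crops - can be sketched as a simple embedding-matching loss. The L1 form and the normalization below are illustrative assumptions, not necessarily the paper's exact objective.

```python
import numpy as np

# Hedged sketch: distillation loss pushing a detector's region embedding
# toward the CLIP image embedding of the cropped region.

def distill_loss(region_emb: np.ndarray, clip_emb: np.ndarray) -> float:
    """L1 distance between L2-normalized embeddings (illustrative choice)."""
    r = region_emb / np.linalg.norm(region_emb)
    c = clip_emb / np.linalg.norm(clip_emb)
    return float(np.abs(r - c).sum())

v = np.ones(4)
print(distill_loss(v, v))  # identical embeddings incur zero loss
```

Since CLIP's image and text embeddings live in a shared space, a region embedding distilled toward CLIP's image embedding can then be scored against arbitrary text queries, enabling zero-shot detection of long-tailed categories.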