Wei-Cheng Kuo
@weichengkuo
Followers: 27 · Following: 20 · Media: 7 · Statuses: 12
Our paper RECLIP has been accepted by TMLR. We introduce a simple method to make CLIP more affordable and reproducible for the community. Authors: @runzeli047, Dahun Kim, @weichengkuo, Bir Bhanu. @GoogleDeepMind
Our paper RECLIP just appeared on arXiv: your CLIP, but faster! The key is to use small images for contrastive learning; it's very fast and effective. Check it out:
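The idea in the tweet can be sketched as standard CLIP-style contrastive training run on heavily downsized images. Below is a minimal NumPy sketch of the symmetric InfoNCE objective; the random stand-in embeddings (where real image/text encoders would go) and the batch sizes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch: matched image-text pairs
    sit on the diagonal of the (B, B) similarity matrix."""
    i = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = i @ t.T / temperature

    def xent_diag(l):
        # cross-entropy with diagonal (matched-pair) targets, numerically stable
        m = l.max(axis=-1, keepdims=True)
        log_probs = l - m - np.log(np.exp(l - m).sum(axis=-1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average of image->text and text->image directions
    return (xent_diag(logits) + xent_diag(logits.T)) / 2

rng = np.random.default_rng(0)
B, D = 8, 512
# RECLIP's recipe (per the tweet): encode heavily downsized images during
# contrastive pretraining to cut compute; random stand-ins are used here.
loss = clip_contrastive_loss(rng.normal(size=(B, D)), rng.normal(size=(B, D)))
```

With random embeddings the loss hovers near log(B), since no pair is preferred; training pushes matched pairs onto the diagonal.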
F-VLM works well on novel categories, cross-dataset object detection transfer, and even ego-centric videos using free-form text queries.
F-VLM outperforms the best existing approach by 6.5 mask AP for novel categories on the LVIS open-vocabulary detection benchmark while being much simpler and faster to train.
At test time, we use the region proposals to crop out the top-level features of the VLM vision encoder and compute the VLM score per region. We combine the detection score and the VLM score for open-vocabulary detection of unseen classes.
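The test-time scoring described above can be sketched in two steps: score each pooled region feature against class text embeddings, then fuse with the detection score. A common fusion in this family of detectors is a geometric mean; the alpha weight, temperature, and function names below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vlm_region_scores(region_feats, text_embs, temperature=0.01):
    """Per-region open-vocabulary scores: cosine similarity of pooled
    region features against class text embeddings, then softmax."""
    r = region_feats / np.linalg.norm(region_feats, axis=-1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    return softmax(r @ t.T / temperature)

def fuse_scores(det_scores, vlm_scores, alpha=0.65):
    """Geometric-mean fusion of detector and VLM scores; alpha is an
    illustrative weight, not a value from the paper."""
    return det_scores ** (1 - alpha) * vlm_scores ** alpha

rng = np.random.default_rng(0)
# 5 region proposals, 10 candidate classes, 256-d embedding space (all illustrative)
vlm = vlm_region_scores(rng.normal(size=(5, 256)), rng.normal(size=(10, 256)))
det = rng.uniform(size=(5, 10))
final = fuse_scores(det, vlm)
```

The geometric mean lets the frozen VLM dominate on classes the detector never saw, while the detector score still anchors localization quality.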
During training, F-VLM is simply a detector with the last classification layer replaced by base-category text embeddings. We train only the detector head and keep the pretrained VLM's image and text encoders frozen.
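The swap described above amounts to using a fixed text-embedding matrix in place of a learned classification layer. A minimal sketch, with cosine logits and illustrative shapes (not the authors' code):

```python
import numpy as np

def base_class_logits(head_feats, base_text_embs):
    """Classification logits for base categories: the frozen text-embedding
    matrix stands in for the detector's last linear layer, so only the
    layers producing head_feats need gradients."""
    f = head_feats / np.linalg.norm(head_feats, axis=-1, keepdims=True)
    w = base_text_embs / np.linalg.norm(base_text_embs, axis=-1, keepdims=True)
    return f @ w.T  # (num_regions, num_base_classes) cosine logits

rng = np.random.default_rng(0)
# 4 regions, 20 base categories, 256-d embedding space (illustrative)
logits = base_class_logits(rng.normal(size=(4, 256)), rng.normal(size=(20, 256)))
```

Because the class "weights" are text embeddings, adding a new category at test time is just appending a row, which is what makes the detector open-vocabulary.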
Can we directly build upon a frozen vision and language model (VLM) to detect objects described by text? Yes! Our open-vocabulary detector F-VLM is simpler to train than its closed-vocabulary counterparts, and achieves SoTA performance on LVIS. https://t.co/i7u7H1UjzX
Our work on open-vocabulary detection has been accepted by ICLR 2022, with Xiuye, Weicheng, and @YinCuiCV. Have fun with our demo:
colab.research.google.com
Can we use free-form text to detect any object, especially long-tailed objects? Yes! We train Mask R-CNN by distilling from CLIP to enable zero-shot detection. The model achieves higher AP compared to its supervised counterpart on rare classes. https://t.co/ZAE7UtLcv5
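The distillation step in the tweet, pulling detector region embeddings toward CLIP's embeddings of the matching image crops, can be sketched as a simple per-region loss. The L1 form and function name below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def region_distill_loss(region_embs, clip_crop_embs):
    """L1 loss between unit-normalized detector region embeddings and CLIP's
    embeddings of the corresponding image crops: the distillation signal that
    lets the detector score classes never seen with box labels."""
    r = region_embs / np.linalg.norm(region_embs, axis=-1, keepdims=True)
    c = clip_crop_embs / np.linalg.norm(clip_crop_embs, axis=-1, keepdims=True)
    return np.abs(r - c).sum(axis=-1).mean()

rng = np.random.default_rng(0)
regions = rng.normal(size=(6, 512))   # 6 proposals, 512-d (illustrative)
crops = rng.normal(size=(6, 512))     # CLIP embeddings of the same 6 crops
loss = region_distill_loss(regions, crops)
```

Once region embeddings live in CLIP's space, classification reduces to comparing them against text embeddings of arbitrary category names, which is what enables zero-shot detection.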