@RanjayKrishna
Ranjay Krishna
There are so many vision-language models: OpenAI’s CLIP, Meta’s FLAVA, Salesforce’s ALBEF, etc. Our #CVPR2023 ⭐️ highlight ⭐️ paper finds that none of them show sufficient compositional reasoning capacity. Since perception and language are both compositional, we have work to do
@zixianma02
Zixian Ma @ CVPR2024
Have vision-language models achieved human-level compositional reasoning? Our research suggests: not quite yet. We’re excited to present CREPE – a large-scale Compositional REPresentation Evaluation benchmark for vision-language models – as a 🌟highlight🌟 at #CVPR2023. 🧵1/7