Jiaang Li
@jiaangli
Followers
115
Following
197
Media
11
Statuses
56
Ph.D. student at @BelongieLab @AiCentreDK @ELLISforEurope @DIKU_Institut | Natural Language Processing - Computer Vision
Denmark
Joined November 2021
New Preprint 🎉: "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory". We review recent works on culture in VLMs and argue for deeper grounding in cultural theory to enable more inclusive evaluations. Paper 🔗: https://t.co/9AoRHTFG58
1
10
88
Great collaboration with @yfyuan775 @Wenyan62 @maliannejadi @daniel_hers, Anders Søgaard, @licwu @Wenxuan__Zhang @pliang279 @ydeng_dandy @SergeBelongie! 🔗 More here:
Project Page: https://t.co/kKxjqX83Vs
Code: https://t.co/wFB9N5gz92
Dataset:
huggingface.co
0
0
0
📊Our experiments demonstrate that even lightweight VLMs, when augmented with culturally relevant retrievals, outperform their non-augmented counterparts and even surpass the next larger model tier, achieving at least a 3.2% improvement in cVQA and 6.2% in cIC.
1
0
0
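For readers curious what "augmented with culturally relevant retrievals" can look like in practice, here is a minimal sketch of a single retrieval-augmented cVQA step. It is illustrative only: `embed_image` and `vlm_answer` are hypothetical callables, and the prompt format is an assumption rather than the paper's pipeline.

```python
# Illustrative only: a minimal retrieval-augmented cVQA step.
# `embed_image` and `vlm_answer` are hypothetical callables, not part of the
# RAVENEA codebase.
import numpy as np

def retrieve_culture_docs(image_emb: np.ndarray, doc_embs: np.ndarray,
                          docs: list[str], k: int = 3) -> list[str]:
    """Return the k Wikipedia passages most similar to the image embedding."""
    sims = doc_embs @ image_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(image_emb) + 1e-8
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def answer_with_retrieval(image, question: str, docs: list[str],
                          doc_embs: np.ndarray, embed_image, vlm_answer) -> str:
    """Prepend culturally relevant context to the question before querying the VLM."""
    context = "\n".join(retrieve_culture_docs(embed_image(image), doc_embs, docs))
    prompt = (f"Context (from Wikipedia):\n{context}\n\n"
              f"Question: {question}\nAnswer briefly:")
    return vlm_answer(image, prompt)
```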
🛠 Culture-Aware Contrastive Learning
We propose Culture-Aware Contrastive Learning (CAC), a supervised learning framework compatible with both CLIP and SigLIP architectures. Fine-tuning with CAC can help models better capture culturally significant content.
1
0
0
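The tweet does not spell out the CAC objective, so the snippet below only sketches one plausible shape for a culture-aware contrastive loss: a CLIP-style symmetric InfoNCE loss with per-pair weights that up-weight culturally salient image-text pairs. The weighting scheme and all names are assumptions, not the paper's formulation.

```python
# Hypothetical culture-weighted CLIP-style contrastive loss; the actual CAC
# objective may differ.
import torch
import torch.nn.functional as F

def culture_weighted_clip_loss(img_emb: torch.Tensor,
                               txt_emb: torch.Tensor,
                               culture_weight: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """img_emb, txt_emb: (B, D) L2-normalized embeddings.
    culture_weight: (B,) per-pair weights, e.g. > 1 for culturally salient pairs."""
    logits = img_emb @ txt_emb.t() / temperature               # (B, B) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    loss_i2t = F.cross_entropy(logits, targets, reduction="none")
    loss_t2i = F.cross_entropy(logits.t(), targets, reduction="none")
    per_pair = 0.5 * (loss_i2t + loss_t2i)
    return (culture_weight * per_pair).mean()                  # up-weight cultural pairs
```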
📚 Dataset Construction
RAVENEA integrates 1,800+ images, 2,000+ culture-related questions, 500+ human captions, and 10,000+ human-ranked Wikipedia documents to support two key tasks:
🎯 Culture-focused Visual Question Answering (cVQA)
📝 Culture-informed Image Captioning (cIC)
1
0
0
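Purely as an illustration of how a record in a benchmark like this might be organized (field names are hypothetical; the released schema may differ), one image can carry the material for both tasks:

```python
# Hypothetical record layout for a culture-grounded benchmark; field names are
# illustrative, not the released RAVENEA schema.
from dataclasses import dataclass, field

@dataclass
class CultureRecord:
    image_path: str                        # image of a culturally significant scene
    questions: list[dict]                  # cVQA: [{"question": ..., "answer": ...}]
    captions: list[str]                    # cIC: human-written, culture-informed captions
    ranked_docs: list[str] = field(default_factory=list)  # Wikipedia docs, best first

record = CultureRecord(
    image_path="images/example.jpg",
    questions=[{"question": "Which festival is depicted?", "answer": "…"}],
    captions=["A procession during a local harvest festival."],
    ranked_docs=["Wikipedia: Harvest festival …"],
)
```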
🚀New Preprint Alert 🚀 Can Multimodal Retrieval Enhance Cultural Awareness in Vision-Language Models? Excited to introduce RAVENEA, a new benchmark aimed at evaluating cultural understanding in VLMs through RAG.
1
5
8
🙋🏻‍♂️ I'm actively recruiting for multiple fully-funded positions in my group at SUTD!
Topic: NLP / LLM / Multimodal
Openings:
1 x Postdoc: start date is flexible
1 x PhD: earliest batch is Spring 2026
2-3 x Visiting students: fully funded for 6-12 months. Open to Bachelor /
2
18
91
Postdoc opening for research on Vision-Language Modeling at MBZUAI with me and @thamar_solorio. 1 year position with possibility for extension. Please share and get in touch if you're interested!
0
3
15
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia 🎧 📢 Excited to share SeaLLMs-Audio, the multimodal (audio) extension of the SeaLLMs family. Following the release of SeaLLMs-v3 last year, we've focused on expanding to audio/speech capabilities, addressing the
0
13
34
I have an opening for a fully-funded 6-month visiting PhD student at SMU.
Time: August 2025 - March 2026
Eligibility: PhD students from universities in Europe, North/South America, South-East Asia.
Topic: NLP/LLM
Email me for more details if you are interested~
4
28
176
Forget just thinking in words. 🚀 New Era of Multimodal Reasoning🚨 🔍 Imagine While Reasoning in Space with MVoT Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.
14
169
747
I will present "Do Vision and Language Models Share Concepts? A Vector Space Alignment Study" in person at #EMNLP2024 😋 Feel free to drop by!
⏰ Nov 12th (Tue), 16:00, In-Person Oral Session C
1
7
52
🍗🍗 I will present FoodieQA in person at #EMNLP2024 😋😋 Looking forward to meeting old and new friends! Feel free to drop by! (and have some snacks)
⏰ Nov 13th (Wed), 16:00, In-Person Poster Session E (Riverfront Hall)
I'm also on the job market and would be happy to chat :)
2
12
63
📢📢The article is now in print: https://t.co/s1ILCb77mK
direct.mit.edu
Abstract. Large-scale pretrained language models (LMs) are said to “lack the ability to connect utterances to the world” (Bender and Koller, 2020), because they do not have “mental models of the...
Do Vision and Language Models Share Concepts? 🤔👀🧠 We present an empirical evaluation and find that language models partially converge towards representations isomorphic to those of vision models. 📝: https://t.co/y6UDT9O0MP 🧑💻: https://t.co/Tm6dVKxaY5 🧵(1/8)
0
2
17
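As a rough illustration of the kind of alignment analysis behind that finding (not necessarily the paper's exact protocol), one can fit a linear map from language-model concept embeddings to vision-model embeddings of the same concepts and check nearest-neighbour retrieval on held-out pairs:

```python
# Illustrative vector-space alignment check over paired concept embeddings:
# lm_emb[i] comes from a language model, vis_emb[i] from a vision model.
import numpy as np

def alignment_retrieval_accuracy(lm_emb: np.ndarray, vis_emb: np.ndarray,
                                 n_train: int) -> float:
    """Fit a least-squares linear map on the first n_train pairs, then report
    top-1 retrieval accuracy of mapped LM vectors against held-out vision vectors."""
    W, *_ = np.linalg.lstsq(lm_emb[:n_train], vis_emb[:n_train], rcond=None)
    mapped = lm_emb[n_train:] @ W                      # project LM space into vision space
    targets = vis_emb[n_train:]
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True) + 1e-8
    targets = targets / (np.linalg.norm(targets, axis=1, keepdims=True) + 1e-8)
    sims = mapped @ targets.T                          # cosine similarity matrix
    return float((sims.argmax(axis=1) == np.arange(len(mapped))).mean())
```

Better-than-chance accuracy on held-out concepts is one way to operationalize the "partial convergence" the paper reports.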
We welcome applications for a PhD position in CV/ML (fine-grained analysis of multimodal data, 2D/3D generative models, misinformation detection, self-supervised learning) @DIKU_Institut @AiCentreDK
Apply through the ELLIS portal 💻 Deadline: 15-Nov-2024 🗓️
The #ELLISPhD application portal is now open! Apply to top #AI labs & supervisors in Europe with a single application, and choose from different areas & tracks. The call for applications: https://t.co/og6xzr14tj Deadline: 15 November 2024 #PhD #PhDProgram #MachineLearning #ML
0
6
28
📢📢 Happy to share that our paper 🍜🍤"FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture" is accepted to EMNLP main! Dataset and code are all released! 🍗🍗🍗 https://t.co/zUraNFYOtN!
https://t.co/g4x5kuKnCD Kudos to all my lovely coauthors!
huggingface.co
8
15
133
LoQT accepted at @NeurIPSConf #NeurIPS 🎉 Thanks to @mabeto5p @SergeBelongie @mjkastoryano and @vesteinns for the collaboration! Links to preprint and code ⬇️
Ever wanted to train your own 13B Llama2 model from scratch on a 24GB GPU? Or fine-tune one without compromising performance compared to full training? 🦙 You now can, with LoQT: Low Rank Adapters for Quantized Training! https://t.co/doCJybAxP1 1/4
2
4
24
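A heavily simplified sketch of the core idea as described in the thread: keep the base weights quantized and frozen, and route the trainable parameters through low-rank factors so a large model fits in modest GPU memory. Everything below is a placeholder illustration, not the released LoQT implementation, and the actual method involves additional machinery described in the preprint.

```python
# Simplified illustration of low-rank adapters over quantized, frozen base
# weights; not the actual LoQT implementation.
import torch
import torch.nn as nn

class QuantizedLowRankLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 16):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.02
        scale = w.abs().max() / 127.0
        # frozen base weight stored as int8 with a single scale factor
        self.register_buffer("w_q", torch.round(w / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        # only these low-rank factors receive gradients
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.w_q.float() * self.scale          # dequantize on the fly
        return x @ (w + self.B @ self.A).t()       # frozen base + low-rank update
```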
📢📣Happy to share our new benchmark paper: ‘Unlocking Markets: A Multilingual Benchmark to Cross-Market Question Answering’ accepted to #EMNLP main! Thanks to my amazing collaborators @ydeng_dandy, Anders Søgaard, @maliannejadi ❤️ Looking forward to presenting in Miami🏖️🏝️
2
10
64
Our paper on understanding variability in text-to-image models was accepted at #EMNLP2024 main track! Lots of thanks to my collaborators @crystina_z @yaolu_nlp @Wenyan62 @Ulienida and mentors @lintool Pontus @ferhanture. Check out
w1kp.com
Project page for the W1KP paper.
1
14
25
Will be presenting "Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning" at:
Poster: In-Person Session 2, Aug 12, 2pm
Oral: Aug 13, multimodal session, 4:45pm
Feel free to drop by 👋 if you are interested!
1
1
12