Antonia Wüst
@toniwuest
Followers: 205 · Following: 426 · Media: 14 · Statuses: 62
PhD student at AI/ML Lab @TUDarmstadt. Interested in concept learning, neuro-symbolic AI, and program synthesis.
Joined January 2015
And last but not least: the spirals are still spinning, each in their own direction 🌀
💻 We also added a demo of the evaluation to our GitHub repo! Check it out here:
github.com
📊 Updated results are also on our webpage! Link: https://t.co/vcJaGdZmmg Curious to hear - should we evaluate other models too? 🤖
🔎 Importantly, Task 2 continues to expose inconsistencies: the model solves 64 problems in Task 1, but can correctly classify all individual images of a problem, given the ground-truth options (Task 2), for only 34 problems.
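The Task 1 vs. Task 2 gap boils down to a set comparison between the problems solved under each protocol. A minimal sketch, using made-up BP ids rather than the actual results:

```python
# Toy sketch (hypothetical BP ids, not the paper's actual numbers):
# Task 1 = open-ended solving of a Bongard Problem,
# Task 2 = classifying every individual image given the ground-truth rule options.
task1_solved = {2, 5, 8, 21}      # BPs solved in Task 1
task2_consistent = {5, 21, 34}    # BPs with all images classified correctly in Task 2

both = task1_solved & task2_consistent
only_task1 = sorted(task1_solved - task2_consistent)
print(f"consistent in both tasks: {sorted(both)}")
print(f"solved in Task 1 but failed in Task 2: {only_task1}")
```

If the model genuinely grasped the rule, the two sets should largely coincide; a large `only_task1` set is exactly the inconsistency the thread describes.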
🤔 Surprisingly, even some easy problems like BP8 remain unsolved…
Can the new GPT-5 model finally solve Bongard Problems? 👉Not quite yet! Using our ICML Bongard in Wonderland setup, it solved 64/100 problems - the best score so far! 📈 However, some issues still persist ⬇️
@AsaCoopStick @idavidrein Yeah, the models are "PhD-level" in what domains? They aren't even elementary-school-level in clockwise vs counter-clockwise. https://t.co/x2Kfj9zPHr
I'll be at #ICML2025 next week presenting our recent work on VLMs and Bongard Problems! Feel free to reach out, happy to have a chat ☺️
📢 New LLM benchmark out, built to test logical reasoning! 🚂🧩 Evaluate your LLM on our SLR-Bench or create your own benchmark with our SLR framework 🚀 Check it out 👉
huggingface.co
Want to enhance the reasoning skills of today’s LLMs? 🚀 Check out SLR, our latest framework on Scalable Logical Reasoning. 🧠 Systematically train & evaluate LLMs on challenging, customizable reasoning tasks with RL & SFT. 🔗 Paper & dataset below
Had a fantastic time at the Women in Data Science (WiDS) Zurich conference today! I had the chance to present my work on Bongard Problems and connect with many inspiring women in data science. Grateful for the insightful talks and engaging conversations! ✨ #WiDS2025 #WomenInTech
🔥"Where is the Truth? The Risk of Getting Confounded in a Continual World" got a spotlight poster @icmlconf ! https://t.co/bYNxy9McRD -> we introduce continual confounding + the ConCon dataset, where confounders over time render continual knowledge accumulation insufficient⬇️
Joint work with my amazing co-authors @philosotim @lukas_helff Inga Ibs @WolfStammer @devendratweetin @c_rothkopf @kerstingAIML !
Excited to share that our paper got accepted at #ICML2025!! 🎉 We challenge Vision-Language Models like OpenAI’s o1 with Bongard problems (classic visual reasoning challenges) and uncover surprising shortcomings. Check out the paper: https://t.co/DEzmIEGMWj & read more below 👇
arxiv.org
Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the...
📢 Update: We've deepened our exploration of VLMs on Bongard Problems with more rigorous evaluations! The best-performing model (o1) we tested solved 43 out of 100 problems - progress, but still plenty of room for improvement!
🧠Foundation models are powerful—but what happens when they remember too much? Join us at #ICML2025 for our workshop on “The Impact of Memorization on Trustworthy Foundation Models” 👉 https://t.co/30iXWO5n1D Let’s talk about memorization & what it takes to build trustworthy AI!
Check out our paper for all the details: https://t.co/ac6WScYHA3 Joint work with @philosotim @lukas_helff Inga Ibs @WolfStammer @devendratweetin @c_rothkopf @kerstingAIML ✨
We also identified 10 particularly challenging Bongard Problems that none of the models could solve under any setting. The challenge remains wide open! Here are 3 examples of these challenging BPs:
Interestingly, success in solving the BPs (Open Question) doesn't translate to correctly categorizing individual images 👉 the sets of BPs solved in each task are not the same! This suggests that getting the right final answer doesn’t always mean genuine understanding 🤔
Happy to present our work now, come by and say hi! ☺️
Excited to be at #NeurIPS2024 this week presenting our work Neural Concept Binder! 🤗 Stop by our poster to see how we derive expressive concept representations from unlabeled images. ⏰ Thu, Dec 12 11am–2pm 📍 East Hall A-C, #2103 See you there! 🎉✨