Antonia Wüst
@toniwuest
Followers: 205 · Following: 426 · Media: 14 · Statuses: 62
PhD student at AI/ML Lab @TUDarmstadt. Interested in concept learning, neuro-symbolic AI, and program synthesis.
Joined January 2015
And last but not least: the spirals are still spinning, each in their own direction 🌀
💻 We also added a demo of the evaluation to our GitHub repo! Check it out here:
github.com
📊 Updated results are also on our webpage! Link: https://t.co/vcJaGdZmmg Curious to hear - should we evaluate other models too? 🤖
🔎 Importantly, Task 2 continues to expose inconsistencies: the model solves 64 problems in Task 1, but can correctly classify all individual images of a problem, given the ground-truth options (Task 2), for only 34 problems.
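The Task 1 vs. Task 2 gap boils down to a set comparison between the problems solved under each protocol. A minimal sketch, using made-up BP ids rather than the actual results:

```python
# Toy sketch (hypothetical BP ids, not the paper's actual numbers):
# Task 1 = open-ended solving of a Bongard Problem,
# Task 2 = classifying every individual image given the ground-truth rule options.
task1_solved = {2, 5, 8, 21}      # BPs solved in Task 1
task2_consistent = {5, 21, 34}    # BPs with all images classified correctly in Task 2

both = task1_solved & task2_consistent
only_task1 = sorted(task1_solved - task2_consistent)
print(f"consistent in both tasks: {sorted(both)}")
print(f"solved in Task 1 but failed in Task 2: {only_task1}")
```

If the model genuinely grasped the rule, the two sets should largely coincide; a large `only_task1` set is exactly the inconsistency the thread describes.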
🤔 Surprisingly, even some easy problems like BP8 remain unsolved…
Can the new GPT-5 model finally solve Bongard Problems? 👉Not quite yet! Using our ICML Bongard in Wonderland setup, it solved 64/100 problems - the best score so far! 📈 However, some issues still persist ⬇️
@AsaCoopStick @idavidrein Yeah, the models are "PhD-level" in what domains? They aren't even elementary-school-level in clockwise vs counter-clockwise. https://t.co/x2Kfj9zPHr
I'll be at #ICML2025 next week presenting our recent work on VLMs and Bongard Problems! Feel free to reach out, happy to have a chat ☺️
📢 New LLM benchmark out, built to test logical reasoning! 🚂🧩 Evaluate your LLM on our SLR-Bench or create your own benchmark with our SLR framework 🚀 Check it out 👉
huggingface.co
Want to enhance the reasoning skills of today’s LLMs? 🚀 Check out SLR, our latest framework on Scalable Logical Reasoning. 🧠 Systematically train & evaluate LLMs on challenging, customizable reasoning tasks with RL & SFT. 🔗 Paper & dataset below
Had a fantastic time at the Women in Data Science (WiDS) Zurich conference today! I had the chance to present my work on Bongard Problems and connect with many inspiring women in data science. Grateful for the insightful talks and engaging conversations! ✨ #WiDS2025 #WomenInTech
🔥"Where is the Truth? The Risk of Getting Confounded in a Continual World" got a spotlight poster @icmlconf ! https://t.co/bYNxy9McRD -> we introduce continual confounding + the ConCon dataset, where confounders over time render continual knowledge accumulation insufficient⬇️
Joint work with my amazing co-authors @philosotim @lukas_helff Inga Ibs @WolfStammer @devendratweetin @c_rothkopf @kerstingAIML !
Excited to share that our paper got accepted at #ICML2025!! 🎉 We challenge Vision-Language Models like OpenAI’s o1 with Bongard problems (classic visual reasoning challenges) and uncover surprising shortcomings. Check out the paper: https://t.co/DEzmIEGMWj & read more below 👇
arxiv.org
Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the...
📢 Update: We've deepened our exploration of VLMs on Bongard Problems with more rigorous evaluations! The best-performing model (o1) we tested solved 43 out of 100 problems - progress, but still plenty of room for improvement!
🧠Foundation models are powerful—but what happens when they remember too much? Join us at #ICML2025 for our workshop on “The Impact of Memorization on Trustworthy Foundation Models” 👉 https://t.co/30iXWO5n1D Let’s talk about memorization & what it takes to build trustworthy AI!
Check out our paper for all the details: https://t.co/ac6WScYHA3 Joint work with @philosotim @lukas_helff Inga Ibs @WolfStammer @devendratweetin @c_rothkopf @kerstingAIML ✨
We also identified 10 particularly challenging Bongard Problems that none of the models could solve under any setting. The challenge remains wide open! Here are 3 examples of these challenging BPs:
Interestingly, success in solving the BPs (Open Question) doesn't translate to correctly categorizing individual images 👉 the sets of BPs solved in each task are not the same! This suggests that getting the right final answer doesn’t always mean genuine understanding 🤔
Happy to present our work now, come by and say hi! ☺️
Excited to be at #NeurIPS2024 this week presenting our work Neural Concept Binder! 🤗 Stop by our poster to see how we derive expressive concept representations from unlabeled images. ⏰ Thu, Dec 12 11am–2pm 📍 East Hall A-C, #2103 See you there! 🎉✨