An Vo @an_vo12 X Profile

An Vo

@an_vo12

Followers

220

Following

772

Media

8

Statuses

85

MS student @ KAIST | Interest: LLMs/VLMs, Trustworthy AI

https://t.co/ouFusQBmIa

Daejeon, Republic of Korea

Joined July 2015

Don't wanna be here? Send us removal request.

An Vo

@an_vo12

3 months

🚨 Our latest work shows that SOTA VLMs (o3, o4-mini, Sonnet, Gemini Pro) fail at counting legs due to bias⁉️ See simple cases where VLMs get it wrong, no matter how you prompt them. 🧪 Think your VLM can do better? Try it yourself here: https://t.co/EDJdF3Vmpy 1/n #ICML2025

9

41

303

Jia-Bin Huang

@jbhuang0604

13 days

How to Revise an Academic Poster? Designing a clear poster is an essential skill for students. We recently revised one for a paper presented last week. BUT, instead of just one student benefiting from the process, I’m sharing it and hope some will find it helpful. 🧵

5

31

181

Abhilasha Ravichander

@lasha_nlp

15 days

It is PhD application season again 🍂 For those looking to do a PhD in AI, these are some useful resources 🤖: 1. Examples of statements of purpose (SOPs) for computer science PhD programs: https://t.co/Stz53ZiREM [1/4]

cs-sop.notion.site

cs-sop.org is a platform intended to help CS PhD applicants. It hosts a database of example statements of purpose (SoP) shared by previous applicants to Computer Science PhD programs.

6

77

386

Ahmad Beirami

@abeirami

26 days

My thoughts on the broken state of AI conference reviewing: Years ago, when I was in graduate school and a postdoc in Information Theory, I always felt fortunate to be invited to review for IEEE Transactions on Information Theory or IEEE Transactions on Signal Processing. I felt

15

19

211

Anh Totti Nguyen

@anh_ng8

28 days

@yuyinzhou_cs @NeurIPSConf We have a paper in the same situation. AC: Yes! PC: No no. @NeurIPSConf please consider the whether 1st author is a student and whether this would be their first top-tier paper BEFORE making such a cut. More healthy for junior researchers. OR use a Findings track.

0

2

10

Kevin K. Yang 楊凱筌

@KevinKaichuang

28 days

Out of 7 papers in my NeurIPS Benchmarks and Datasets Track area, the PCs overruled my recommendation on 3?!

6

5

109

Charles 🎉 Frye is in STHLM

@charles_irl

1 month

The ICLR 2026 deadline is ten days away. But you just found a bug in your evals, so now you need to re-run all your ablations. That's hundreds of experiments, and you need them done ASAP. @modal's got you. Introducing our ICLR 2026 compute grant program.

17

35

531

Jason Kwon

@jasonkwon

1 month

A special day in Seoul as we officially launch OpenAI Korea. With strong government support and huge growth in ChatGPT use (up 4x in the past year), Korea is entering a new chapter in its AI journey and we want to be a true partner in Korea’s AI transformation.

26

42

524

Akari Asai

@AkariAsai

1 month

Grad school season reminder: many CS departments run student-led pre-application mentorship programs for prospective PhD applicants (due Oct. You can get feedback from current PhD students! Eg - UW’s CSE PAMS: https://t.co/RYw4mbD47h - MIT EECS GAAP: https://t.co/piD6hkmHzq 🧵

cs.washington.edu

Pre-Application Mentorship Service (PAMS)

10

42

265

An Vo

@an_vo12

1 month

This blog makes me wonder about the OPPOSITE problem: 👉 Can we make LLMs give uniform random answers when asked (e.g., “randomly pick 0-9”)? So far, our work ( https://t.co/f7R3nLN57i) shown that we can hack it with multi-turn, but I’d love to see this activated in single-turn.

Thinking Machines

@thinkymachines

1 month

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to

1

0

3

Saining Xie

@sainingxie

2 months

this isn’t just a modeling problem. it’s also a benchmarking problem. spurious correlations are always a pain, but in multimodal llms they become a particularly tough battle. On one hand, you want to leverage the language prior to enable better generalization; on the other, that

Tairan He

@TairanHe99

2 months

I couldn’t believe GPT-5 could make this mistake until @ziqiao_ma pointed it out to me. Highly recommend this paper ( https://t.co/PoMp4GggEm) on vision-centric evaluation of multimodal LLMs from @sainingxie — now imagine the same rigor applied to VLAs.

7

24

243

(((ل()(ل() 'yoav))))👾

@yoavgo

2 months

beautiful adversarial dataset playing exactly on the soft-spot of VLMs.

An Vo

@an_vo12

3 months

🚨 Our latest work shows that SOTA VLMs (o3, o4-mini, Sonnet, Gemini Pro) fail at counting legs due to bias⁉️ See simple cases where VLMs get it wrong, no matter how you prompt them. 🧪 Think your VLM can do better? Try it yourself here: https://t.co/EDJdF3Vmpy 1/n #ICML2025

5

20

279

Gary Marcus

@GaryMarcus

2 months

Check it out. Almost everyone in the major media is missing the real story around GPT-5. The real story is about how so many people (even big fans of OpenAI) were disappointed. And it’s about how that may well spell the end of scaling mania. And it’s about how the premature

43

74

486

Gary Marcus

@GaryMarcus

2 months

and I thought GPT-5 was supposed to be some multimodal revolution that too turns out to be bullshit.

Anh Totti Nguyen

@anh_ng8

2 months

#GPT5 is STILL having a severe confirmation bias like prev SOTA models! 😜 Try yourselves (images, prompts avail in 1 click): https://t.co/S317wqrlju It's fast to test for such biases in images. Similar biases should still exist in non-image domains as well...

18

11

61

Lucas Beyer (bl16)

@giffmana

2 months

Page: https://t.co/luRR9qyWrX Thread by author: https://t.co/zVjEqtWz8U

An Vo

@an_vo12

3 months

🚨 Our latest work shows that SOTA VLMs (o3, o4-mini, Sonnet, Gemini Pro) fail at counting legs due to bias⁉️ See simple cases where VLMs get it wrong, no matter how you prompt them. 🧪 Think your VLM can do better? Try it yourself here: https://t.co/EDJdF3Vmpy 1/n #ICML2025

2

5

65

Lucas Beyer (bl16)

@giffmana

2 months

Oh wow, this VLM benchmark is pure evil, and I love it! "Vision Language Models are Biased" by @an_vo12, @taesiri, @anh_ng8, etal. Also really good idea to have one-click copy-paste of images and prompts, makes trying it super easy.

32

75

942

Anh Totti Nguyen

@anh_ng8

2 months

#GPT5 is STILL having a severe confirmation bias like prev SOTA models! 😜 Try yourselves (images, prompts avail in 1 click): https://t.co/S317wqrlju It's fast to test for such biases in images. Similar biases should still exist in non-image domains as well...

11

14

121

An Vo

@an_vo12

3 months

Shaping results into a convincing narrative in just a few days is incredibly tough and intense 🧠 Honestly, it feels like writing another paper in 6 days ⏳ even harder than starting from scratch for a general audience.

Sharon Y. Li

@SharonYixuanLi

3 months

I have deep respect for students grinding on NeurIPS rebuttal these days: - running a brutal amount of experiments - shaping them into a polished narrative - all under a tight timeline It’s an art + endurance test.

0

1

Anh Totti Nguyen

@anh_ng8

3 months

@grok count the legs

3

1

6

An Vo

@an_vo12

3 months

https://t.co/1z57IuxUE6

0

An Vo

@an_vo12

3 months

Thanks @Cohere_Labs for sharing our work! 🙌 If you’re attending #ICML2025, come visit our B-score poster to chat more: 🗓️ Thursday, July 17 | ⏰ 4:30-7:00 PM 📍 East Exhibition Hall A-B, Poster #E-1004

Cohere Labs

@Cohere_Labs

3 months

Supported by one of our grants, @an_vo12, Mohammad Reza Taesiri, and @anh_ng8 from @kaist_ai, tackled bias in LLMs. Their research shows that LLMs exhibit fewer biases when they can see their previous answers, leading to the development of the B-score metric.

1

11