Polina Kirichenko Profile
Polina Kirichenko

@polkirichenko

Followers
4K
Following
2K
Media
57
Statuses
276

Research Scientist at FAIR @AIatMeta & visiting researcher at Princeton @VisualAILab; prev. PhD at New York University 🇺🇦

New York City, NY
Joined November 2018
@polkirichenko
Polina Kirichenko
21 days
Excited to release AbstentionBench -- our paper and benchmark on evaluating LLMs’ *abstention*: the skill of knowing when NOT to answer! Key finding: reasoning LLMs struggle with unanswerable questions and hallucinate! Details and links to paper & open source code below! 🧵1/9
Tweet media one
11
81
588
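A minimal sketch (not the paper's actual evaluation code) of what detecting abstention in a model response could look like; the benchmark itself may rely on a stronger, LLM-based judge, and the names ABSTENTION_MARKERS and is_abstention here are hypothetical.

```python
# Crude phrase-matching stand-in for an abstention judge: flag responses that
# explicitly refuse or signal missing information instead of answering.

ABSTENTION_MARKERS = [
    "i don't know",
    "cannot be determined",
    "not enough information",
    "unanswerable",
    "i'm not sure",
]

def is_abstention(response: str) -> bool:
    """Return True if the response signals refusal or lack of knowledge."""
    text = response.lower()
    return any(marker in text for marker in ABSTENTION_MARKERS)

if __name__ == "__main__":
    # An underspecified question with no single correct answer.
    good = "There is not enough information to answer this question."
    bad = "Your doctor prescribed 500 mg of amoxicillin."
    print(is_abstention(good))  # True  -> correct abstention
    print(is_abstention(bad))   # False -> confident hallucination
```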
@polkirichenko
Polina Kirichenko
11 days
RT @DrLaschowski: Are you a graduate student in #Ukraine interested in machine learning and neuroscience? My research lab at #UofT is now a….
0
12
0
@polkirichenko
Polina Kirichenko
20 days
RT @najoungkim: nice to see (QA)^2 included here, false presupposition Qs still going strong! every now and then i sit down and see how lon….
0
3
0
@polkirichenko
Polina Kirichenko
20 days
RT @megan_richards_: I had a great time last week at #CVPR2025's DemoDiv workshop (slides below)! I shared an overview of geographic biases….
0
2
0
@polkirichenko
Polina Kirichenko
20 days
RT @natolambert: Very excited this exists. A hill to climb on one of the traits I listed as super needed for next-gen models :) https://t.c….
0
7
0
@polkirichenko
Polina Kirichenko
21 days
RT @jieyuzhao11: Thanks for sharing our work as well! We totally agree! We also tried to mitigate this issue by mixing a small amount of da….
0
1
0
@polkirichenko
Polina Kirichenko
21 days
We release our benchmark for people to evaluate progress on abstention! Paper link: Code link: Huge thank you to the best team ever!! 💙 Project co-leads @neurosamuel @marksibrahim and our advisor @kamalikac. 9/9
2
0
28
@polkirichenko
Polina Kirichenko
21 days
Our results also align with concurrent work from UCLA @linxins2 @taiwei_shi @jieyuzhao11, which also observed that reasoning LLMs hallucinate on unanswerable math problems! More evidence that hallucination and failure to abstain is a big challenge in.
@linxins2
Linxin Song
2 months
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n
Tweet media one
2
1
26
@polkirichenko
Polina Kirichenko
21 days
While we find that a carefully crafted system prompt can boost abstention performance, it doesn't fundamentally address the core problem: a lack of reasoning about uncertainty! See our paper for detailed analysis of post-training stages, effect of scale, abstention vs
Tweet media one
1
0
22
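A hypothetical sketch of the kind of abstention-encouraging system prompt the thread refers to; the exact prompt used in the paper is not reproduced here, and ABSTENTION_SYSTEM_PROMPT / build_messages are illustrative names.

```python
# Assemble a chat-style request whose system prompt explicitly licenses refusal
# when the question is underspecified or unanswerable.

ABSTENTION_SYSTEM_PROMPT = (
    "You are a careful assistant. If a question is underspecified, relies on a "
    "false premise, or cannot be answered from the information given, say so "
    "explicitly instead of guessing."
)

def build_messages(question: str) -> list[dict]:
    """Build a system + user message pair with the abstention-oriented prompt."""
    return [
        {"role": "system", "content": ABSTENTION_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

print(build_messages("Which of my two meetings tomorrow is more important?"))
```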
@polkirichenko
Polina Kirichenko
21 days
We find that reasoning models very often hallucinate missing context in the reasoning chain, and while they sometimes express uncertainty and caveats within the chain, they still produce a confident final answer. We hypothesize that this behavior is coming from the
Tweet media one
1
0
24
@polkirichenko
Polina Kirichenko
21 days
Moreover, incorporating test-time scaling as in s1 @Muennighoff et al. makes things even worse! Allocating more reasoning budget generally improves accuracy but hurts abstention. 5/9
Tweet media one
1
0
25
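An illustrative sketch of the test-time scaling experiment described above: sweep a reasoning-token budget and track accuracy on answerable items against the abstention rate on unanswerable ones. Here generate_with_budget is a hypothetical stand-in for an s1-style budget-forcing generation call, and is_abstention is any abstention detector.

```python
from typing import Callable

def sweep_budgets(
    generate_with_budget: Callable[[str, int], str],
    answerable: list[tuple[str, str]],    # (question, gold answer)
    unanswerable: list[str],              # questions that should be refused
    is_abstention: Callable[[str], bool],
    budgets: tuple[int, ...] = (512, 1024, 2048, 4096),
) -> dict[int, tuple[float, float]]:
    """Return {budget: (accuracy, abstention_rate)} for each reasoning budget."""
    results = {}
    for budget in budgets:
        # Accuracy on answerable questions (simple substring match as a proxy).
        correct = sum(
            gold.lower() in generate_with_budget(q, budget).lower()
            for q, gold in answerable
        )
        # Abstention rate on questions that should be refused.
        refused = sum(
            is_abstention(generate_with_budget(q, budget)) for q in unanswerable
        )
        results[budget] = (correct / len(answerable), refused / len(unanswerable))
    return results
```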
@polkirichenko
Polina Kirichenko
21 days
Remarkably, we find that reasoning post-training hurts (!) abstention performance! We evaluated the RLVR model from Tulu @natolambert et al., s1, and DeepSeek R1 Distill models, and found consistent improvements in accuracy and drops in abstention compared to instruct models. 4/9
Tweet media one
1
0
26
@polkirichenko
Polina Kirichenko
21 days
We curate 20 uncertainty datasets in different scenarios and evaluate 20 frontier LLMs, and find that most scenarios remain challenging even for the best models! This allows us to conduct a systematic study of what helps and hurts abstention performance. 3/9
Tweet media one
1
0
20
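A sketch of the kind of aggregation behind the 20-datasets-by-20-models evaluation: per-dataset abstention recall (the fraction of should-abstain questions the model refuses), followed by a macro average so no single dataset dominates. The record fields are illustrative, not the benchmark's actual schema.

```python
from collections import defaultdict

def abstention_recall_by_dataset(records: list[dict]) -> dict[str, float]:
    """records: [{"dataset": str, "should_abstain": bool, "abstained": bool}, ...]"""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        if r["should_abstain"]:
            totals[r["dataset"]] += 1
            hits[r["dataset"]] += int(r["abstained"])
    # Recall per dataset: refused / should-have-refused.
    return {d: hits[d] / totals[d] for d in totals}

def macro_average(per_dataset: dict[str, float]) -> float:
    """Unweighted mean over datasets."""
    return sum(per_dataset.values()) / len(per_dataset)
```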
@polkirichenko
Polina Kirichenko
21 days
LLMs are great at solving concrete problems, but how well do they handle uncertainty? There are many questions with no direct answer! We build a diverse benchmark spanning 6 abstention scenarios (underspecification, staleness, …) and various domains (medicine, social bias, …).
Tweet media one
2
0
28
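An illustrative data structure for a benchmark item, assuming each question carries a scenario tag and a should-abstain flag; the scenario names beyond underspecification and staleness are placeholders, not the benchmark's exact taxonomy.

```python
from dataclasses import dataclass
from enum import Enum

class Scenario(str, Enum):
    UNDERSPECIFICATION = "underspecification"
    STALENESS = "staleness"
    FALSE_PREMISE = "false_premise"      # placeholder category
    UNKNOWN_ANSWER = "unknown_answer"    # placeholder category

@dataclass
class AbstentionSample:
    question: str
    scenario: Scenario
    should_abstain: bool
    reference_answer: str | None = None  # present only for answerable items

sample = AbstentionSample(
    question="What will the stock price of ACME be next year?",
    scenario=Scenario.UNKNOWN_ANSWER,
    should_abstain=True,
)
```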
@polkirichenko
Polina Kirichenko
21 days
RT @wregss: Had a fantastic time at #CVPR2025 and my spotlight talk on culturally representative T2I models at the DemoDiv workshop was def….
0
2
0
@polkirichenko
Polina Kirichenko
25 days
RT @MonaJalal_: So impressed by Professor @orussakovsky starting her keynote talk by highlighting her research team. We need more of this i….
0
6
0
@polkirichenko
Polina Kirichenko
25 days
RT @WiCVworkshop: 🌐 Prof. Olga Russakovsky from Princeton University delivering a powerful keynote at #WiCV @CVPR2025 on Trustworthy (and T….
0
10
0
@polkirichenko
Polina Kirichenko
25 days
RT @vlms4all: Our VLMs4All workshop is taking place today! .📅 on Thursday, June 12 .⏲️ from 9AM CDT .🏛️in Room 104E. Join us today at @CVPR….
0
4
0
@polkirichenko
Polina Kirichenko
26 days
Really insightful talk by @ang3linawang on contextual equity #CVPR2025!
Tweet media one
2
1
20
@polkirichenko
Polina Kirichenko
26 days
Rockstar @megan_richards_ is presenting her talk at DemoDiv #CVPR2025 🔥 Come join us!
Tweet media one
0
1
10