
Polina Kirichenko (@polkirichenko)
Research Scientist at FAIR @AIatMeta & visiting researcher at Princeton @VisualAILab; prev. PhD at New York University 🇺🇦
New York City, NY · Joined November 2018
4K followers · 2K following · 57 media · 276 statuses
RT @DrLaschowski: Are you a graduate student in #Ukraine interested in machine learning and neuroscience? My research lab at #UofT is now a…
RT @najoungkim: nice to see (QA)^2 included here, false presupposition Qs still going strong! every now and then i sit down and see how lon…
RT @megan_richards_: I had a great time last week at #CVPR2025's DemoDiv workshop (slides below)! I shared an overview of geographic biases…
RT @natolambert: Very excited this exists. A hill to climb on one of the traits I listed as super needed for next-gen models :) https://t.c…
RT @jieyuzhao11: Thanks for sharing our work as well! We totally agree! We also tried to mitigate this issue by mixing a small amount of da…
We release our benchmark for people to evaluate progress on abstention! Paper link: Code link: Huge thank you to the best team ever!! 💙 Project co-leads @neurosamuel @marksibrahim and our advisor @kamalikac. 9/9
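For readers curious what scoring such a benchmark involves, here is a minimal sketch of computing accuracy on answerable questions alongside abstention rate on unanswerable ones. The record schema and `score` function are hypothetical illustrations, not the released code.

```python
def score(records):
    """Score a list of hypothetical benchmark records.

    Each record (illustrative schema): {"answerable": bool, "gold": str or None,
    "prediction": str, "abstained": bool}.
    Returns (accuracy on answerable Qs, abstention rate on unanswerable Qs).
    """
    answerable = [r for r in records if r["answerable"]]
    unanswerable = [r for r in records if not r["answerable"]]

    # Accuracy: correct, non-abstained answers on answerable questions.
    accuracy = sum(
        r["prediction"] == r["gold"] and not r["abstained"] for r in answerable
    ) / max(len(answerable), 1)

    # Abstention rate: refusals on questions with no valid answer.
    abstention = sum(r["abstained"] for r in unanswerable) / max(len(unanswerable), 1)
    return accuracy, abstention
```

The thread's central finding can be read through this pair of numbers: reasoning post-training tends to raise the first while lowering the second.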
Our results also align with concurrent work from UCLA @linxins2 @taiwei_shi @jieyuzhao11, which also observed that reasoning LLMs hallucinate on unanswerable math problems! More evidence that hallucination and failure to abstain is a big challenge in…
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n
Moreover, incorporating test-time scaling as in s1 (@Muennighoff et al.) makes things even worse! Allocating more reasoning budget generally improves accuracy and hurts abstention. 5/9
Remarkably, we find that reasoning post-training hurts (!) abstention performance! We evaluated the RLVR model from Tulu (@natolambert et al.), s1, and DeepSeek R1 Distill models and found consistent improvements in accuracy and drops in abstention compared to instruct models. 4/9
RT @MonaJalal_: So impressed by Professor @orussakovsky starting her keynote talk by highlighting her research team. We need more of this i…
RT @WiCVworkshop: 🌐 Prof. Olga Russakovsky from Princeton University delivering a powerful keynote at #WiCV @CVPR2025 on Trustworthy (and T…