Gabriel Stanovsky

@GabiStanovsky

Followers
839
Following
1K
Media
12
Statuses
282

Assistant Professor at @CseHuji

Joined August 2012
@nlphuji
HUJI NLP
8 days
Our group closing out #EMNLP2025 in Suzhou. Until next time!
0
3
32
@EliyaHabba
Eliya Habba @EMNLP 🇨🇳
13 days
🧩 PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation A modular framework for easier, reproducible multi-prompt evaluation. 📍 Poster - Nov 6, 16:30 @ System Demos With @Dahan_Noam, @GiliLior, @GabiStanovsky Website & paper: https://t.co/oH1V6SJYlc
eliyahabba.github.io
A flexible framework for automatic generation of prompt variations for robust LLM evaluation.
1
2
8
@EliyaHabba
Eliya Habba @EMNLP 🇨🇳
11 days
Happening now! 🌈 PromptSuite @ #EMNLP2025 System Demos (16:30) Come chat about prompt robustness, evaluation, and LLM brittleness 💬
2
4
27
@nlphuji
HUJI NLP
18 days
We're proud of our team's 11 papers accepted to #EMNLP2025! See you next week in Suzhou✈️
0
11
15
@GiliLior
Gili Lior
13 days
Excited to present our paper "ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments" at #EMNLP2025 🎉 Everyone knows LLMs are prompt-sensitive, yet we still report single-prompt scores. Our work suggests a method to make evaluation statistically reliable!
2
13
52
@AdiSimhi
Adi Simhi ✈️ EMNLP
14 days
Going to @emnlpmeeting!!✈️ On November 6th, @Itay_Itzhak_, @FazlBarez, and I will present our work "Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer" at the Findings 2 poster session at 12:30. w/ @GabiStanovsky and @boknilev. https://t.co/HQRacQZwhP
3
14
89
@Dahan_Noam
Noam Dahan
17 days
I can't make it to #EMNLP2025, but @EliyaHabba and @GiliLior will present our PromptSuite🌈(demo!): a framework tackling prompt sensitivity by generating benchmark variations for any task. Try it with just a few lines of code or the web interface! https://t.co/C4VwIvAhvv
0
12
37
@AdiSimhi
Adi Simhi ✈️ EMNLP
18 days
LLMs can hallucinate for different reasons: ❌ They don't know (lack of knowledge) ❌ They "know" but are uncertain ❌ They "know" and are certain A new extended version of our paper, combining our understanding of hallucination along the knowledge and certainty axes, is out 🧵
3
11
36
@uriberger88
Uri Berger
18 days
Heading to #EMNLP2025! 🎉 Two of our papers will be there — come say hi 👋 🖼️ Image Captioning Evaluation — Nov 5, 17:45 📄 https://t.co/TdMVA2iWSD 🕵️ Deceptive LLM Agents (Mafia Game) — Nov 5, 13:00 📄
arxiv.org
LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are asynchronous. For example, in...
1
6
26
@yanaiela
Yanai Elazar
22 days
We're excited to host ISCOL at BIU this year!! Submit your published and ongoing work 🎉🥳
@iscol_meeting
ISCOL 2025
22 days
Save the date for ISCOL'25! The conference will be held on December 18th at Bar-Ilan University. The call for papers is now live on our website:
1
6
19
@moranmiz
Moran Mizrahi
23 days
How can we help LLMs move beyond the obvious toward generating more creative and diverse ideas? In our new TACL paper, we propose a novel approach to enhance LLM creative generation! https://t.co/AFCpQddN6j @ChenShani2 @GabiStanovsky @jurafsky @HyadataLab @stanfordnlp @nlphuji
6
26
84
@EliyaHabba
Eliya Habba @EMNLP 🇨🇳
27 days
Our 🌈 PromptSuite paper has been accepted to #EMNLP2025 🇨🇳 (System Demonstrations)! 🎉 🌈 PromptSuite is a flexible framework for generating thousands of prompt variations per instance - enabling robust, task-agnostic evaluation of LLMs. @Dahan_Noam, @GiliLior, @GabiStanovsky
1
14
33
@Itay_itzhak_
Itay Itzhak
1 month
Had a blast at CoLM! It really was as good as everyone says, congrats to the organizers 🎉 This week I’ll be in New York giving talks at NYU, Yale, and Cornell Tech. If you’re around and want to chat about LLM behavior, safety, interpretability, or just say hi - DM me!
0
6
55
@Itay_itzhak_
Itay Itzhak
2 months
🚨Spotlight update🚨 Our paper on bias origins in LLMs is a *spotlight* paper with oral presentation at CoLM 2025!✨ Honored to be among just 24 selected and super excited to present and discuss biases and finetuning limits. Who’s joining in Montreal Tuesday morning? 👀
@Itay_itzhak_
Itay Itzhak
4 months
🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc
3
8
35
@jungsoo___park
Jungsoo Park
2 months
What if LLMs can forecast their own scores on unseen benchmarks from just a task description? We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build—before paying full price 💸
3
9
27
@uriberger88
Uri Berger
3 months
Happy to share that our Image Captioning evaluation survey was accepted to TACL! I will be presenting the paper @emnlpmeeting
@uriberger88
Uri Berger
1 year
1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! https://t.co/TdMVA2ip35 w\ @GabiStanovsky @AbendOmri @leafrermann >
0
4
13
@Dahan_Noam
Noam Dahan
3 months
Old news: Single-prompt eval is unreliable🤯 New news: PromptSuite🌈 - an easy way to augment your benchmark with thousands of paraphrases ➡️ robust eval, zero sweat! - Works on any dataset! - Python API + web UI @EliyaHabba, @GiliLior, @GabiStanovsky https://t.co/C4VwIvzJFX
eliyahabba.github.io
A flexible framework for automatic generation of prompt variations for robust LLM evaluation.
2
15
62
@AdiSimhi
Adi Simhi ✈️ EMNLP
3 months
Very pleased that "Trust me I'm Wrong" was accepted to @emnlpmeeting findings! Trust me I'm Wrong shows that LLMs can hallucinate with high certainty even when they know the correct answer! Check our latest work with @Itay_itzhak_, @FazlBarez, @GabiStanovsky, and @boknilev.
5
14
113
@verena_rieser
Verena Rieser
4 months
How can we evaluate the real-world impact of generative AI? Great panel at the GEM2 workshop #ACL2025NLP 🇦🇹
1
6
33
@EnricoSantus
Enrico Santus
4 months
I swear I warned all the romantics in the room — especially after the #Coldplay scandal! 😄🎶 If you were there (or wish you had been), tag yourself and your friends in the comments 👇 Bye bye from the #Gem organizers and speakers! #ACL2025 #ACL2025NLP #GEM2 #LLMs #NLP #Vienna
4
3
17