Gabriel Stanovsky
@GabiStanovsky
839 Followers · 1K Following · 12 Media · 282 Statuses
Assistant Professor at @CseHuji
Joined August 2012
🧩 PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation — a modular framework for easier, reproducible multi-prompt evaluation. 📍 Poster - Nov 6, 16:30 @ System Demos, with @Dahan_Noam, @GiliLior, @GabiStanovsky Website & paper: https://t.co/oH1V6SJYlc
eliyahabba.github.io
A flexible framework for automatic generation of prompt variations for robust LLM evaluation.
Happening now! 🌈 PromptSuite @ #EMNLP2025 System Demos (16:30) Come chat about prompt robustness, evaluation, and LLM brittleness 💬
Excited to present our paper "ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments" at #EMNLP2025 🎉 Everyone knows LLMs are prompt-sensitive, yet we still report single-prompt scores. Our work suggests a method to make evaluation statistically reliable!
Going to @emnlpmeeting!! ✈️ On November 6th, @Itay_Itzhak_, @FazlBarez, and I will present our work "Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer" at the Findings 2 poster session at 12:30, w/ @GabiStanovsky and @boknilev. https://t.co/HQRacQZwhP
I can't make it to #EMNLP2025, but @EliyaHabba and @GiliLior will present our PromptSuite🌈(demo!): a framework tackling prompt sensitivity by generating benchmark variations for any task. Try it with just a few lines of code or the web interface! https://t.co/C4VwIvAhvv
LLMs can hallucinate for different reasons: ❌ They don't know (lack of knowledge) ❌ They "know" but are uncertain ❌ They "know" and are certain A new extended version of our paper, combining our understanding of hallucination along the knowledge and certainty axes, is out 🧵
Heading to #EMNLP2025! 🎉 Two of our papers will be there — come say hi 👋 🖼️ Image Captioning Evaluation — Nov 5, 17:45 📄 https://t.co/TdMVA2iWSD 🕵️ Deceptive LLM Agents (Mafia Game) — Nov 5, 13:00 📄
How can we help LLMs move beyond the obvious toward generating more creative and diverse ideas? In our new TACL paper, we propose a novel approach to enhance LLM creative generation! https://t.co/AFCpQddN6j
@ChenShani2 @GabiStanovsky @jurafsky @HyadataLab @stanfordnlp @nlphuji
Our 🌈 PromptSuite paper has been accepted to #EMNLP2025 🇨🇳 (System Demonstrations)! 🎉 🌈 PromptSuite is a flexible framework for generating thousands of prompt variations per instance - enabling robust, task-agnostic evaluation of LLMs. @Dahan_Noam, @GiliLior, @GabiStanovsky
Had a blast at CoLM! It really was as good as everyone says, congrats to the organizers 🎉 This week I’ll be in New York giving talks at NYU, Yale, and Cornell Tech. If you’re around and want to chat about LLM behavior, safety, interpretability, or just say hi - DM me!
🚨Spotlight update🚨 Our paper on bias origins in LLMs is a *spotlight* paper with oral presentation at CoLM 2025!✨ Honored to be among just 24 selected and super excited to present and discuss biases and finetuning limits. Who’s joining in Montreal Tuesday morning? 👀
🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc
What if LLMs can forecast their own scores on unseen benchmarks from just a task description? We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build—before paying full price 💸
Happy to share that our image captioning evaluation survey was accepted to TACL! I will be presenting the paper at @emnlpmeeting
1/ Into image captioning? Don't miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent captioning evaluation survey! https://t.co/TdMVA2ip35 w/ @GabiStanovsky, @AbendOmri, @leafrermann
Old news: Single-prompt eval is unreliable🤯 New news: PromptSuite🌈 - an easy way to augment your benchmark with thousands of paraphrases ➡️ robust eval, zero sweat! - Works on any dataset! - Python API + web UI @EliyaHabba, @GiliLior, @GabiStanovsky
https://t.co/C4VwIvzJFX
Very pleased that "Trust Me, I'm Wrong" was accepted to @emnlpmeeting Findings! "Trust Me, I'm Wrong" shows that LLMs can hallucinate with high certainty even when they know the correct answer! Check out our latest work with @Itay_Itzhak_, @FazlBarez, @GabiStanovsky, and @boknilev.
How can we evaluate the real-world impact of generative AI? Great panel at the GEM2 workshop, #ACL2025NLP 🇦🇹