Gal Yona
@_galyo
Followers
499
Following
1K
Media
32
Statuses
300
Research scientist @googleai, previously CS PhD @weizmannscience
Joined October 2009
Switched dinner tonight to the tiny table that looks just like the one at my kid's kindergarten. instantly got the "kindergarten" version of the dining experience: he was super independent & way more chill and well behaved. I feel like I finally get persona prompting for LLMs
0
0
8
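Persona prompting, for reference, just pins a role down before the conversation starts, and the familiar framing shapes everything downstream, much like the familiar tiny table. A minimal sketch in the common chat-message format (the helper name and persona string are illustrative, not from any specific API):

```python
def with_persona(persona: str, user_msg: str) -> list[dict]:
    # A system message fixes the model's role up front, so every
    # subsequent turn is generated "in character".
    return [
        {"role": "system", "content": f"You are {persona}. Stay in character."},
        {"role": "user", "content": user_msg},
    ]

messages = with_persona(
    "a patient kindergarten teacher",
    "We're done with dinner. What happens next?",
)
```

The resulting `messages` list can be passed to any chat endpoint that accepts OpenAI-style role/content pairs.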
unpopular opinion (?): text outputs (like the one below 🤯) can excite, intrigue and move me in ways that no fancy Nano Banana generated image will ever be able to do. as a modality, text is just a gazillion times more interesting than images.
0
0
0
This is not a particularly good take and is indicative of a fundamental misunderstanding of what a top-tier technical college education is supposed to offer. Preparing to understand modern AI as a Harvard or Stanford undergrad is not about learning "prompt engineering", vibe
Harvard and Stanford students tell me their professors don't understand AI and the courses are outdated. If elite schools can't keep up, the credential arms race is over. Self-learning is the only way now.
48
163
2K
If you're working on factuality in LLMs, please check out our release of SimpleQA Verified
- a new & improved benchmark for reliably measuring progress in short-form factuality!
We challenged ourselves to build the cleanest, highest-signal factuality benchmark out there. Today, we're releasing the result: SimpleQA Verified
🔥 On this more reliable, 1,000-prompt eval, Gemini 2.5 Pro establishes a new SOTA, outperforming other frontier models. We're…
0
0
1
New Benchmark Launch: SimpleQA Verified! We've partnered with @GoogleDeepMind and @GoogleResearch to launch a curated 1,000-prompt benchmark designed to provide a more reliable and challenging evaluation of LLM short-form factuality. Check out the leaderboard here:
3
8
124
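SimpleQA-style evals ask short factual questions and score the answers; the released benchmark grades with an LLM autorater (correct / incorrect / not attempted), but the scoring idea can be conveyed with plain normalized exact match. A simplified sketch, not the actual grader:

```python
import re

def normalize(ans: str) -> str:
    # Lowercase, drop articles and punctuation, collapse whitespace --
    # the usual short-answer normalization.
    ans = re.sub(r"\b(a|an|the)\b", " ", ans.lower())
    ans = re.sub(r"[^\w\s]", "", ans)
    return " ".join(ans.split())

def exact_match_score(preds: list[str], golds: list[str]) -> float:
    # Fraction of predictions matching the gold answer after normalization.
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)

# One correct, one wrong -> 0.5
score = exact_match_score(["The Eiffel Tower", "Paris"], ["Eiffel Tower", "London"])
```

An LLM autorater replaces the string match with a judgment call, which is exactly why benchmark verification (deduplication, answer re-checking) matters for signal quality.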
really wish more talks in CS/ML were like this (surely seminars, maybe confs?). It's quite obvious that over a uniform sample of accepted NeurIPS papers, Pr[results will be insightful] << Pr[hearing about the "behind the scenes" of the project will be insightful] (for me personally)
Imagine going to a seminar and listening to the speaker also talk about how the big idea happened. Join us Sept. 22 for the first talk in the "Night Science Seminar Series" where I'll talk about cellular plasticity and also discuss how the idea came about! https://t.co/NL50oO9gsD
0
0
5
🥳🥳 Happy to share that we have three papers accepted to EMNLP 2025 🇨🇳 (2 main, 1 findings)! What makes this special is that all three belong to a new research line I began last year: LLM-as-a-judge/LLM-as-an-annotator 🤖🧑‍⚖️
2
13
130
+100 for this (surprisingly short) take! "writing the paper" is not something that happens at the END of a research project; it's an integral part of it. personally, blindly offloading that part to an LLM would be the surest way to hurt the quality of my research.
0
1
10
new work by @pybeebee shows that LLMs still struggle to faithfully express their uncertainty in words, but cool to see that metacognition-inspired prompting can go a long way. looking forward to seeing more positive results on this fundamental problem!
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs 🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵 (1/n):
0
1
2
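The mismatch this thread is about (models saying "definitely" when they are internally unsure) is usually quantified with calibration metrics over verbalized confidences. A minimal expected-calibration-error sketch, a generic metric rather than the paper's actual evaluation code:

```python
def expected_calibration_error(confs, correct, n_bins=10):
    # Bin answers by stated confidence, then average the gap between
    # mean confidence and empirical accuracy, weighted by bin size.
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confs, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, ok))
    n = len(confs)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that verbalizes 0.9 confidence on answers it gets right only half the time scores an ECE of 0.4 on those answers, which is the kind of unfaithfulness the prompting interventions try to shrink.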
Wild story: a ChatGPT user had it invent a brand-new law for her so she could win her case at the Hadera Magistrate's Court. The stunned judge wrote: "I've been a judge for 30 years and thought I'd seen everything. Apparently I was wrong."
101
182
2K
we write too much. more than we can read, and too many small incremental things. I think there should be some mechanism to restrict paper submissions and acceptances per person per year, to force people to prioritize their best work, and invest more in it.
🤯 NeurIPS 2025 might break records as the most submitted-to academic conference ever. One of our submission IDs is already ~23,000; the final count could hit 30,000. Absolute madness. #NeurIPS2025 #AI
28
29
610
@sama the single biggest thing you could do for safety/alignment is to put a massive emphasis in the RL feedback loop on basic HONESTY and never misleading, tricking, overstating, exaggerating, etc. It should be like touching a hot stove to the model. Just like how you raise kids
9
4
170
This was a great 30-minute conceptual read. It neatly ties together classic RL, LLMs of the past few years, and where agents are headed next. Honestly, I find the future of agents interacting with the world with less human mediation ("experiencing") both exciting and terrifying
@dsivakumar The short paper "Welcome to the Era of Experience" is literally just released, like this week. Ultimately it will become a chapter in the book 'Designing an Intelligence' edited by George Konidaris and published by MIT Press. https://t.co/Y6m4jLRjnh
0
0
3
[[ for kicks, I asked ChatGPT to rewrite my tweet in MAVERICK style. very useful in truly bringing home the message of how obnoxious this response style truly is!!! ]]
1
0
5
tbc, I'm not saying the benchmark is useless. If you're optimizing purely for likability, it's probably useful to know that the average user enjoys this kind of overly enthusiastic fluff. but it can't be taken seriously as a measure of utility for general-purpose LLMs.
1
0
1
with the battles being public, it's now glaringly obvious that completely unfactual responses can easily win, so long as they're delivered in an aggressively upbeat tone and cheerfully long-winded style.
1
0
2
my completely personal take: Llama-4 blatantly gaming the Chatbot Arena evals (beyond being a neat example of Goodhart's law in action!) is an important moment for the NLP community
This is the clearest evidence that no one should take these rankings seriously. In this example it's super yappy and factually inaccurate, and yet the user voted for Llama 4. The rest aren't any better.
1
1
8
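Why yappy-but-wrong answers can climb the leaderboard is mechanical: Arena rankings are fit from pairwise human votes (Chatbot Arena fits a Bradley-Terry model over them; the classic online Elo update below is the standard approximation), and the update only sees who won, never whether the winning answer was factual:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    # Expected score of the winner under the logistic (Elo) model.
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected)  # points transferred by this single vote
    return r_winner + delta, r_loser - delta

# Two equally rated models: a single upbeat-but-wrong win moves k/2 = 16 points.
new_w, new_l = elo_update(1000.0, 1000.0)
```

Under Goodhart's law, any stylistic trick that reliably wins votes inflates the rating exactly as much as genuinely better answers would.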
.@percyliang & @tatsu_hashimoto start the 2nd offering of CS336 Language Modeling from Scratch at @stanfordnlp. The class philosophy is Understanding by Building. We need many people who understand the detailed design of modern LLMs, not just a few at "frontier" 🤭 AI companies.
9
33
242