Gal Yona Profile
Gal Yona

@_galyo

Followers
499
Following
1K
Media
32
Statuses
300

Research scientist @googleai, previously CS PhD @weizmannscience

Joined October 2009
@_galyo
Gal Yona
3 days
Switched dinner tonight to the tiny table that looks just like the one at my kid's kindergarten. Instantly got the "kindergarten" version of the dining experience: he was super independent & way more chill and well behaved. I feel like I finally get persona prompting for LLMs 😅
0
0
8
@_galyo
Gal Yona
15 days
unpopular opinion (?): text outputs (like the one below 🤯) can excite, intrigue and move me in ways that no fancy Nano Banana generated image will ever be able to. As a modality, text is just a gazillion times more interesting than image.
@Lari_island
Lari
16 days
Opus 4.5 > the building itself was an experience and the thing that was built KNOWS this
0
0
0
@JulianL093
julian
2 months
This is not a particularly good take and is indicative of a fundamental misunderstanding of what a top-tier technical college education is supposed to offer. Preparing to understand modern AI as a Harvard or Stanford undergrad is not about learning "prompt engineering", vibe
@zarazhangrui
Zara Zhang
2 months
Harvard and Stanford students tell me their professors don't understand AI and the courses are outdated. If elite schools can't keep up, the credential arms race is over. Self-learning is the only way now.
48
163
2K
@_galyo
Gal Yona
3 months
If you're working on factuality in LLMs, please check out our release of SimpleQA Verified ✅ - a new & improved benchmark for reliably measuring progress in short-form factuality!
@lkshaas
Lukas Haas
3 months
We challenged ourselves to build the cleanest, highest-signal factuality benchmark out there. Today, we're releasing the result: SimpleQA Verified ✅🥇 On this more reliable, 1,000-prompt eval, Gemini 2.5 Pro establishes a new SOTA, outperforming other frontier models. We're
0
0
1
@kaggle
Kaggle
3 months
🚀 New Benchmark Launch: SimpleQA Verified! We've partnered with @GoogleDeepMind and @GoogleResearch to launch a curated 1,000-prompt benchmark designed to provide a more reliable and challenging evaluation of LLM short-form factuality. Check out the leaderboard here:
3
8
124
@_galyo
Gal Yona
4 months
really wish more talks in CS/ML were like this (surely seminars, maybe confs?). It's quite obvious that over a uniform sample of accepted NeurIPS papers, Pr[results will be insightful] << Pr[hearing about the "behind the scenes" of the project will be insightful] (for me personally)
@ItaiYanai
Itai Yanai
4 months
Imagine going to a seminar and listening to the speaker also talk about how the big idea happened. Join us Sept. 22 for the first talk in the "Night Science Seminar Series" where I'll talk about cellular plasticity and also discuss how the idea came about! https://t.co/NL50oO9gsD
0
0
5
@NitCal
Nitay Calderon
4 months
🥳🥳 Happy to share that we have three papers accepted to EMNLP 2025 🇨🇳 (2 main, 1 findings)! What makes this special is that all three belong to a new research line I began last year: LLM-as-a-judge/LLM-as-an-annotator 🤖🧑‍⚖️
2
13
130
@_galyo
Gal Yona
5 months
+100 for this (surprisingly short) take! "writing the paper" is not something that happens at the END of a research project... it's an integral part of it. Personally, blindly offloading that part to an LLM would be the surest way to hurt the quality of my research.
0
1
10
@AFarfuri
Alex Farfuri
6 months
🪻
81
191
5K
@sh_reya
Shreya Shankar
6 months
new blogpost on writing in the ~glorious~ age of LLMs
19
170
1K
@_galyo
Gal Yona
6 months
new work by @pybeebee shows that LLMs still struggle to faithfully express their uncertainty in words, but cool to see that metacognition-inspired prompting can go a long way. Looking forward to seeing more positive results on this fundamental problem!
@pybeebee
Gabrielle Kaili-May Liu
7 months
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs 🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵 (1/n):
0
1
2
@JoshBreiner
Josh Breiner
7 months
[Translated from Hebrew] The state of the police: an officer used ChatGPT, which invented a new law for her, in order to win a phone-confiscation proceeding at a hearing in the Hadera Magistrate's Court. The judge was stunned when it came to light: "I've been a judge for 30 years and thought I'd seen it all. Apparently I was wrong."
101
182
2K
@yoavgo
(((ل()(ل() 'yoav))))👾
7 months
we write too much. more than we can read, and many small incremental things. i think there should be some mechanism to restrict paper submissions and acceptances per person per year, to force people to prioritize their best work, and invest more in it.
@youjiaxuan
Jiaxuan You@NeurIPS
7 months
🤯 NeurIPS 2025 might break records as the most submitted-to academic conference ever. One of our submission IDs is already ~23,000; final count could hit 30,000. Absolute madness. #NeurIPS2025 #AI
28
29
610
@doodlestein
Jeffrey Emanuel
8 months
@sama the single biggest thing you could do for safety/alignment is to put a massive emphasis in the RL feedback loop on basic HONESTY and never misleading, tricking, overstating, exaggerating, etc. It should be like touching a hot stove to the model. Just like how you raise kids
9
4
170
@_galyo
Gal Yona
8 months
This was a great 30-minute conceptual read. It neatly ties together classic RL, LLMs of the past few years, and where agents are headed next. Honestly, I find the future of agents interacting w the world with less human mediation ("experiencing") both exciting and terrifying
@RichardSSutton
Richard Sutton
8 months
@dsivakumar The short paper "Welcome to the Era of Experience" is literally just released, like this week. Ultimately it will become a chapter in the book 'Designing an Intelligence' edited by George Konidaris and published by MIT Press. https://t.co/Y6m4jLRjnh
0
0
3
@_galyo
Gal Yona
8 months
[[ for kicks, I asked ChatGPT to rewrite my tweet in MAVERICK style. very useful in truly bringing home the message of how obnoxious this response style truly is!!! 💅💅 ]]
1
0
5
@_galyo
Gal Yona
8 months
tbc, I'm not saying the benchmark is useless. If you're optimizing purely for likability, it's probably useful to know that the average user enjoys this kind of overly enthusiastic fluff. But it can't be taken seriously as a measure of utility for general-purpose LLMs.
1
0
1
@_galyo
Gal Yona
8 months
with the battles being public, it's now glaringly obvious that completely unfactual responses can easily win, so long as they're delivered in an aggressively upbeat tone and cheerfully long-winded style.
1
0
2
@_galyo
Gal Yona
8 months
my completely personal take: Llama-4 blatantly gaming the Chatbot Arena evals (beyond being a neat example of Goodhart's law in action!) is an important moment for the NLP community ⏩
@vikhyatk
vik
8 months
This is the clearest evidence that no one should take these rankings seriously. In this example it's super yappy and factually inaccurate, and yet the user voted for Llama 4. The rest aren't any better.
1
1
8
@stanfordnlp
Stanford NLP Group
9 months
.@percyliang & @tatsu_hashimoto start the 2nd offering of CS336 Language Modeling from Scratch at @stanfordnlp. The class philosophy is Understanding by Building. We need many people who understand the detailed design of modern LLMs, not just a few at "frontier" 🤭 AI companies.
9
33
242