
Gal Yona
@_galyo
Followers
492
Following
1K
Media
31
Statuses
294
Research scientist @googleai, previously CS PhD @weizmannscience
Joined October 2009
really wish more talks in CS/ML were like this (surely seminars, maybe confs?). It's quite obv that over a uniform sample of accepted Neurips papers, Pr[results will be insightful] << Pr[hearing about the "behind the scenes" of the project will be insightful] (for me personally).
Imagine going to a seminar and listening to the speaker also talk about how the big idea happened. Join us Sept. 22 for the first talk in the "Night Science Seminar Series" where I'll talk about cellular plasticity and also discuss how the idea came about!
0
0
5
+100 for this (surprisingly short) take! "writing the paper" is not something that happens at the END of a research project. it's an integral part of it. personally, blindly offloading that part to an LLM would be the surest way to hurt the quality of my research.
"writing is not only about reporting results; it also provides a tool to uncover new thoughts and ideas. Writing compels us to think"
0
1
10
new work by @pybeebee shows that LLMs still struggle to faithfully express their uncertainty in words, but cool to see that meta cognitive inspired prompting can go a long way. looking forward to seeing more positive results on this fundamental problem!
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs 🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵 (1/n):
0
1
2
RT @JoshBreiner: [translated from Hebrew] … she used ChatGPT, which invented a new law for her in order to win … at the Magistrates' Court in Hadera. The judge …
0
188
0
RT @doodlestein: @sama the single biggest thing you could do for safety/alignment is to put a massive emphasis in the RL feedback loop on b…
0
4
0
This was a great 30-minute conceptual read. It neatly ties together classic RL, LLMs of the past few years, and where agents are headed next. Honestly, I find the future of agents interacting w the world with less human mediation ("experiencing") both exciting and terrifying.
@dsivakumar The short paper "Welcome to the Era of Experience" is literally just released, like this week. Ultimately it will become a chapter in the book 'Designing an Intelligence' edited by George Konidaris and published by MIT Press.
0
0
3
[[ for kicks, I asked chatGPT to rewrite my tweet in MAVERICK style. very useful in truly bringing home the message of how obnoxious this response style truly is!!! ]]
1
0
5
tbc, I'm not saying the benchmark is useless. If you're optimizing purely for likability, it's probably useful to know that the average user enjoys this kind of overly enthusiastic fluff. but it can't be taken seriously as a measure of utility for general-purpose LLMs.
1
0
1
with the battles being public, it's now glaringly obvious that completely unfactual responses can easily win, so long as they're delivered in an aggressively upbeat tone and cheerfully long-winded style.
1
0
2
my completely personal take: Llama-4 blatantly gaming the Chatbot Arena evals (beyond being a neat example of Goodhart's law in action!) is an important moment for the NLP community 🧵
This is the clearest evidence that no one should take these rankings seriously. In this example it's super yappy and factually inaccurate, and yet the user voted for Llama 4. The rest aren't any better.
1
1
8
RT @stanfordnlp: .@percyliang & @tatsu_hashimoto start the 2nd offering of CS336 Language Modeling from Scratch at @stanfordnlp. The class…
0
32
0
RT @zorikgekhman: 🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how…
0
60
0
RT @srush_nlp: Simons Institute Workshop: "Future of LLMs and Transformers": 21 talks Monday - Friday next week. h…
0
94
0
For practitioners, CISC is a minimal change to self-consistency (SC) that you should just try out 💪 For those interested in confidence/self-verification, our work provides lots of interesting insights. Check out the paper and Amir's thread for details!
0
0
1
Excited for this work to be out! Self-consistency is great but v expensive (especially when you care about those last few acc points). We show: switching to a *weighted* majority vote (weights = confidence scores derived by the model itself) is way more sample efficient! 1/n
New Preprint! LLM self-assessment unlocks efficient decoding ✅ Our Confidence-Informed Self-Consistency (CISC) method cuts compute without losing accuracy. We also rethink confidence evaluation & contribute to the debate on self-verification. 1/8
1
1
10
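The weighted vote the CISC thread describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the answers and confidence scores below are hypothetical placeholders, whereas the actual method derives the per-sample confidences from the model's own self-assessment.

```python
from collections import defaultdict

def majority_vote(answers):
    """Plain self-consistency: pick the most frequent sampled answer."""
    counts = defaultdict(int)
    for a in answers:
        counts[a] += 1
    return max(counts, key=counts.get)

def confidence_weighted_vote(answers, confidences):
    """CISC-style vote: each sample contributes its model-derived
    confidence score instead of a flat count of 1."""
    weights = defaultdict(float)
    for a, c in zip(answers, confidences):
        weights[a] += c
    return max(weights, key=weights.get)

# Hypothetical run: 5 sampled chains, 3 answer "12" and 2 answer "14",
# but the model is far more confident in the "14" chains.
answers = ["12", "12", "12", "14", "14"]
confidences = [0.2, 0.3, 0.2, 0.9, 0.8]

print(majority_vote(answers))                          # -> 12
print(confidence_weighted_vote(answers, confidences))  # -> 14
```

Because high-confidence samples count for more, the weighted vote can reach the plain-SC answer quality with fewer sampled chains, which is where the claimed sample efficiency comes from.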
RT @rm_rafailov: We have a new position paper on "inference time compute" and what we have been working on in the last few months! We prese…
0
227
0