Jonathan Berant Profile
Jonathan Berant

@JonathanBerant

Followers: 3K
Following: 3K
Media: 34
Statuses: 1K

NLP at Tel-Aviv University and Google

Joined June 2011
@ben_bogin
Ben Bogin
11 days
My team @GoogleAI is looking for a 2026 research intern in Mountain View! I will be hiring for a project aimed at improving tool-using and search agents via RL training and data generation. To apply: https://t.co/THcg5LGEF5 + feel free to ping me!
7
26
278
@AmouyalSamuel
Samuel AMOUYAL
2 months
I had a lot of fun working on this with @JonathanBerant @aya_meltzer. You can find our paper here: https://t.co/Wx6xMzOJYo And by the way, the answer (at least based on the sentence) is yes, you can ignore head injuries. But it's terrible advice.
arxiv.org
Large language models (LLMs) that fluently converse with humans are a reality - but do LLMs experience human-like processing difficulties? We systematically compare human and LLM sentence...
0
2
5
@AmouyalSamuel
Samuel AMOUYAL
2 months
We have more interesting insights in our paper. We believe this is a really exciting direction for comparing humans and LLMs. Extending our framework to more structures and more LLMs will certainly lead to additional insights!
1
1
2
@AmouyalSamuel
Samuel AMOUYAL
2 months
We report 3 additional findings: 1. LLMs' similarity to humans is higher on GP structures. 2. The similarity of the models' difficulty ordering over the structures to the human ordering increases with model size. 3. An LLM performs better on the easy baseline than on the structures if it is neither too strong nor too weak.
1
1
1
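For finding 2, a natural way to quantify how closely a model's difficulty ordering tracks the human one is a Spearman rank correlation over per-structure accuracies. A minimal sketch with made-up numbers; the structure names and accuracies below are illustrative, not the paper's:

```python
# Illustrative only: hypothetical per-structure accuracies, not paper data.
from scipy.stats import spearmanr

structures = ["depth_charge", "garden_path_1", "garden_path_2", "center_embedding"]
human_acc = [0.31, 0.55, 0.48, 0.62]   # hypothetical human accuracies
model_acc = [0.25, 0.60, 0.50, 0.70]   # hypothetical model accuracies

rho, pval = spearmanr(human_acc, model_acc)  # rank-order similarity
print(f"difficulty-ordering similarity: rho={rho:.2f} (p={pval:.2f})")
```

Finding 2 then corresponds to rho increasing as model size grows.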
@AmouyalSamuel
Samuel AMOUYAL
2 months
First, these structures are challenging for LLMs (the highest mean accuracy is 0.653). We noticed 2 interesting facts: 1. Structures that strain working memory in humans were easier for LLMs than structures that are challenging due to ambiguity. 2. Thinking helps, but only once an LLM is strong enough.
1
1
1
@AmouyalSamuel
Samuel AMOUYAL
2 months
This is what we check. We tested the comprehension of 31 models from 5 families on 7 challenging structures (including 4 types of garden paths, GP). We also collected human data on these structures so we could compare human comprehension with the LLMs'.
1
2
1
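As a rough sketch of the evaluation recipe this tweet describes: pair each challenging sentence with a comprehension question, query a model, and average accuracy per structure. The items and the `ask_model` stub below are hypothetical stand-ins, not the paper's materials or code:

```python
from collections import defaultdict

# Illustrative items: (structure, sentence, comprehension question, gold answer).
ITEMS = [
    ("depth_charge", "No head injury is too trivial to be ignored.",
     "Should head injuries be ignored?", "no"),
    ("garden_path", "The horse raced past the barn fell.",
     "Did the horse fall?", "yes"),
]

def accuracy_by_structure(ask_model):
    """ask_model(sentence, question) -> short answer string; stands in for
    whichever API serves the 31 evaluated models."""
    correct, total = defaultdict(int), defaultdict(int)
    for structure, sentence, question, gold in ITEMS:
        answer = ask_model(sentence, question)
        correct[structure] += int(answer.strip().lower() == gold)
        total[structure] += 1
    return {s: correct[s] / total[s] for s in total}

# Trivial stand-in "model" that always answers "no":
print(accuracy_by_structure(lambda sentence, question: "no"))
```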
@AmouyalSamuel
Samuel AMOUYAL
2 months
But this is not the only structure that challenges humans. Psycholinguistic research has discovered many different structures that are hard for humans: we read them slowly and understand them poorly. But what happens with LLMs? Do they understand them correctly?
1
1
1
@AmouyalSamuel
Samuel AMOUYAL
2 months
No head injury is too trivial to be ignored. What do you think this sentence means? Can you ignore head injuries? This type of sentence is called a depth-charge sentence, and its structure is especially challenging for humans.
1
3
6
@TmlrOrg
Transactions on Machine Learning Research
2 months
As Transactions on Machine Learning Research (TMLR) grows in number of submissions, we are looking for more reviewers and action editors. Please sign up! Reviewers handle only one paper at a time, at most 6 per year, and report greater satisfaction than when reviewing for conferences!
2
28
67
@redpony
Chris Dyer
2 months
Great work, everyone! :)
@COLM_conf
Conference on Language Modeling
2 months
Outstanding paper 3 🏆: Don't lie to your friends: Learning what you know from collaborative self-play https://t.co/hvY1oaF6Jf
1
1
15
@JonathanBerant
Jonathan Berant
2 months
Will be at @COLM_conf Mon night to Fri morning, let me know if you wanna catch up!
1
0
22
@MichalFeldman9
Michal Feldman
4 months
🚨 Don't miss this amazing opportunity! The Schmidt Postdoc Award supports Israeli women pursuing postdocs abroad in math, CS, IE, or EE. 💰 $60K/year | 🌍 Top global institutions 📅 Deadline: Aug 15, 2025 🔗 https://t.co/AGYF6S6dtf 📝 Apply:
schmidtsciences.org
0
7
8
@MichalFeldman9
Michal Feldman
4 months
๐Ÿšจ ืืœ ืชืคืกืคืกื™ ืืช ื”ื”ื–ื“ืžื ื•ืช: ืžืœื’ืช ืคื•ืกื˜ื“ื•ืง ื‘ื—ื•"ืœ ืข"ืฉ ืฉืžื™ื“ื˜, ืœื ืฉื™ื ื‘ืžืชืžื˜ื™ืงื”, ืžื“ืขื™ ื”ืžื—ืฉื‘, ื”ื ื“ืกืช ืชืขืฉื™ื™ื” ื•ื ื™ื”ื•ืœ ืื• ื”ื ื“ืกืช ื—ืฉืžืœ. ๐Ÿ’ฐ 60,000 ื“ื•ืœืจ ื‘ืฉื ื” ๐Ÿ“… ื“ื“ืœื™ื™ืŸ: 15 ื‘ืื•ื’ื•ืกื˜ 2025 ๐Ÿ”— https://t.co/AGYF6S6dtf ๐Ÿ“ ื”ื’ืฉื”:
schmidtsciences.org
0
5
22
@SZiteng
Ziteng Sun
5 months
[Today 11 am poster E-2804 #ICML2025] Inference-time compute has been instrumental to the recent development of LLMs. Can we align our model to better suit a given inference-time procedure? Come check our poster and discuss with @ananthbshankar, @abeirami, @jacobeisenstein, and
1
4
14
@JonathanBerant
Jonathan Berant
5 months
Accepted to COLM @COLM_conf!
@JonathanBerant
Jonathan Berant
9 months
Hi ho! New work: https://t.co/QMPes1MZ2N With amazing collabs @jacobeisenstein @jdjdhekchbdjd @adamjfisch @ddua17 @fantinehuot @mlapata @vicky_zayats Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
0
1
18
@ml_angelopoulos
Anastasios Nikolas Angelopoulos
6 months
This paper extends active statistical inference in a number of exciting ways, with applications in LLM evaluation! 1. Improves upon active inference to give the optimal sampling policy with clipping. 2. Gives an optimal-cost inference procedure. Take a look! One of my fave
@adamjfisch
Adam Fisch
6 months
You need to evaluate an AI system and you have three things: 1. A cheap judge, which is noisy. 🙈 2. An expensive judge, which is accurate. 🧑‍⚖️ 3. A budget 💸 How should you spend the budget to get the best possible estimate of model quality? https://t.co/N7JDiHO3wI
1
3
32
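A minimal sketch of one standard construction for this setup (my assumption, not necessarily the paper's estimator): score every example with the cheap judge, call the expensive judge on a random fraction p, and add an inverse-propensity correction so the estimate stays unbiased for the true mean quality.

```python
import random

def estimate_quality(examples, cheap_judge, expensive_judge, p=0.2):
    """Unbiased estimate of mean quality under a fixed sampling rate p."""
    total = 0.0
    for x in examples:
        g = cheap_judge(x)           # noisy score, applied to every example
        total += g
        if random.random() < p:      # expensive judge on ~p of the examples
            h = expensive_judge(x)   # accurate but costly score
            total += (h - g) / p     # inverse-propensity correction
    return total / len(examples)
```

Each correction term has expectation h - g, so systematic errors in the cheap judge cancel on average; the cheap scores only reduce variance. With p = 1 this always uses the expensive judge, while as p shrinks the correction's variance grows like 1/p.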
@adamjfisch
Adam Fisch
6 months
Work co-led with @ml_angelopoulos, whom we had the pleasure of briefly hosting here at @GoogleDeepMind for this collaboration, together with my GDM and GR colleagues @jacobeisenstein, @JonathanBerant, and Alekh Agarwal.
2
1
3
@adamjfisch
Adam Fisch
6 months
We explore how much these policies improve over the naïve empirical estimates of E[H] using synthetic + real data. The optimal pi depends on unknown distributional properties of (X, H, G), so we examine performance in theory (using oracle rules) + in practice (when approximated).
1
2
4
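For concreteness, one standard way to write such an estimator (my assumption; the paper's exact form may differ): with Z_i ~ Bernoulli(pi(X_i)) marking which examples get the expensive judge, mu_hat = (1/n) * sum_i [ G(X_i) + (Z_i / pi(X_i)) * (H(X_i) - G(X_i)) ]. This is unbiased for E[H] for any pi(x) > 0; the choice of pi only changes the variance, which is why the optimal pi depends on the joint distribution of (X, H, G).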
@adamjfisch
Adam Fisch
6 months
We solve for two types of policies: (1) the best fixed sampling rate, pi_random(x) = p*, that doesn't change with X, and (2) the best fully active policy pi_active(x) \in (0, 1]. Intuitively, fully active is better when G has variable accuracy (e.g., we see hard + easy Xs).
1
1
3
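To make the two policy types concrete, here is a hypothetical pair (my own illustration; the paper solves for the optimal versions): a fixed-rate policy, and an active policy that spends more of the expensive-judge budget where the cheap judge looks uncertain, clipped so the inverse-propensity weights stay bounded.

```python
def pi_random(x, p_star=0.2):
    """Fixed sampling rate: every example gets the expensive judge w.p. p*."""
    return p_star

def pi_active(cheap_score, eps=0.05):
    """Sample more where the cheap judge G is uncertain (score near 0.5),
    clipped into [eps, 1] so the 1/pi weights stay bounded."""
    uncertainty = 1.0 - 2.0 * abs(cheap_score - 0.5)  # 1 at 0.5, 0 at 0 or 1
    return min(1.0, max(eps, uncertainty))
```

Plugging either policy into the estimator sketched earlier (with pi(x) in place of the fixed p) preserves unbiasedness; as the tweet says, the active policy pays off when G's accuracy varies across inputs.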