Jonathan Berant
@JonathanBerant
Followers
3K
Following
3K
Media
34
Statuses
1K
NLP at Tel-Aviv University and Google
Joined June 2011
My team @GoogleAI is looking for a 2026 research intern in Mountain View! I will be hiring for a project aimed at improving tool-using and search agents via RL training and data generation. To apply: https://t.co/THcg5LGEF5 + feel free to ping me!
7
26
278
I had a lot of fun working on this with @JonathanBerant @aya_meltzer. You can find our paper here: https://t.co/Wx6xMzOJYo And by the way, the answer (at least based on the sentence) is yes, you can ignore head injuries. But it's terrible advice
arxiv.org
Large language models (LLMs) that fluently converse with humans are a reality - but do LLMs experience human-like processing difficulties? We systematically compare human and LLM sentence...
0
2
5
We have more interesting insights in our paper. We believe this is a really exciting direction for comparing humans and LLMs. Extending our framework to more structures and more LLMs will certainly lead to additional insights!
1
1
2
We report 3 additional findings: 1. LLMs' similarity to humans is higher on GP structures. 2. The similarity of the structures' difficulty ordering to humans increases with model size. 3. An LLM performs better on the easy baseline than on the challenging structures, as long as it is neither too strong nor too weak.
1
1
1
First, these structures are challenging for LLMs (the highest mean accuracy is 0.653). We noticed 2 interesting facts: 1. Structures that strain working memory in humans were easier for LLMs than structures that are challenging due to ambiguity. 2. Thinking helps, but only once an LLM is strong enough.
1
1
1
This is what we check. We tested the comprehension of 31 models from 5 families on 7 challenging structures (including 4 types of garden paths, GP). We also collected human data on these structures, so we could compare human comprehension to that of LLMs (a sketch of such an evaluation loop is below).
1
2
1
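A minimal sketch of what an evaluation loop like the one described above could look like. The model interface, prompt format, and yes/no scoring here are my assumptions for illustration, not the paper's actual protocol.

```python
# Hypothetical sketch of the comprehension evaluation described above.
# `model.generate` is an assumed interface, not a specific library API.
from collections import defaultdict

def comprehension_accuracy(model, items):
    """items: (sentence, question, gold_answer) triples for one structure."""
    correct = 0
    for sentence, question, gold in items:
        prompt = f"{sentence}\nQuestion: {question}\nAnswer yes or no."
        answer = model.generate(prompt)               # assumed model interface
        correct += answer.strip().lower().startswith(gold)
    return correct / len(items)

def evaluate(models, dataset):
    """Accuracy per (model, structure); rows are comparable to the human data."""
    scores = defaultdict(dict)
    for name, model in models.items():                # e.g., 31 models, 5 families
        for structure, items in dataset.items():      # e.g., 7 challenging structures
            scores[name][structure] = comprehension_accuracy(model, items)
    return scores
```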
But this is not the only structure that trips people up. Psycholinguistic research has discovered many different structures that are challenging for humans: we read them slowly and understand them poorly. But what happens with LLMs? Do they understand them correctly?
1
1
1
No head injury is too trivial to be ignored. What do you think this sentence means? Can you ignore head injuries? This type of sentence is called a depth-charge sentence, and its structure is especially challenging for humans.
1
3
6
As Transactions on Machine Learning Research (TMLR) grows in number of submissions, we are looking for more reviewers and action editors. Please sign up! Only one paper to review at a time and <= 6 per year; reviewers report greater satisfaction than when reviewing for conferences!
2
28
67
Great work, everyone! :)
Outstanding paper 3: Don't lie to your friends: Learning what you know from collaborative self-play https://t.co/hvY1oaF6Jf
1
1
15
Outstanding paper 3: Don't lie to your friends: Learning what you know from collaborative self-play https://t.co/hvY1oaF6Jf
1
11
40
Will be at @COLM_conf Mon night to Fri morning, let me know if you wanna catch up!
1
0
22
🚨 Don't miss this amazing opportunity! The Schmidt Postdoc Award supports Israeli women pursuing postdocs abroad in math, CS, IE, or EE. 💰 $60K/year | Top global institutions
Deadline: Aug 15, 2025 https://t.co/AGYF6S6dtf Apply:
schmidtsciences.org
0
7
8
🚨 Don't miss the opportunity: the Schmidt postdoctoral fellowship abroad, for women in mathematics, computer science, industrial engineering and management, or electrical engineering. 💰 $60,000 per year
Deadline: August 15, 2025 https://t.co/AGYF6S6dtf Apply:
schmidtsciences.org
0
5
22
[Today 11 am poster E-2804 #ICML2025] Inference-time compute has been instrumental to the recent development of LLMs. Can we align our model to better suit a given inference-time procedure? Come check our poster and discuss with @ananthbshankar, @abeirami, @jacobeisenstein, and
1
4
14
Accepted to COLM @COLM_conf!
Hi ho! New work: https://t.co/QMPes1MZ2N With amazing collabs @jacobeisenstein @jdjdhekchbdjd @adamjfisch @ddua17 @fantinehuot @mlapata @vicky_zayats Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
0
1
18
This paper extends active statistical inference in a number of exciting ways, with applications in LLM evaluation! 1. It improves upon active inference to give the optimal sampling policy, with clipping. 2. It gives an optimal-cost inference procedure (a toy sketch of the basic estimator is below). Take a look! One of my fave
You need to evaluate an AI system and you have three things: 1. A cheap judge, which is noisy. 2. An expensive judge, which is accurate. 🧑‍⚖️ 3. A budget 💸 How should you spend the budget to get the best possible estimate of model quality? https://t.co/N7JDiHO3wI
1
3
32
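To make the quoted setup concrete: here is a minimal sketch of the generic active-inference estimator this line of work builds on, under my own assumptions. G is the cheap noisy judge, H the expensive accurate judge, and pi(x) the probability of spending budget on H for example x. This is the standard inverse-propensity form, not the paper's optimized procedure.

```python
# A minimal sketch, not the paper's algorithm: estimate E[H] by scoring every
# example with the cheap judge G and auditing a pi(x)-probability subsample
# with the expensive judge H.
import numpy as np

def estimate_quality(X, G, H, pi, rng=None):
    rng = rng or np.random.default_rng(0)
    g = np.array([G(x) for x in X])                        # cheap, noisy scores
    p = np.clip(np.array([pi(x) for x in X]), 1e-3, 1.0)   # clipped sampling probs
    audit = rng.random(len(X)) < p                         # which x get the expensive judge
    h = np.array([H(x) if a else 0.0 for x, a in zip(X, audit)])
    # Mean of the cheap scores plus an inverse-propensity-weighted correction
    # on the audited subset; unbiased for E[H] for any pi(x) in (0, 1].
    return g.mean() + (audit * (h - g) / p).mean()
```

With pi(x) = 1 everywhere this spends the full budget and returns the plain mean of H; shrinking pi trades cost for variance, and choosing pi well under a budget is exactly what the paper optimizes.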
Work co-led with @ml_angelopoulos, whom we had the pleasure of briefly hosting here at @GoogleDeepMind for this collaboration, together with my GDM and GR colleagues @jacobeisenstein, @JonathanBerant, and Alekh Agarwal.
2
1
3
We explore how much these policies improve over the naïve empirical estimates of E[H] using synthetic + real data. The optimal pi depends on unknown distributional properties of (X, H, G), so we examine performance in theory (using oracle rules) + in practice (when approximated).
1
2
4
We solve for two types of policies: (1) the best fixed sampling rate, pi_random(x) = p*, that doesn't change with X, and (2) the best fully active policy pi_active(x) \in (0, 1]. Intuitively, fully active is better when G has variable accuracy (e.g., we see hard + easy Xs); a toy sketch of both policy types follows below.
1
1
3
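A toy illustration of the two policy classes (my own sketch, not the paper's code): the fixed rate ignores x, while the fully active policy spends more of the H-budget where the cheap judge G is least reliable, proxied here by a hypothetical confidence function.

```python
def pi_random(x, p_star=0.2):
    """Fixed-rate policy: the same sampling probability for every x."""
    return p_star

def pi_active(x, confidence, floor=0.05):
    """Fully active policy: audit low-confidence (hard) examples more often.

    `confidence` is a hypothetical callable returning a value in [0, 1];
    the clamp keeps pi_active(x) in (0, 1], as in the constraint above.
    """
    return min(1.0, max(floor, 1.0 - confidence(x)))
```

The floor plays the role of the clipping mentioned earlier in the thread: without it, tiny pi values blow up the inverse-propensity weights in the estimator.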