Reuben Narad Profile
Reuben Narad

@ReubenNarad

Followers
13
Following
162
Media
4
Statuses
13

Joined October 2023
@ReubenNarad
Reuben Narad
13 days
!!!
@stochasticlalit
lalit
13 days
It was amazing to be part of this effort. Huge shout-out to the team, and all the incredible pre-training and post-training efforts that ensure Gemini is the leading frontier model!
0
0
1
@ReubenNarad
Reuben Narad
23 days
Code + paper at 👉 check it out!
0
0
1
@ReubenNarad
Reuben Narad
23 days
AI benchmarks have mostly focused on math, science, and coding. If we’re gonna get to superintelligence, we’ll need more diverse evals! Many thanks to my amazing coauthors @jifan_zhang, @stochasticLalit, @Jiayi7960, @siddsuresh97, @BrickenPine, @bob_mankoff, and @rdnowak.
1
0
3
@ReubenNarad
Reuben Narad
23 days
To test if HumorBench taps into reasoning, we let models use extra reasoning tokens. Results were surprisingly mixed: Qwen improved steadily, OpenAI o3 showed modest gains, but Claude actually performed worse with more tokens. Reasoning here isn’t simply “more is better”…
1
0
0
@ReubenNarad
Reuben Narad
23 days
Interestingly, HumorBench scores align most with ARC-AGI-1 (ρ = 0.94), which is designed purely for abstract reasoning. They’re also solidly correlated with GPQA-Diamond (ρ ≈ 0.74) and LM-Arena ELO (ρ ≈ 0.71).
1
0
0
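A rough sketch of how a rank correlation like the ρ values above could be computed. This assumes ρ means Spearman's rank correlation, which the thread does not state. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-model scores on two benchmarks (illustrative only).
humor_scores = [0.41, 0.55, 0.62, 0.70, 0.78]
other_scores = [0.10, 0.35, 0.30, 0.52, 0.61]
rho = spearman_rho(humor_scores, other_scores)
```

Because the statistic depends only on rank order, it rewards two benchmarks for ordering models the same way even when their score scales differ.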
@ReubenNarad
Reuben Narad
23 days
HumorBench includes over 300 winning cartoons, with explanation rubrics written in collaboration with the legendary Bob Mankoff, who started the Caption Contest. Each rubric splits the joke into 1–3 clear “elements,” giving us an objective measure of how much the model gets it.
1
0
0
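One way to picture element-based scoring as described above is a minimal sketch under stated assumptions. The thread does not say how element hits are judged or aggregated, so the `RubricResult` type, the per-element booleans, and the averaging are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricResult:
    """Hypothetical record: which rubric elements an explanation covered."""
    cartoon_id: str
    element_hits: List[bool]  # one entry per rubric element (1 to 3)


def element_coverage(result: RubricResult) -> float:
    """Fraction of the rubric's elements that the explanation covered."""
    return sum(result.element_hits) / len(result.element_hits)


def benchmark_score(results: List[RubricResult]) -> float:
    """Mean coverage across cartoons (an assumed aggregation)."""
    return sum(element_coverage(r) for r in results) / len(results)


# Illustrative judgments, not real HumorBench data.
results = [
    RubricResult("cartoon-a", [True, False]),
    RubricResult("cartoon-b", [True, True, True]),
]
score = benchmark_score(results)
```

Scoring per element gives partial credit to an explanation that catches some of the joke, which a single pass/fail label would hide.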
@ReubenNarad
Reuben Narad
23 days
Whoa. Grok 4 beats o3 on our never-released benchmark: HumorBench, a non-STEM reasoning benchmark that measures humor comprehension. The task is simple: given a New Yorker Caption Contest cartoon and caption, explain the joke.
2
5
11
@ReubenNarad
Reuben Narad
9 months
- “In an airplane you trust the door isn’t going to fall off, not a great time to use that example”
- Being a programmer will be more important, not less, in the future.
- Most worrying thing to him: the global supply chain is very fragile (shoutout OM).
0
1
1
@ReubenNarad
Reuben Narad
9 months
Sam A x UW highlights (1/2):
- Old ChatGPT “clearly” had a liberal bias. Is this new?
- UW’s fusion research is “great”
- We want people to always be confused how we can do it so cheaply
- We were right that AGI was gonna be possible (optimistic, Sam)
- HE DID A DIG ON BOEING:
1
0
3