Reuben Narad @ReubenNarad X Profile

Reuben Narad

@ReubenNarad

Followers: 13

Following: 162

Media: 4

Statuses: 13

Joined October 2023


Reuben Narad

@ReubenNarad

13 days

!!!

lalit

@stochasticlalit

13 days

It was amazing to be part of this effort. Huge shout-out to the team, and to all the incredible pre-training and post-training efforts that ensure Gemini is the leading frontier model!

0

1

Reuben Narad

@ReubenNarad

23 days

Code + paper at 👉 check it out!

0

1

Reuben Narad

@ReubenNarad

23 days

AI benchmarks have mostly focused on math, science, and coding. If we’re gonna get to super intelligence, we’ll need more diverse evals! Many thanks to my amazing coauthors @jifan_zhang, @stochasticLalit, @Jiayi7960, @siddsuresh97, @BrickenPine, @bob_mankoff, and @rdnowak.

1

0

3

Reuben Narad

@ReubenNarad

23 days

To test if HumorBench taps into reasoning, we let models use extra reasoning tokens. Results were surprisingly mixed: Qwen improved steadily, OpenAI o3 showed modest gains, but Claude actually performed worse with more tokens. Reasoning here isn’t simply “more is better”…

1

0

Reuben Narad

@ReubenNarad

23 days

Interestingly, HumorBench scores align most with ARC-AGI-1 (ρ = 0.94), which is designed purely for abstract reasoning. They’re also solidly correlated with GPQA-Diamond (ρ≈0.74) and LM-Arena ELO (ρ≈0.71).

1

0
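For readers who want to see what a rank correlation like the ρ values in the post above measures, here is a minimal sketch. It assumes ρ means Spearman's rank correlation, which the post does not state. The function names and the toy scores are hypothetical and are not HumorBench data.

```python
from statistics import mean


def ranks(values):
    """Return 1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for four models on two benchmarks (illustration only).
benchmark_a = [0.2, 0.5, 0.4, 0.9]
benchmark_b = [10, 30, 40, 80]
rho = spearman_rho(benchmark_a, benchmark_b)
```

Because only the ordering of models matters, a high ρ says the two benchmarks rank models similarly, not that the raw scores are on the same scale.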

Reuben Narad

@ReubenNarad

23 days

HumorBench includes over 300 winning cartoons, with explanation rubrics written in collaboration with the legendary Bob Mankoff, who started the Caption Contest. Each rubric splits the joke into 1–3 clear “elements,” giving us an objective measure of how much the model gets it.

1

0
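One way to picture how the element rubrics in the post above could turn into a number is sketched below. This is an assumption, not the paper's confirmed scoring: it supposes some grader has already marked each rubric element as covered or not, and it takes the fraction covered per cartoon, then averages over cartoons. The names `element_coverage` and `benchmark_score` are hypothetical.

```python
def element_coverage(judgments):
    """Fraction of a cartoon's rubric elements that an explanation covered.

    `judgments` holds one boolean per rubric element (1 to 3 per cartoon,
    following the post). How each boolean is produced is out of scope here.
    """
    if not 1 <= len(judgments) <= 3:
        raise ValueError("expected 1 to 3 rubric elements per cartoon")
    return sum(judgments) / len(judgments)


def benchmark_score(all_judgments):
    """Mean per-cartoon coverage (an assumed aggregation, see lead-in)."""
    return sum(element_coverage(j) for j in all_judgments) / len(all_judgments)


# Hypothetical grader output for three cartoons (illustration only).
example = [[True], [True, False], [False, False, False]]
score = benchmark_score(example)
```

Averaging per cartoon keeps a three-element joke from counting more than a one-element joke; pooling all elements together would be another reasonable choice.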

Reuben Narad

@ReubenNarad

23 days

Whoa. Grok 4 beats o3 on our never-released benchmark: HumorBench, a non-STEM reasoning benchmark that measures humor comprehension. The task is simple: given a New Yorker Caption Contest cartoon and caption, explain the joke.

2

5

11

Reuben Narad

@ReubenNarad

9 months

- “In an airplane you trust the door isn’t going to fall off (not a great time to use that example)”
- Being a programmer will be more important, not less, in the future.
- Most worrying thing to him: the global supply chain is very fragile (shoutout OM).

0

1

Reuben Narad

@ReubenNarad

9 months

Sam A x UW highlights (1/2):
- Old ChatGPT “clearly” had a liberal bias (is this new?)
- UW’s fusion research is "great"
- We want people to always be confused how we can do it so cheaply
- We were right that AGI was gonna be possible (optimistic, Sam)
- HE DID A DIG ON BOEING:

1

0

3