Shashwat Goel Profile
Shashwat Goel

@ShashwatGoel7

Followers
2K
Following
7K
Media
171
Statuses
1K

Scaling supervision for AI on evals that matter. 👨‍🍳Forecasting, Long Horizon, Synth Data for RL @ELLISInst_Tue @MPI_IS Prev: @AIatMeta, MATS, Quant

Tübingen, Germany
Joined June 2020
@ShashwatGoel7
Shashwat Goel
20 days
New blogpost: Why I think automated research is the means, not just the end, for training superintelligent AI systems. In pointing models at scientific discovery, we will have to achieve the capabilities today's LLMs lack: - long-horizon planning - continual adaptation -
3
4
63
@ShashwatGoel7
Shashwat Goel
2 days
An interesting research problem that might be solvable at big labs / @thinkymachines / @wandb: with enough data on training runs, can we make universal recommendations of good hyperparams, using stats of the dataset, loss fn, activations, size, etc.? Would save so much time and compute (rough sketch below).
12
3
162
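Read as a recipe, this is a supervised meta-problem: each past run contributes one example mapping cheap run statistics to hyperparameters that worked, and any regressor can serve as the recommender. A minimal sketch under that reading; the feature set, the toy numbers, and the choice of a random forest are illustrative assumptions, not a claim about how any lab would do it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# One row per past training run; features are made-up placeholders:
# [log10(dataset_tokens), log10(param_count), initial_loss, activation_rms]
X = np.array([
    [9.0,  8.0, 10.5, 0.90],
    [10.0, 9.0, 11.2, 1.10],
    [11.0, 10.0, 11.8, 1.00],
])
# Hyperparameters that worked for those runs: [log10(learning_rate), warmup_fraction]
y = np.array([
    [-3.0, 0.01],
    [-3.3, 0.02],
    [-3.7, 0.03],
])

# Fit the recommender on logged runs.
recommender = RandomForestRegressor(n_estimators=100, random_state=0)
recommender.fit(X, y)

# Recommend hyperparameters for a new run from its cheap early statistics.
new_run = np.array([[10.5, 9.5, 11.5, 1.05]])
print(recommender.predict(new_run))  # -> [[log10(lr), warmup_fraction]]
```

With enough logged runs, the same setup could in principle predict full schedules rather than two scalars.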
@SEAWorkshop
SEA Workshop
3 days
The best poster awards go to: 1. Go-Browse: Training Web Agents with Structured Exploration (Apurva Gandhi, Graham Neubig); 2. Scaling Open-Ended Reasoning to Predict the Future (Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping). 🎉Congrats!
2
8
18
@nikhilchandak29
Nikhil Chandak
3 days
Come to the Scaling Environments for Agents (SEA) workshop at NeurIPS today for a preview of our upcoming work, presented by @jonasgeiping, on Open-Ended Forecasting of World Events, where we show how to go beyond prediction markets to scale forecasting training data for LLMs!
@ShashwatGoel7
Shashwat Goel
4 days
🚨Check out an exclusive preview of our soon-to-be-released project on Training LLMs for open-ended forecasting, with RL on synthetic data ;) Tomorrow at the NeurIPS Scaling Environments for Agents (SEA) workshop (12:20-13:20, 15:50-16:50, Upper Level Room 23ABC, @jonasgeiping)
1
2
16
@ShashwatGoel7
Shashwat Goel
3 days
Missing this workshop is the part of not going to NeurIPS that's causing me the most FOMO. If you believe in the second half of AI and the era of experience, the papers being presented here are some of the most important. Kudos to @guohao_li and team for organizing!
@SEAWorkshop
SEA Workshop
4 days
🚀 SEA Workshop is going LIVE TOMORROW! Join us at NeurIPS 2025 for a full day diving into Scaling Environments for Agents, featuring an incredible lineup of speakers and panelists: @egrefen @Mike_A_Merrill @mialon_gregoire @deepaknathani11 @jl_marino @syz0x1 @qhwang3
1
5
40
@ShashwatGoel7
Shashwat Goel
3 days
You can wait for the academic compute crisis to get solved 🥱 or... Just come to Tübingen for: 1) Compute 2) Talent density 3) Aesthetics, in research, workspaces, and the city as a whole. (Job) Markets are not efficient ;) @FrancoisChauba1's plot extended by
3
3
63
@jonasgeiping
Jonas Geiping @ Neurips
4 days
Happening now!
@arvindh__a
Arvindh Arun
9 days
If you’re attending #NeurIPS2025 in San Diego 🇺🇸, check out @jonasgeiping presenting our work at the @mti_neurips workshop on Dec 6th!
1
6
59
@ShashwatGoel7
Shashwat Goel
4 days
🚨Check out an exclusive preview of our soon-to-be-released project on Training LLMs for open-ended forecasting, with RL on synthetic data ;) Tomorrow at the NeurIPS Scaling Environments for Agents (SEA) workshop (12:20-13:20, 15:50-16:50, Upper Level Room 23ABC, @jonasgeiping)
2
10
53
@ShashwatGoel7
Shashwat Goel
4 days
There's a certain beauty to learning RL via RL. I never really did a proper course on RL. But this year, I spent a lot of time thinking about, and applying, RL on LLMs. Some RL folks told me I should just watch an intro course. But I'm glad I didn't! That would have been
3
1
54
@ShashwatGoel7
Shashwat Goel
5 days
Why is "research" over-indexed on novelty? To hammer a nail, how does it matter whether I used a good ol hammer, or one with a pink-panda-riding-a-dragon design? What should matter is hammering important nails, not designing fancier hammers.
4
0
10
@ShashwatGoel7
Shashwat Goel
6 days
Was wondering whether GRPO-style RL is "only a few tokens deep"... Intuitively, we take a next-token predictor and slightly upweight some tokens, s.t. NTP leads to success (minimal sketch below). Found this interesting ICLR submission preliminarily indicating this hypothesis is true: https://t.co/28Gar1xNhJ
2
9
95
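A minimal sketch of that intuition, assuming plain on-policy GRPO-style reweighting with group-normalized rewards and none of the clipping/KL machinery; the function name, tensor shapes, and toy rewards are illustrative assumptions, not taken from the linked ICLR submission.

```python
import torch

def grpo_style_loss(logits, target_ids, completion_mask, rewards):
    """Reweight the ordinary next-token-prediction loss by group-normalized rewards.

    logits:          (G, T, V) model outputs for G sampled completions of one prompt
    target_ids:      (G, T)    target token ids for those completions
    completion_mask: (G, T)    1.0 for completion tokens, 0.0 for prompt/padding
    rewards:         (G,)      scalar reward per completion (needs G > 1)
    """
    # Group-normalized advantage: how much better each sample did than its siblings.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)              # (G,)

    # Standard NTP log-likelihood of each target token.
    logp = torch.log_softmax(logits, dim=-1)                               # (G, T, V)
    token_logp = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)     # (G, T)

    # "Slightly upweight some tokens": scale each completion's NTP objective
    # by its advantage, then average over completion tokens.
    weighted = adv.unsqueeze(-1) * token_logp * completion_mask            # (G, T)
    return -(weighted.sum() / completion_mask.sum())

# Toy usage with random logits standing in for a causal LM's outputs.
G, T, V = 4, 8, 50
loss = grpo_style_loss(
    logits=torch.randn(G, T, V, requires_grad=True),
    target_ids=torch.randint(0, V, (G, T)),
    completion_mask=torch.ones(G, T),
    rewards=torch.tensor([1.0, 0.0, 0.0, 1.0]),
)
loss.backward()
```

The only change relative to ordinary next-token prediction is the per-completion advantage factor, which is why the "only a few tokens deep" question is natural to ask.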
@ShashwatGoel7
Shashwat Goel
8 days
Follow-up thought, like "write for LLMs" (@gwern): What does this imply for building new technology? If there's not enough open-source data for it, and we don't have "documentation-efficient" intelligence soon, is open-source use at scale a huge moat for incumbent tech?
0
0
0
@ShashwatGoel7
Shashwat Goel
8 days
How are LLMs trained to become better at the JAX "way of doing things"*? Presumably the best JAX code is internal to Google. But even Google wouldn't want to train public-facing models on internal code? *I mean, what's not learnable by "translating" PyTorch code, or pure RL...
2
0
4
@hendrycks
Dan Hendrycks
9 days
I've been saying mechanistic interpretability is misguided from the start. Glad people are coming around many years later.
@NeelNanda5
Neel Nanda
9 days
The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact and why we think other interp researchers should follow suit
12
18
374
@ShashwatGoel7
Shashwat Goel
12 days
"Human" performance is not a monolith: always ask which human "LLM" performance is not a monolith: always ask which LLM
@AndyAyrey
Andy Ayrey
13 days
y'all in the 'jagged frontier' debate are missing something:
0
0
6
@ShashwatGoel7
Shashwat Goel
13 days
The best thing about Gemini 3's code is there's no try-except-style slop. The code it produces is concise and correct. Can just hit accept without worrying as much about bloat.
0
0
18
@DulhanJay
Dulhan Jayalath
13 days
Want to understand how to RL fine-tune your LLM without labels? I'll be presenting Compute as Teacher (CaT 🐈) as a spotlight⭐️ poster at the Efficient Reasoning workshop at NeurIPS ✈️ next week If you're around, come and chat about RL, LLMs, and brain decoding. #NeurIPS2025
@DulhanJay
Dulhan Jayalath
3 months
🚨New Meta Superintelligence Labs Paper🚨 What do we do when we don’t have reference answers for RL? What if annotations are too expensive or unknown? Compute as Teacher (CaT🐈) turns inference compute into a post-training supervision signal. CaT improves up to 30% even on
2
13
75
@RulinShao
Rulin Shao
14 days
0
15
108
@ShashwatGoel7
Shashwat Goel
16 days
There's no shortcut to removing shortcuts in Deep Learning.
@vishaal_urao
Vishaal Udandarao
16 days
🚀 New paper! https://t.co/qWGZ4LCAr1 Recently, Cambrian-S released models & two benchmarks (VSR & VSC) for “spatial supersensing” in video! We found: 1️⃣ Simple no-frame baseline (NoSense) ~perfectly solves VSR! 2️⃣ Tiny sanity check collapses Cambrian-S perf to 0% on VSC! 🧵👇
0
2
18
@ShashwatGoel7
Shashwat Goel
16 days
Thanks to everyone who took the time to leave some advice! There's no single answer, but lots of good principles. So I've copied the question and thread link to a Substack post. Hopefully this makes it easier for people to find in the future :) https://t.co/jVvbYeSCNv
@ShashwatGoel7
Shashwat Goel
17 days
How do PhD students / researchers manage the sinking feeling of having a growing bucketlist (25+ rn) of interesting ideas/directions but not enough time to try any of them? How do you select when multiple seem promising?
0
3
20
@ShashwatGoel7
Shashwat Goel
17 days
The most impactful thing I took away from a podcast this year might just be @sama's reco for this playlist: https://t.co/VJzOH7Nk20 Massive quality of life/work upgrade, esp on the worst (or hardest-to-focus) days.
open.spotify.com
Max Richter · album · 2014 · 18 songs
0
0
2