Shashwat Goel Profile
Shashwat Goel

@ShashwatGoel7

Followers
2K
Following
7K
Media
171
Statuses
1K

Scaling supervision for AI on evals that matter. 👨‍🍳Forecasting, Long Horizon, Synth Data for RL @ELLISInst_Tue @MPI_IS Prev: @AIatMeta, MATS, Quant

Tübingen, Germany
Joined June 2020
@ShashwatGoel7
Shashwat Goel
20 days
New blogpost: Why I think automated research is the means, not just the end, for training superintelligent AI systems. In pointing models at scientific discovery, we will have to achieve the capabilities today's LLMs lack: - long-horizon planning - continual adaptation -
3
4
63
@ShashwatGoel7
Shashwat Goel
2 days
An interesting research problem that might be solvable at big labs / @thinkymachines / @wandb: with enough data on training runs, can we make universal recommendations of good hyperparams, using stats of the dataset, loss fn, activations, size, etc.? Would save so much time and compute (rough sketch below).
12
3
162
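Read as a recipe, this is a supervised meta-problem: each past run contributes one example mapping cheap run statistics to hyperparameters that worked, and any regressor can serve as the recommender. A minimal sketch under that reading; the feature set, the toy numbers, and the choice of a random forest are illustrative assumptions, not a claim about how any lab would do it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# One row per past training run; features are made-up placeholders:
# [log10(dataset_tokens), log10(param_count), initial_loss, activation_rms]
X = np.array([
    [9.0,  8.0, 10.5, 0.90],
    [10.0, 9.0, 11.2, 1.10],
    [11.0, 10.0, 11.8, 1.00],
])
# Hyperparameters that worked for those runs: [log10(learning_rate), warmup_fraction]
y = np.array([
    [-3.0, 0.01],
    [-3.3, 0.02],
    [-3.7, 0.03],
])

# Fit the recommender on logged runs.
recommender = RandomForestRegressor(n_estimators=100, random_state=0)
recommender.fit(X, y)

# Recommend hyperparameters for a new run from its cheap early statistics.
new_run = np.array([[10.5, 9.5, 11.5, 1.05]])
print(recommender.predict(new_run))  # -> [[log10(lr), warmup_fraction]]
```

With enough logged runs, the same setup could in principle predict full schedules rather than two scalars.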
@SEAWorkshop
SEA Workshop
3 days
The best poster awards go to: 1. Go-Browse: Training Web Agents with Structured Exploration (Apurva Gandhi, Graham Neubig); 2. Scaling Open-Ended Reasoning to Predict the Future (Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping). 🎉Congrats!
2
8
18
@nikhilchandak29
Nikhil Chandak
3 days
Come to the Scaling Environments for Agents (SEA) workshop at NeurIPS today for a preview of our upcoming work, presented by @jonasgeiping, on Open-Ended Forecasting of World Events, where we show how to go beyond prediction markets to scale forecasting training data for LLMs!
@ShashwatGoel7
Shashwat Goel
4 days
🚨Check out an exclusive preview of our soon-to-be-released project on Training LLMs for open-ended forecasting, with RL on synthetic data ;) Tomorrow at the NeurIPS Scaling Environments for Agents (SEA) workshop (12:20-13:20, 15:50-16:50, Upper Level Room 23ABC, @jonasgeiping)
1
2
16
@ShashwatGoel7
Shashwat Goel
3 days
Missing this workshop is the part of not going to NeurIPS that's causing me the most FOMO. If you believe in the second half of AI and the era of experience, the papers being presented here are some of the most important. Kudos to @guohao_li and team for organizing!
@SEAWorkshop
SEA Workshop
4 days
🚀 SEA Workshop is going LIVE TOMORROW! Join us at NeurIPS 2025 for a full day diving into Scaling Environments for Agents, featuring an incredible lineup of speakers and panelists: @egrefen @Mike_A_Merrill @mialon_gregoire @deepaknathani11 @jl_marino @syz0x1 @qhwang3
1
5
40
@ShashwatGoel7
Shashwat Goel
3 days
You can wait for the academic compute crisis to get solved 🥱 or... Just come to Tübingen for: 1) Compute 2) Talent density 3) Aesthetics, in research, workspaces, and the city as a whole. (Job) Markets are not efficient ;) @FrancoisChauba1's plot extended by
3
3
63
@jonasgeiping
Jonas Geiping @ Neurips
4 days
Happening now!
@arvindh__a
Arvindh Arun
9 days
If you’re attending #NeurIPS2025 in San Diego 🇺🇸, check out @jonasgeiping presenting our work at the @mti_neurips workshop on Dec 6th!
1
6
59
@ShashwatGoel7
Shashwat Goel
4 days
🚨Check out an exclusive preview of our soon-to-be-released project on Training LLMs for open-ended forecasting, with RL on synthetic data ;) Tomorrow at the NeurIPS Scaling Environments for Agents (SEA) workshop (12:20-13:20, 15:50-16:50, Upper Level Room 23ABC, @jonasgeiping)
2
10
53
@ShashwatGoel7
Shashwat Goel
4 days
There's a certain beauty to learning RL via RL. I never really did a proper course on RL. But this year, I spent a lot of time thinking about, and applying, RL on LLMs. Some RL folks told me I should just watch an intro course. But I'm glad I didn't! That would have been
3
1
54
@ShashwatGoel7
Shashwat Goel
5 days
Why is "research" over-indexed on novelty? To hammer a nail, how does it matter whether I used a good ol hammer, or one with a pink-panda-riding-a-dragon design? What should matter is hammering important nails, not designing fancier hammers.
4
0
10
@ShashwatGoel7
Shashwat Goel
6 days
Was wondering whether GRPO-style RL is "only a few tokens deep"... Intuitively, we take a next-token predictor and slightly upweight some tokens, s.t. NTP leads to success (minimal sketch below). Found this interesting ICLR submission preliminarily indicating this hypothesis is true: https://t.co/28Gar1xNhJ
2
9
95
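A minimal sketch of that intuition, assuming plain on-policy GRPO-style reweighting with group-normalized rewards and none of the clipping/KL machinery; the function name, tensor shapes, and toy rewards are illustrative assumptions, not taken from the linked ICLR submission.

```python
import torch

def grpo_style_loss(logits, target_ids, completion_mask, rewards):
    """Reweight the ordinary next-token-prediction loss by group-normalized rewards.

    logits:          (G, T, V) model outputs for G sampled completions of one prompt
    target_ids:      (G, T)    target token ids for those completions
    completion_mask: (G, T)    1.0 for completion tokens, 0.0 for prompt/padding
    rewards:         (G,)      scalar reward per completion (needs G > 1)
    """
    # Group-normalized advantage: how much better each sample did than its siblings.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)              # (G,)

    # Standard NTP log-likelihood of each target token.
    logp = torch.log_softmax(logits, dim=-1)                               # (G, T, V)
    token_logp = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)     # (G, T)

    # "Slightly upweight some tokens": scale each completion's NTP objective
    # by its advantage, then average over completion tokens.
    weighted = adv.unsqueeze(-1) * token_logp * completion_mask            # (G, T)
    return -(weighted.sum() / completion_mask.sum())

# Toy usage with random logits standing in for a causal LM's outputs.
G, T, V = 4, 8, 50
loss = grpo_style_loss(
    logits=torch.randn(G, T, V, requires_grad=True),
    target_ids=torch.randint(0, V, (G, T)),
    completion_mask=torch.ones(G, T),
    rewards=torch.tensor([1.0, 0.0, 0.0, 1.0]),
)
loss.backward()
```

The only change relative to ordinary next-token prediction is the per-completion advantage factor, which is why the "only a few tokens deep" question is natural to ask.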
@ShashwatGoel7
Shashwat Goel
8 days
Follow-up thought, like "write for LLMs" (@gwern): What does this imply for building new technology? If there's not enough open-source data for it, and we don't have "documentation-efficient" intelligence soon, is open-source use at scale a huge moat for incumbent tech?
0
0
0
@ShashwatGoel7
Shashwat Goel
8 days
How are LLMs trained to become better at the JAX "way of doing things"*? Presumably the best JAX code is internal to Google. But even Google wouldn't want to train public-facing models on internal code? *I mean, what's not learnable by "translating" PyTorch code, or pure RL...
2
0
4
@hendrycks
Dan Hendrycks
9 days
I've been saying mechanistic interpretability is misguided from the start. Glad people are coming around many years later.
@NeelNanda5
Neel Nanda
9 days
The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact and why we think other interp researchers should follow suit
12
18
374
@ShashwatGoel7
Shashwat Goel
12 days
"Human" performance is not a monolith: always ask which human "LLM" performance is not a monolith: always ask which LLM
@AndyAyrey
Andy Ayrey
13 days
y'all in the 'jagged frontier' debate are missing something:
0
0
6
@ShashwatGoel7
Shashwat Goel
13 days
The best thing about Gemini 3's code is there's no try-except-style slop. The code it produces is concise and correct. Can just hit accept without worrying as much about bloat.
0
0
18
@DulhanJay
Dulhan Jayalath
13 days
Want to understand how to RL fine-tune your LLM without labels? I'll be presenting Compute as Teacher (CaT 🐈) as a spotlight⭐️ poster at the Efficient Reasoning workshop at NeurIPS ✈️ next week If you're around, come and chat about RL, LLMs, and brain decoding. #NeurIPS2025
@DulhanJay
Dulhan Jayalath
3 months
🚨New Meta Superintelligence Labs Paper🚨 What do we do when we don’t have reference answers for RL? What if annotations are too expensive or unknown? Compute as Teacher (CaT🐈) turns inference compute into a post-training supervision signal. CaT improves up to 30% even on
2
13
75
@RulinShao
Rulin Shao
14 days
0
15
108
@ShashwatGoel7
Shashwat Goel
16 days
There's no shortcut to removing shortcuts in Deep Learning.
@vishaal_urao
Vishaal Udandarao
16 days
🚀 New paper! https://t.co/qWGZ4LCAr1 Recently, Cambrian-S released models & two benchmarks (VSR & VSC) for “spatial supersensing” in video! We found: 1️⃣ Simple no-frame baseline (NoSense) ~perfectly solves VSR! 2️⃣ Tiny sanity check collapses Cambrian-S perf to 0% on VSC! 🧵👇
0
2
18
@ShashwatGoel7
Shashwat Goel
16 days
Thanks to everyone who took the time to leave some advice! There's no single answer, but lots of good principles. So I've copied the question and thread link to a Substack post. Hopefully this makes it easier for people to find in the future :) https://t.co/jVvbYeSCNv
@ShashwatGoel7
Shashwat Goel
17 days
How do PhD students / researchers manage the sinking feeling of having a growing bucketlist (25+ rn) of interesting ideas/directions but not enough time to try any of them? How do you select when multiple seem promising?
0
3
20
@ShashwatGoel7
Shashwat Goel
17 days
The most impactful thing I took away from a podcast this year might just be @sama's reco for this playlist: https://t.co/VJzOH7Nk20 Massive quality of life/work upgrade, esp on the worst (or hardest-to-focus) days.
open.spotify.com
Max Richter · album · 2014 · 18 songs
0
0
2