Creston Brooks
@crestonbrooks
430 Followers · 2K Following · 11 Media · 122 Statuses
building ~ previously @SentientAGI @princeton_nlp ~ all 📠 no 🖨️
Joined November 2024
Are you worried that an LLM you trained could be stolen and misused by mysterious masked men 🥷? Our work (now a #NeurIPS2025 Spotlight 💫) can help you detect such unauthorized use. As a side-quest, we also analyse memorization and forgetting in LLMs 🧵(1/11).
Language models that think, chat better. We used longCoT (w/ reward model) for RLHF instead of math, and it just works. Llama-3.1-8B-Instruct + 14K ex beats GPT-4o (!) on chat & creative writing, & even Claude-3.7-Sonnet (thinking) on AlpacaEval2 and WildBench! Read on. 🧵 1/8
A16Z SPEEDRUN 2026 UPDATE: I think most people secretly know if they're founders or not. Some of you can never be happy working inside a giant company, writing docs, in endless meetings. Deep down, you know you're supposed to build. We're opening up a16z speedrun today! We are
GPT-5 eval’ing on collie, tau^2, swe bench, charxiv… big from @princeton_nlp :)
The line between unhinged and cringe in b2c self promo is extremely thin
Feels like ~1/3 of reviews on icml / colm / neurips etc these days are pure gpt. There must be a decent number of papers that win the reviewer lottery and get published based only on LLM vibe checks :)
can @GeneralTxn translate UI into boomer ty for the help
Maybe even a move-count penalty would discourage alg-heavy methods. Like, to discover new algs the model would have to mess up a bunch and then have to start the solve over, incurring a heavy penalty. Maybe algs would have to be some emergent thing where a model has to have gained
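The move-count penalty idea above could be sketched as simple reward shaping. This is purely illustrative; `LAMBDA` is an assumed hyperparameter, not something from the thread.

```python
# Hedged sketch of a move-count penalty for cube-solving RL:
# full reward for a solved state, minus a small per-move cost so
# long, alg-heavy solutions score lower than efficient ones.
LAMBDA = 0.01  # assumed per-move cost; would need tuning in practice

def shaped_reward(solved: bool, num_moves: int) -> float:
    # Base reward is binary: 1.0 if the cube ends solved, else 0.0.
    base = 1.0 if solved else 0.0
    # Subtract the penalty regardless, so failed long attempts hurt most.
    return base - LAMBDA * num_moves
```

With this shaping, a 20-move solve scores 0.8 while a 60-move solve scores only 0.4, which is one way to push the policy away from memorized long algs.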
Another thing to try would be a sort of curriculum learning approach… first teach cross solving, then F2L, then LL. Obv this is less pure but potentially more feasible / might converge quicker (no need to invent a whole new method). The other cool thing here is the potential to
The issue here is that high-level priors (e.g. CFOP) wouldn’t be useful at all. If anything, the model would prob learn some FMC-type blockbuilding, which would be cool, especially if the reasoning traces are interpretable / can be distilled into a new method.
One thing to try: start with very short scrambles (1-2 moves), describe the state, and ask an LRM to reason through the answer and return a solution. Call some script to apply the solution to the scrambled cube and there's the reward. Then incrementally increase the length of the
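The verifier-style reward in that tweet can be sketched in a few lines. A real setup would call an actual cube simulator to apply the moves; here, as a simplified stand-in, the check cancels adjacent inverse moves (free-group reduction), so it only recognizes solutions that literally undo the scramble move by move, which is exactly the 1-2 move curriculum starting point.

```python
# Sketch of the reward check for short scrambles. Moves are strings in
# standard notation: "R", "R'", "R2", etc. NOT a full cube model; it
# ignores relations like R R R R = identity, so it is only a stand-in.

def inverse(move: str) -> str:
    if move.endswith("2"):
        return move            # half turns are self-inverse: R2 -> R2
    if move.endswith("'"):
        return move[:-1]       # R' -> R
    return move + "'"          # R  -> R'

def is_solved(scramble: list[str], solution: list[str]) -> bool:
    # Concatenate scramble + solution and cancel adjacent inverses.
    # If everything cancels, the solution undid the scramble.
    stack: list[str] = []
    for m in scramble + solution:
        if stack and stack[-1] == inverse(m):
            stack.pop()
        else:
            stack.append(m)
    return not stack

def reward(scramble: list[str], solution: list[str]) -> float:
    return 1.0 if is_solved(scramble, solution) else 0.0
```

For example, `reward(["R", "U"], ["U'", "R'"])` gives 1.0, while reversing the solution order gives 0.0. The curriculum step would then just grow the scramble length whenever the success rate crosses some threshold.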
But even if LL algs could be retrieved, the simple spatial reasoning is still the main hurdle — a couple moves are enough to completely wreck its understanding of where the pieces are. The most intuitive way to me to describe the cube state is describing where each edge / corner
LLMs seem to know a decent amount about different methods (at least high-level) and even some algs. 4o listed off all the PLLs (only messed up F perm) but failed horribly at OLL, COLL, and other sets (spammed a lot of sune and F sexy F' stuff).
Crazy that (afaik) no LLM has solved a 3x3 Rubik’s cube even with fine-tuning (generalize much?). At the same time, action space is small and the reward is simple (solved or not)… seems ripe for LRMs / RL. Some thoughts as a long-retired cuber…
What did Aristotle actually write? We think we know, but reality is messy. As Ancient Greek texts traveled through history, they were copied and recopied countless times, accumulating subtle errors with each generation. Our new #NAACL2025 findings paper tackles this challenge.