snow
@snowclipsed
Followers: 5K
Following: 55K
Media: 687
Statuses: 11K
latent space surfer, cache-miss eliminator. https://t.co/h8rzm8QyZc
United States
Joined June 2017
new blogpost after a long time! in this series i will talk about how to solve reinforcement learning for long-horizon tasks, incrementally from the most straightforward approaches. (link in replies!) in part I of this series, we throw RL at the cube in its most direct,
8
14
88
Sinatras is doing amazing RL work!!
PMPP-Eval Update! Upon release of K2-Thinking, i have evaluated it and a couple of other models that were requested, such as R1 and Qwen3 235B, over the pmpp-eval coding subset. K2-Thinking is now the best open model available and, according to the results, surpasses sonnet 4.5 on cuda tasks.
0
0
8
a banger of an intro to RL teaching llm to solve rubik's cubes
1
1
10
Great read to close your weekend off: rubik's cubes, grpo, creating rl envs. it really is pretty dense on information, check it out
0
2
7
also shoutout to @OccupyingM for doing preliminary research on the same topic on their end, their findings highly correlate with mine :) link to their post :
can your llm rotate a shape inside its head? i found out yes but it's a fucking idiot when it comes to the upper layer... why? non-uniform spatial reasoning.... here's an eval to test the internal latent reasoning of your models.
0
0
3
many, many blogposts to come! i have another 3 queued already :) exciting times.
0
0
2
also thanks to @fujikanaeda @secemp9 @_ueaj @tokenbender @_vatsadev @myainotez @nyxkrage for nourishing an introduction to RL for me :)
0
0
8
now I'd listen to this banger
1
0
12
the one-dimensional flow of a conversation with language models can be severely limiting for many use cases that require deep-end reasoning and have a high error/deviation rate, because you can't have detour conversations easily and context rot exists. i think comfyui-like UI or
1
0
5
if you think about it, a language model is just a really good prefix tree pruner
0
0
2
New weekend blogpost. Some light PTX exploration, and a simple Top-K kernel.
9
47
490
if you want the tweet version and not the 10min video version: this is now all it takes to train with prime-rl after installing verifiers
verifiers v0.1.7 is released 🚀 this one's all about making RL training and experimentation waaaay easier: - single-command installation for prime-rl - single-command training w/ unified configs - overhauled vf.RLTrainer for hacking on new algorithms quick demo + links below :)
6
7
78