Spencer Cheng
@spenccheng
Followers: 2K
Following: 638
Media: 7
Statuses: 212
2x founder | AI + Construction | I build insanely fast simulators for reinforcement learning at https://t.co/JuTqEHQX4O
Dallas, TX
Joined May 2013
Happy Black Friday. Are you using PufferLib for RL professionally? You can now subscribe right here on X for entry-level support!
Want to know how to waste weeks of research? Start testing a new idea on a messy, unstable environment. Better approach: test complex methods on simple envs first. This helps build your intuition for what good performance looks like. Ask me how I know.
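A minimal sketch of the "simple env first" idea, assuming a hypothetical toy two-armed bandit (`TwoArmedBandit`) where the best achievable average reward is known to be exactly 1.0. A plain epsilon-greedy learner (`epsilon_greedy`, also hypothetical, not a PufferLib API) should land near 0.95 here, since it still explores 10% of the time. If a new method can't get close on an env this trivial, the problem is in the method, not the env.

```python
import random


class TwoArmedBandit:
    """Hypothetical toy env: arm 1 always pays 1.0, arm 0 always pays 0.0.
    The optimal average reward is therefore known to be exactly 1.0."""

    n_actions = 2

    def step(self, action: int) -> float:
        return 1.0 if action == 1 else 0.0


def epsilon_greedy(env, steps: int = 1000, epsilon: float = 0.1, seed: int = 0) -> float:
    """Sample-average epsilon-greedy learner.
    Returns the mean reward over the last 100 steps."""
    rng = random.Random(seed)
    counts = [0] * env.n_actions
    values = [0.0] * env.n_actions
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(env.n_actions)  # explore
        else:
            action = max(range(env.n_actions), key=lambda a: values[a])  # exploit
        reward = env.step(action)
        counts[action] += 1
        # incremental sample-average update of the action-value estimate
        values[action] += (reward - values[action]) / counts[action]
        rewards.append(reward)
    tail = rewards[-100:]
    return sum(tail) / len(tail)
```

Because the known optimum is 1.0, any score far below ~0.9 on this env points at a bug in the learner rather than at a hard problem.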
RL is so stable at Puffer that when an environment does not solve, I almost always assume there's an env bug.
ML research is an engineering discipline, not a philosophy seminar. You build, you test, you learn. Untested ideas are just speculation.
C is NOT a hard language. Most people just don’t have the patience to learn pointers properly.
This was true for small-model RL. The most widely used libraries were training standard baselines at 500-5k steps/second. With PufferLib, we're training 500k-5M steps/second and faster every update!
Can someone explain to me why ~500 tok/s is fast and what in-the-weeds technical constraints prevent 100,000 tok/s at same quality? My gut is there’s incredible waste due to infinite money and in a world w/ 1/10000th of the capital models would be orders of magnitude better
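To put the steps/second numbers in wall-clock terms, here is a small sketch that converts a training budget into hours at each quoted throughput. The 100M-step budget is a hypothetical figure chosen for illustration, not a number from the post.

```python
def hours_to_train(total_steps: float, steps_per_second: float) -> float:
    """Wall-clock hours needed to collect total_steps at a constant throughput."""
    return total_steps / steps_per_second / 3600


BUDGET = 100_000_000  # hypothetical 100M-step training budget

# The two ends of each quoted range: older libraries vs. PufferLib.
for sps in (500, 5_000, 500_000, 5_000_000):
    print(f"{sps:>9,} steps/s -> {hours_to_train(BUDGET, sps):8.2f} h")
```

At that budget, 500 steps/s takes about 55.6 hours, while 500k steps/s takes 200 seconds. Both ends of the two ranges differ by the same factor of 1,000.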
we're going live to explore what this puffer has to offer 🐡
It is incredibly satisfying to see an agent crush in a new sim.
PufferLib lets you train agents in seconds on your laptop. Knowledge kept behind closed doors is easily lost. I'm not going to let that happen to RL.
Lonely fish in your area! See all of me at https://t.co/Vo5QDvKMxO. PR your agents to fill the tank. Thanks for 20k followers! This is going to be a good year for RL.
RL really sucks. It takes 10 hours just to learn Breakout. ... a few years ago. It's <30 seconds on 1 GPU now in PufferLib and still dropping. Write faster code.
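For scale, the two Breakout times quoted above work out to a wall-clock speedup of at least 1,200×. This is only a ratio of the two quoted times; it does not separate how much comes from hardware, algorithms, or implementation.

```latex
\frac{10\ \text{h}}{30\ \text{s}}
  = \frac{10 \times 3600\ \text{s}}{30\ \text{s}}
  = \frac{36\,000}{30}
  = 1200
```

Since the current time is under 30 seconds, the true ratio is somewhat above 1,200.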
When training RL policies in competitive domains, it is often quite useful to have the policy fight an easy opponent first, such as one that takes random actions or a scripted bot. Don't try to slay the dragon on day one. Go farm some noobs to make sure your bot can learn.
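One way to "farm noobs first" is a simple opponent curriculum: play the easiest opponent until the rolling win rate clears a threshold, then promote to the next one. This is a minimal sketch under stated assumptions. The class `OpponentCurriculum`, the stage names (`"random"`, `"scripted"`, and a hypothetical `"self_play"` final stage), the 0.8 promotion threshold, and the window size are all illustrative choices, not PufferLib features.

```python
from collections import deque


class OpponentCurriculum:
    """Hypothetical sketch: promote the learner to a harder opponent once its
    rolling win rate against the current opponent clears a threshold."""

    def __init__(self, stages, threshold: float = 0.8, window: int = 100):
        self.stages = list(stages)    # easiest first, e.g. ["random", "scripted", "self_play"]
        self.threshold = threshold    # assumed promotion win rate
        self.window = window          # number of recent games in the rolling average
        self.stage = 0
        self.results = deque(maxlen=window)

    @property
    def opponent(self) -> str:
        return self.stages[self.stage]

    def record(self, won: bool) -> None:
        """Log one game result and promote if the full window clears the threshold."""
        self.results.append(1.0 if won else 0.0)
        full = len(self.results) == self.window
        if full and sum(self.results) / self.window >= self.threshold:
            if self.stage < len(self.stages) - 1:
                self.stage += 1
                # start a fresh window so the next decision only reflects the new opponent
                self.results.clear()
```

Requiring a full window before promoting keeps a lucky streak of a few games from moving the agent up too early.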
Priceless jewels were apparently stolen from the Louvre this morning. I bet there is a Hollywood producer who's flying right now to try to make this a limited series.
This is my exact story with PufferLib. I found it while waiting for a delivery to my job site. I thought it was cool and asked a bunch of questions. I then tried to be as helpful to @jsuarez5341 as I could be. Now I get paid to build RL solutions for Puffer clients.
so @rekram11 is joining our team full time - backstory is interesting he heard me say the thing i always say - find an early open source project with potential, contribute, answer questions, be helpful except unlike 99% of people who hear that he actually went and did it so
For 99% of RL envs, you can just debug your work by having a good renderer. That last 1% you gotta bite the bullet and write tests.
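A minimal sketch of both halves of this advice, assuming a hypothetical 1-D `Corridor` env (not a PufferLib env or API). The ASCII `render()` is the kind of renderer that catches most bugs at a glance; the test pins down the subtle cases a glance can miss, like clamping at the wall and giving the reward exactly once, on the step that reaches the goal.

```python
class Corridor:
    """Hypothetical 1-D env: start at cell 0, reach cell `length - 1`.
    Action 0 moves left, action 1 moves right."""

    def __init__(self, length: int = 5):
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int):
        if action not in (0, 1):
            raise ValueError(f"invalid action {action}")
        delta = 1 if action == 1 else -1
        # clamp to the corridor so the agent can never leave the grid
        self.pos = min(max(self.pos + delta, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

    def render(self) -> str:
        """ASCII renderer: 'A' is the agent, 'G' the goal, '.' an empty cell."""
        cells = ["."] * self.length
        cells[-1] = "G"
        cells[self.pos] = "A"
        return "".join(cells)


def test_corridor():
    env = Corridor(length=5)
    assert env.reset() == 0
    assert env.render() == "A...G"
    # moving left at the wall must not go out of bounds
    assert env.step(0) == (0, 0.0, False)
    # four steps right reach the goal, rewarded exactly once on arrival
    rewards = [env.step(1)[1] for _ in range(4)]
    assert rewards == [0.0, 0.0, 0.0, 1.0]
    assert env.render() == "....A"


test_corridor()
```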