Stephen Roller

@stephenroller

Followers 6K · Following 24K · Media 147 · Statuses 7K

MoTS @thinkymachines. previously pre-training @googledeepmind, @character_ai, and @aiatmeta.

NYC
Joined February 2008
@clarejtbirch
clare ❤️‍🔥
21 days
we are opening a bunch of new roles at @thinkymachines this week. research roles are live, more to come 👇
26
31
832
@soumithchintala
Soumith Chintala
28 days
thinking machines....the people are incredible
154
75
3K
@ramencult
clara rehmann
2 years
she SLURM on my queue til my jobs finish
3
14
126
@thinkymachines
Thinking Machines
2 months
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When applied to math reasoning and to training an internal chat assistant, we find that on-policy distillation can outperform other
63
402
3K
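The post above is truncated, but the idea it names can be sketched. Below is a toy illustration of one common formulation of on-policy distillation: a per-token reverse KL between student and teacher, computed on a rollout the student itself sampled. The loss shape, toy logits, and function names are illustrative assumptions, not the actual implementation from the post.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def on_policy_distill_loss(student_logits, teacher_logits):
    """Per-token reverse KL between student and teacher, evaluated along
    a sequence the *student* sampled (the on-policy, RL-like part).
    Every token position gets a dense teacher signal (the SFT-like
    reward density)."""
    p_s = softmax(student_logits)   # [T, V] student next-token distributions
    p_t = softmax(teacher_logits)   # [T, V] teacher next-token distributions
    kl = (p_s * (np.log(p_s + 1e-12) - np.log(p_t + 1e-12))).sum(axis=-1)
    return kl.mean()

rng = np.random.default_rng(0)
T, V = 5, 10                        # toy sequence length and vocab size
student = rng.normal(size=(T, V))   # logits along a student-sampled rollout
teacher = rng.normal(size=(T, V))   # hypothetical teacher logits, same rollout
loss_self = on_policy_distill_loss(student, student)  # identical models -> 0
loss_diff = on_policy_distill_loss(student, teacher)  # strictly positive otherwise
print(loss_self, loss_diff)
```

Minimizing this loss pulls the student toward the teacher exactly where the student's own sampling takes it, which is the error-correcting property the tweet highlights.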
@bobmcgrewai
Bob McGrew
2 months
After spending billions of dollars of compute, GPT-5 learned that the most effective use of its token budget is to give itself a little pep talk every time it figures something out. Maybe you should do the same.
@ATabarrok
Alex Tabarrok
2 months
What?
44
103
3K
@lilianweng
Lilian Weng
3 months
GPUs are expensive and setting up the infrastructure to make GPUs work for you properly is complex, making experimentation on cutting-edge models challenging for researchers and ML practitioners. Providing high quality research tooling is one of the most effective ways to
43
129
2K
@thinkymachines
Thinking Machines
3 months
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
234
808
6K
@thinkymachines
Thinking Machines
3 months
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that its performance often matches full fine-tuning closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
83
568
4K
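As context for the comparison above, here is a minimal sketch of what a LoRA layer computes: the frozen base weight plus a trainable low-rank update. The shapes, init scales, and `alpha` scaling below are typical conventions, not details taken from the post.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """y = x @ (W + (alpha/r) * A @ B).

    W is the frozen base weight; only the low-rank factors A [d_in, r]
    and B [r, d_out] are trained, so the trainable parameter count is
    r * (d_in + d_out) instead of d_in * d_out."""
    r = A.shape[1]
    return x @ (W + (alpha / r) * (A @ B))

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.normal(size=(d_in, d_out))       # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01    # small random init is common for A
B = np.zeros((r, d_out))                 # zero init for B: update starts at 0
x = rng.normal(size=(1, d_in))
# With B = 0, the adapted layer matches the base layer exactly.
print(np.allclose(lora_forward(x, W, A, B), x @ W))  # True
```

The zero-init on `B` means fine-tuning starts from exactly the pretrained behavior and only drifts as the low-rank update is learned.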
@thinkymachines
Thinking Machines
3 months
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
118
460
3K
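The post above is about co-designing optimizers with manifold constraints on weight matrices. One simple, standard instance of such a constraint is keeping a weight matrix on the Stiefel manifold (orthonormal columns) by retracting after each optimizer step. The SVD-based retraction below is an illustrative example of the general idea, not the method from the post.

```python
import numpy as np

def project_to_stiefel(W):
    """Retract W onto the Stiefel manifold (orthonormal columns) by
    taking its polar factor, computed from the thin SVD: if W = U S V^T,
    the nearest matrix with orthonormal columns is U V^T."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
W = project_to_stiefel(rng.normal(size=(8, 4)))   # start on the manifold
grad = rng.normal(size=(8, 4))                    # some gradient
W = project_to_stiefel(W - 0.1 * grad)            # step, then retract
# Columns stay orthonormal: W^T W = I up to floating point.
print(np.allclose(W.T @ W, np.eye(4)))  # True
```

Constraints like this bound the spectral norm of the weights, which is one route to the training stability the post is after.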
@cimmerian_v
Cimmerian Pervert
3 months
How much tylenol to make this happen
47
267
4K
@thinkymachines
Thinking Machines
3 months
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
235
1K
8K
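The nondeterminism that post's title refers to is rooted in a basic fact worth seeing concretely: floating-point addition is not associative, so the same reduction evaluated in a different order can round to a different result. Batched inference kernels can change reduction order (for example, as batch size varies), which is one way run-to-run nondeterminism creeps into LLM outputs.

```python
# Floating-point addition is not associative: summing the same terms in
# a different order can round differently.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c     # 0.6000000000000001
right = a + (b + c)    # 0.6
print(left == right)   # False
```

Scaled up to the million-term reductions inside a matmul or attention kernel, any change in how a GPU kernel partitions the sum can shift logits by a few ULPs, which is enough to flip a sampled token.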
@JeanRemiKing
Jean-Rémi King
3 months
🗣️Job alert: Our Brain and AI team at FAIR (@AIatMeta) is looking for a software engineer with experience in 3D rendering in the browser: https://t.co/UneZ0WFxIX Please RT 🙏
4
22
142
@stephenroller
Stephen Roller
3 years
The undocumented XID errors just taste better. More fresh.
1
1
16
@stephenroller
Stephen Roller
4 months
There’s lots wrong with the OPT models and I don’t recommend using them today. It’s just that the widely cited explanation for their quantization behavior doesn’t actually seem explanatory.
0
0
4
@stephenroller
Stephen Roller
4 months
There’s a line of critique/reviewer feedback in the quantization literature that the OPT models are too easy to quantize because they’re undertrained; but all scales were trained for the same 300B tokens, making the 6.7B and smaller models overtrained by Chinchilla estimates.
2
0
8
@vividvoid
Vivid Void
4 months
gm, it's Friday, go outside
42
352
7K
@stephenroller
Stephen Roller
5 months
We are moving incredibly fast. Come light up GPUs with us.
@miramurati
Mira Murati
5 months
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're
12
12
344
@gaulicsmith
Sulla
7 months
Millennials use “lol” like STOP at the end of a telegram lol
583
5K
66K
@code_star
Cody Blakeney
5 months
What I send to people to get them to join @datologyai
1
5
26