
Stephen Roller
@stephenroller
Followers 5K · Following 23K · Media 147 · Statuses 7K
MoTS @thinkymachines. Previously pre-training @googledeepmind, @character_ai, and @aiatmeta.
NYC
Joined February 2008
After spending billions of dollars of compute, GPT-5 learned that the most effective use of its token budget is to give itself a little pep talk every time it figures something out. Maybe you should do the same.
46 replies · 104 reposts · 3K likes
GPUs are expensive, and setting up the infrastructure to make them work for you properly is complex, making experimentation on cutting-edge models challenging for researchers and ML practitioners. Providing high-quality research tooling is one of the most effective ways to …
40 replies · 127 reposts · 2K likes
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
222 replies · 767 reposts · 6K likes
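The tweet describes the core pattern behind Tinker: the training loop is ordinary Python running on your laptop, while the heavy compute runs on managed distributed GPUs. Below is a minimal sketch of that pattern using a mock client; the class and method names are made-up placeholders, not the actual Tinker API.

```python
# Illustrative sketch only: a mock client standing in for a remote fine-tuning
# service. None of these names come from the actual Tinker API.
from dataclasses import dataclass
import random


@dataclass
class MockFineTuningClient:
    model: str

    def forward_backward(self, batch):
        # A real service would run the forward/backward pass on distributed
        # GPUs and return the loss; here we just fake a number.
        return random.random()

    def optim_step(self, lr: float):
        # A real service would apply the optimizer update server-side.
        pass


client = MockFineTuningClient(model="some-open-model")  # hypothetical model name
batches = [["example text"] * 8 for _ in range(3)]      # toy "batches"

for step, batch in enumerate(batches):
    loss = client.forward_backward(batch)  # heavy compute would happen remotely
    client.optim_step(lr=1e-4)             # optimizer state lives with the service
    print(f"step {step}: loss {loss:.4f}")
```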
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
81 replies · 555 reposts · 3K likes
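For context on what LoRA changes relative to full fine-tuning: the pretrained weights are frozen and a low-rank additive update is learned, so only a small fraction of parameters are trainable. A minimal PyTorch sketch of the idea follows; it is not the experimental setup or code from the post.

```python
# Minimal sketch of the LoRA idea: freeze the pretrained weight W and learn a
# low-rank update scaled by alpha/r. Not the configuration used in the post.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)           # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))      # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # base output plus the low-rank correction x A^T B^T
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


layer = LoRALinear(512, 512, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable} vs. full fine-tuning: {512 * 512}")
```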
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
118 replies · 463 reposts · 3K likes
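A toy illustration of the general idea of co-designing the optimizer with a constraint on the weight matrices: after each gradient step, project the weights back onto a constraint set (here, unit-norm rows). This only conveys the flavor of optimizing on a manifold; it is not the Modular Manifolds construction from the post.

```python
# Toy "constrained optimization" loop: plain gradient steps followed by a
# projection of each row of W back onto the unit sphere. Assumed example,
# not the method described in the Modular Manifolds post.
import torch

W = torch.randn(64, 128, requires_grad=True)
x = torch.randn(256, 128)
target = torch.randn(256, 64)
lr = 0.1

for step in range(100):
    loss = ((x @ W.T - target) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad                                   # unconstrained gradient step
        W /= W.norm(dim=1, keepdim=True).clamp_min(1e-8)   # project rows back onto the sphere
        W.grad = None
    if step % 25 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```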
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference.” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to …
240 replies · 1K reposts · 8K likes
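One piece of intuition behind nondeterministic LLM inference is that floating-point reduction order matters: addition is not associative, so two kernels (or batch sizes) that reduce in different orders can produce bitwise-different results. A small NumPy illustration of that general effect, not code from the post:

```python
# Floating-point addition is not associative: summing the same float32 values
# in two different orders usually does not give bit-identical results.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

s_flat = x.sum()                                     # one reduction order
s_chunked = x.reshape(1000, 1000).sum(axis=1).sum()  # a different reduction order

print(s_flat, s_chunked, bool(s_flat == s_chunked))  # typically differs in the low bits
```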
🗣️Job alert: Our Brain and AI team at FAIR (@AIatMeta) is looking for a software engineer with experience in 3D rendering in the browser: https://t.co/UneZ0WFxIX Please RT 🙏
4 replies · 22 reposts · 143 likes
The undocumented XID errors just taste better. More fresh.
1 reply · 1 repost · 16 likes
There’s lots wrong with the OPT models and I don’t recommend using them today. It’s just that the widely repeated explanation for their quantization behavior doesn’t actually seem explanatory.
0 replies · 0 reposts · 4 likes
There’s a line of critique/reviewer feedback in the quantization literature that the OPT models are too easy to quantize because they’re undertrained; but all scales were trained on the same 300B tokens, making the 6.7B and smaller models overtrained by Chinchilla estimates.
2 replies · 0 reposts · 8 likes
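The arithmetic behind “overtrained by Chinchilla estimates,” using the rough rule of thumb of about 20 training tokens per parameter from the Chinchilla paper (the exact threshold depends on which fit you use):

```python
# Rough back-of-the-envelope check: at ~20 tokens per parameter, a 6.7B model
# is compute-optimal at ~134B tokens, well under OPT's 300B-token budget.
CHINCHILLA_TOKENS_PER_PARAM = 20   # common rule of thumb, not an exact constant
opt_training_tokens = 300e9        # every OPT scale was trained on ~300B tokens

params = 6.7e9
chinchilla_optimal = CHINCHILLA_TOKENS_PER_PARAM * params

print(f"Chinchilla-optimal for 6.7B params ≈ {chinchilla_optimal / 1e9:.0f}B tokens")
print(f"OPT-6.7B saw {opt_training_tokens / 1e9:.0f}B tokens, "
      f"about {opt_training_tokens / chinchilla_optimal:.1f}x that budget")
```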
We are moving incredibly fast. Come light up GPUs with us.
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're …
12 replies · 12 reposts · 345 likes
Based on current administration policies, China will have an influx of returning talent and an accelerated advantage in research investment. You need to be both sinophobic and irrational to expect the US to continue as the global scientific powerhouse with these policies.
0 replies · 1 repost · 19 likes
Revoking visas of Chinese students studying in critical fields like AI and Robotics is incredibly short-sighted and harmful to America’s long term prosperity. We want the best from every country to work for team America
The U.S. will begin revoking visas of Chinese students, including those with connections to the Chinese Communist Party or studying in critical fields.
19 replies · 24 reposts · 400 likes
The war on science in the US is already affecting private sector research like AlphaFold. Bears repeating but the private sector builds on top of things created by academic research for the public good. This hurts everyone.
13 replies · 105 reposts · 511 likes
American funding for hard sciences has fallen by two-thirds this year. In physics, researchers are receiving 15% of what they did last year. What the fuck are we doing?
376 replies · 474 reposts · 6K likes