Kevin Lu
@_kevinlu
Followers
10K
Following
1K
Media
14
Statuses
72
Researcher @thinkymachines. Formerly: - @openai: reinforcement learning, synthetic data - @berkeley_ai: decision transformer, universal computation
SF 🏳️‍🌈
Joined October 2020
Come check out o1-mini: SoTA math reasoning in a small package https://t.co/iftuVLkkZ6 with @ren_hongyu @shengjia_zhao @Eric_Wallace_ & the rest of the OpenAI team
17
30
274
Science is best shared! Tell us about what you've built or discovered with Tinker, so we can tell the world about it on our blog. More details at
thinkingmachines.ai
Announcing Tinker Community Projects
31
39
344
We also replicate the "Distillation for personalization" results from @_kevinlu and @thinkymachines by improving the code performance of a model with SFT and then recovering its IFEval scores with distillation.
1
3
9
thanks to multi-tenancy and the incredible engineering effort of the team, tinker is now both a joy to use and super cheap! hope to see you try it out
Starting Monday, November 3rd, Tinker is switching to a pricing plan that reflects compute usage. This will ensure we have sufficient capacity to clear our waitlist by the end of the year, allowing anyone to sign up and start Tinkering. https://t.co/RGEEBj4VVo
2
1
84
On-policy + Reverse KLD = MiniLLM ( https://t.co/MSlVNWGclo). Really nice blog by @thinkymachines. Exciting to see it being offered as a service!
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
1
24
161
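The reverse-KL objective behind on-policy distillation, mentioned in the MiniLLM comparison above, can be sketched in a few lines. This is a minimal illustration, not the Tinker or MiniLLM implementation: it computes KL(student || teacher) at a single token position from raw logits, the quantity that on-policy distillation minimizes on sequences sampled from the student itself.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one token position.

    Because the expectation is taken under the student's own
    distribution, the student gets corrected exactly where its own
    sampled behavior diverges from the teacher: the "error-correcting
    relevance of RL" combined with the dense per-token signal of SFT.
    """
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(s) * (s - t) for s, t in zip(ls, lt))
```

By Gibbs' inequality the value is zero when the two distributions match and strictly positive otherwise, so driving it down pulls the student onto the teacher's distribution along the student's own trajectories.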
We just added 4 new models to Tinker from the gpt-oss and DeepSeek-V3.1 families. Sign up for the waitlist: https://t.co/CAsOcUduwR
20
37
549
@agarwl_ @Alibaba_Qwen @IdanShenfeld @jyo_pari @__howardchen ...it also happens to still work effectively using only a single prompt, and can be 10-100x cheaper compared to running SFT or RL.
0
1
23
@agarwl_ @Alibaba_Qwen this "continual learning" problem was previously identified by @IdanShenfeld @jyo_pari @__howardchen, who have shown that on-policy methods regress significantly less than SFT when performing domain adaptation https://t.co/1jwQ7lBZuW
For agents to improve over time, they can't afford to forget what they've already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why?
2
1
32
in our new post, we walk through great prior work from @agarwl_ & the @Alibaba_Qwen team exploring on-policy distillation using an open source recipe: you can run our experiments on Tinker today! https://t.co/7pVk87qTDH i'm especially excited by the use of on-policy
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
13
25
326
Personally, I love this plot because it so crisply shows the value of active (RL) vs passive (SFT) experience for embodied agents: just 1% of active (RL) interaction gives you the jump from orange to blue, which you can't approach by just pouring in more passive SFT data (orange).
Super excited to finally share our work on "Self-Improving Embodied Foundation Models"!! (Also accepted at NeurIPS 2025) • Online on-robot Self-Improvement • Self-predicted rewards and success detection • Orders of magnitude sample-efficiency gains compared to SFT alone
1
2
18
Tinker provides an abstraction layer that is the right one for post-training R&D -- it's the infrastructure I've always wanted. I'm excited to see what people build with it. "Civilization advances by extending the number of important operations which we can perform without thinking of them."
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
49
114
1K
anyone who's tried running RL on top of language models knows how painful it is -- building on top of new research, tinker makes finetuning frontier LLMs easy and performant! it's the latest in a long-standing dream to use finetuning to democratize training and personalization.
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
5
13
243
I used to be really excited about the properties of LoRAs for compositionality and personalization back in the stable diffusion days ( https://t.co/sP5qqVR9tC) -- turns out they are still promising! come check out @johnschulman2 's modern analysis on LoRAs for modern LLM
kevinlu.ai
There is a growing trend to think of large language models (LLMs) as operating systems (OS). They have the ability to read and write to short-term memory in ...
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
0
10
183
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
118
461
3K
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is "Defeating Nondeterminism in LLM Inference." We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
237
1K
8K
I recently joined @thinkymachines -- super excited to work with the team, I think we have the highest density of research talent in the world. we have a very ambitious roadmap ahead, the right team to work on it, & I think now is a great time to join; you should reach out to
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're
57
27
1K
GPT-5 is what you've been waiting for: it defines and extends the cost-intelligence frontier across model sizes today. it's been a long journey, and we've landed pivotal improvements across many axes in the whole GPT-5 family. and hey, no more model picker (by default)!
3
11
86
this was a first-class effort worked on by amazing researchers, and the results speak for themselves. I'm proud of OpenAI for this release: open-weights models are huge for Broadly Distributing the Benefits of AI research. glad this model made it out alive
Our open models are here. Both of them. https://t.co/9tFxefOXcg
2
5
95
come check out the team's latest models, with substantial contributions from @SuvanshSanjeev & @minyoung_huh. we are a stone's throw away from gpt5-level performance running locally on your phone
We released two open-weight reasoning models, gpt-oss-120b and gpt-oss-20b, under an Apache 2.0 license. Developed with open-source community feedback, these models deliver meaningful advancements in both reasoning capabilities & safety. https://t.co/PdKHqDqCPf
2
2
68
Ads aren't inherently good or evil. They are fundamentally a market for attention. Attention is humanity's ultimate scarce resource. Our goal should be to allocate it as efficiently as possible.
2
2
51