Kevin Lu

@_kevinlu

Followers: 10K · Following: 1K · Media: 14 · Statuses: 72

Researcher @thinkymachines. Formerly:
- @openai: reinforcement learning, synthetic data
- @berkeley_ai: decision transformer, universal computation

SF 🏳️‍🌈
Joined October 2020
@_kevinlu
Kevin Lu
1 year
Come check out o1-mini: SoTA math reasoning in a small package https://t.co/iftuVLkkZ6 with @ren_hongyu @shengjia_zhao @Eric_Wallace_ & the rest of the OpenAI team
17
30
274
@thinkymachines
Thinking Machines
5 days
Science is best shared! Tell us about what you’ve built or discovered with Tinker, so we can tell the world about it on our blog. More details at
thinkingmachines.ai
Announcing Tinker Community Projects
31
39
344
@cmpatino_
Carlos Miguel PatiƱo
14 days
We also replicate the "Distillation for personalization" results from @_kevinlu and @thinkymachines by improving the code performance of a model with SFT and then recovering its IFEval scores with distillation.
1
3
9
@_kevinlu
Kevin Lu
14 days
thanks to multi-tenancy and the incredible engineering effort of the team, tinker is now both a joy to use and super cheap! hope to see you try it out 🙂
@thinkymachines
Thinking Machines
14 days
Starting Monday, November 3rd, Tinker is switching to a pricing plan that reflects compute usage. This will ensure we have sufficient capacity to clear our waitlist by the end of the year, allowing anyone to sign up and start Tinkering. https://t.co/RGEEBj4VVo
2
1
84
@_kevinlu
Kevin Lu
14 days
excited to see what academics build using tinker!
@thinkymachines
Thinking Machines
14 days
Today we’re announcing research and teaching grants for Tinker: credits for scholars and students to fine-tune and experiment with open-weight LLMs. Read more and apply at:
2
0
67
@donglixp
Li Dong
15 days
On-policy + Reverse KLD = MiniLLM ( https://t.co/MSlVNWGclo). Really nice blog by @thinkymachines. Exciting to see it being offered as a service!
@thinkymachines
Thinking Machines
16 days
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
1
24
161
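To make the two posts above concrete: on-policy distillation samples rollouts from the student and scores every sampled position with the teacher, minimizing the per-token reverse KL(student || teacher). Below is a minimal sketch of that objective, assuming generic HuggingFace-style causal LMs; the function and variable names are illustrative and this is not the Tinker recipe itself.

```python
import torch
import torch.nn.functional as F

def on_policy_distillation_loss(student, teacher, tokenizer, prompt, max_new_tokens=256):
    """Sketch: sample from the student, score with the teacher, and minimize
    the per-token reverse KL(student || teacher). Illustrative only."""
    # 1. The student generates its own rollout (this is what makes it on-policy).
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        rollout = student.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)

    # 2. Next-token logits from both models at every position of the rollout.
    student_logits = student(rollout).logits[:, :-1, :]
    with torch.no_grad():
        teacher_logits = teacher(rollout).logits[:, :-1, :]

    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)

    # 3. Per-position reverse KL over the vocabulary:
    #    KL(pi_student || pi_teacher) = sum_v p_s(v) * (log p_s(v) - log p_t(v)).
    per_token_kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1)

    # Average over generated positions only (skip the prompt).
    prompt_len = inputs["input_ids"].shape[1]
    return per_token_kl[:, prompt_len - 1 :].mean()
```

A single-sample estimate of the same quantity is just log p_student(x_t) - log p_teacher(x_t) on each sampled token, which is why "on-policy + reverse KLD" is a fair one-line summary of the approach.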
@thinkymachines
Thinking Machines
15 days
We just added 4 new models to Tinker from the gpt-oss and DeepSeek-V3.1 families. Sign up for the waitlist: https://t.co/CAsOcUduwR
20
37
549
@_kevinlu
Kevin Lu
16 days
@agarwl_ @Alibaba_Qwen @IdanShenfeld @jyo_pari @__howardchen ...it also happens to still work effectively using only a single prompt, and can be 10-100x cheaper compared to running SFT or RL.
0
1
23
@_kevinlu
Kevin Lu
16 days
@agarwl_ @Alibaba_Qwen this "continual learning" problem was previously identified by @IdanShenfeld @jyo_pari @__howardchen, who have shown that on-policy methods regress significantly less than SFT when performing domain adaptation https://t.co/1jwQ7lBZuW
@jyo_pari
Jyo Pari
2 months
For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇
2
1
32
@_kevinlu
Kevin Lu
16 days
in our new post, we walk through great prior work from @agarwl_ & the @Alibaba_Qwen team exploring on-policy distillation using an open source recipe: you can run our experiments on Tinker today! https://t.co/7pVk87qTDH i'm especially excited by the use of on-policy…
@thinkymachines
Thinking Machines
16 days
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
13
25
326
@IMordatch
Igor Mordatch
1 month
Personally, I love this plot because it so crisply shows the value of active (RL) vs passive (SFT) experience for embodied agents: just 1% of active (RL) interaction gives you a jump from orange to blue, which you can't approach by just pouring in more passive SFT data (orange).
@coolboi95
Kamyar Ghasemipour
1 month
Super excited to finally share our work on “Self-Improving Embodied Foundation Models”!! (Also accepted at NeurIPS 2025) • Online on-robot Self-Improvement • Self-predicted rewards and success detection • Orders of magnitude sample-efficiency gains compared to SFT alone • …
1
2
18
@johnschulman2
John Schulman
1 month
Tinker provides an abstraction layer that is the right one for post-training R&D -- it's the infrastructure I've always wanted. I'm excited to see what people build with it. "Civilization advances by extending the number of important operations which we can perform without…
@thinkymachines
Thinking Machines
1 month
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
49
114
1K
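The "write training loops in Python on your laptop; we'll run them on distributed GPUs" abstraction described above roughly has the shape sketched below. The RemoteTrainer class and its methods are invented placeholders meant only to illustrate that shape; they are not Tinker's actual API.

```python
# Hypothetical sketch of the "training loop runs locally, heavy ops run on
# remote GPUs" abstraction. RemoteTrainer and its methods are invented
# placeholders for illustration; they are NOT Tinker's real API.
import random


class RemoteTrainer:
    """Stand-in for a client whose forward_backward / optim_step calls would be
    executed by a distributed GPU service rather than on the local machine."""

    def __init__(self, base_model: str, lora_rank: int = 32):
        self.base_model = base_model
        self.lora_rank = lora_rank
        self._step = 0

    def forward_backward(self, batch: list[dict]) -> float:
        # In the real setting this would run a forward/backward pass remotely and
        # accumulate gradients; here we return a fake decaying loss so the sketch
        # executes end to end.
        self._step += 1
        return 2.0 / (1.0 + 0.01 * self._step) + random.uniform(0.0, 0.05)

    def optim_step(self, learning_rate: float) -> None:
        # The remote service would apply the accumulated gradient update here.
        pass


def train(trainer: RemoteTrainer, dataset: list[list[dict]], epochs: int = 1) -> None:
    # The loop itself is plain local Python; only the heavy operations are remote.
    for _ in range(epochs):
        for batch in dataset:
            loss = trainer.forward_backward(batch)
            trainer.optim_step(learning_rate=1e-4)
            print(f"loss={loss:.4f}")


if __name__ == "__main__":
    toy_data = [[{"prompt": "2+2=", "completion": "4"}]] * 10
    train(RemoteTrainer(base_model="some-open-weight-llm"), toy_data)
```

The point of such an abstraction is that the loop, data handling, and logging stay ordinary local Python, while the forward/backward pass, optimizer step, and sampling are the only operations that touch remote GPUs.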
@_kevinlu
Kevin Lu
1 month
anyone who's tried running RL on top of language models knows how painful it is -- building on top of new research, tinker makes finetuning frontier LLMs easy and performant! it's the latest in a long-standing dream to use finetuning to democratize training and personalization.
@thinkymachines
Thinking Machines
1 month
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
5
13
243
@_kevinlu
Kevin Lu
1 month
I used to be really excited about the properties of LoRAs for compositionality and personalization back in the stable diffusion days ( https://t.co/sP5qqVR9tC) -- turns out they are still promising! come check out @johnschulman2's modern analysis on LoRAs for modern LLM…
kevinlu.ai
There is a growing trend to think of large language models (LLMs) as operating systems (OS). They have the ability to read and write to short-term memory in ...
@thinkymachines
Thinking Machines
1 month
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
0
10
183
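For context on the comparison above: LoRA freezes the base weight matrix W and learns a low-rank update, so an adapted linear layer computes W x + (alpha/r) * B A x, with B initialized to zero. A minimal sketch in plain PyTorch (illustrative, not the configuration used in the post):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter around a frozen linear layer:
    y = W x + (alpha / r) * B(A(x)). A sketch for illustration only."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # full weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Linear(base.in_features, rank, bias=False)
        self.B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.A.weight, std=0.01)    # small random init
        nn.init.zeros_(self.B.weight)               # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.B(self.A(x))

# Usage: wrap an existing projection and train only the adapter parameters.
layer = LoRALinear(nn.Linear(4096, 4096), rank=16)
trainable = [p for p in layer.parameters() if p.requires_grad]   # just A and B
```

Because B starts at zero, the wrapped layer initially matches the frozen base exactly, and only the small A and B matrices receive gradients.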
@thinkymachines
Thinking Machines
2 months
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
118
461
3K
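As a generic illustration of what "manifold constraints on weight matrices" can mean (this is not the specific construction in the Modular Manifolds post), one can retract a weight matrix back onto a constraint set, here matrices with orthonormal columns, after every update:

```python
import torch

def orthonormal_retraction(W: torch.Tensor) -> torch.Tensor:
    """Retract W onto the set of matrices with orthonormal columns
    (the Stiefel manifold) via its polar factor: W = U S V^T -> U V^T."""
    U, _, Vh = torch.linalg.svd(W, full_matrices=False)
    return U @ Vh

@torch.no_grad()
def constrained_step(W: torch.Tensor, grad: torch.Tensor, lr: float = 1e-2) -> torch.Tensor:
    """One toy optimizer step that keeps W on the manifold: take a plain
    gradient step, then retract. Illustrative only."""
    return orthonormal_retraction(W - lr * grad)

# Tiny usage example on a random weight matrix.
W = orthonormal_retraction(torch.randn(64, 32))
g = torch.randn_like(W)                                    # pretend gradient
W = constrained_step(W, g)
print(torch.allclose(W.T @ W, torch.eye(32), atol=1e-5))   # columns stay orthonormal
```

The post's angle, as described in the tweet, is co-designing the optimizer update itself with such constraints rather than bolting a projection onto a generic step the way this toy sketch does.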
@thinkymachines
Thinking Machines
2 months
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”. We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…
237
1K
8K
@_kevinlu
Kevin Lu
3 months
I recently joined @thinkymachines -- super excited to work with the team, I think we have the highest density of research talent in the world 🙂 we have a very ambitious roadmap ahead, the right team to work on it, & I think now is a great time to join; you should reach out to…
@miramurati
Mira Murati
4 months
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
57
27
1K
@SuvanshSanjeev
Suvansh Sanjeev
3 months
GPT-5 is what you’ve been waiting for – it defines and extends the cost-intelligence frontier across model sizes today. it’s been a long journey, and we’ve landed pivotal improvements across many axes in the whole GPT-5 family. and hey no more model picker (by default)!
3
11
86
@SuvanshSanjeev
Suvansh Sanjeev
3 months
this was a first-class effort worked on by amazing researchers, and the results speak for themselves. I'm proud of OpenAI for this release – open-weights models are huge for Broadly Distributing the Benefits of AI research. glad this model made it out alive 🙃
@OpenAI
OpenAI
3 months
Our open models are here. Both of them. https://t.co/9tFxefOXcg
2
5
95
@_kevinlu
Kevin Lu
3 months
come check out the team’s latest models, with substantial contributions from @SuvanshSanjeev & @minyoung_huh 🙂 we are a stone’s throw away from gpt5-level performance running locally on your phone
@OpenAI
OpenAI
3 months
We released two open-weight reasoning models—gpt-oss-120b and gpt-oss-20b—under an Apache 2.0 license. Developed with open-source community feedback, these models deliver meaningful advancements in both reasoning capabilities & safety. https://t.co/PdKHqDqCPf
2
2
68
@stevenydc
Steven Yin
4 months
Ads aren't inherently good or evil. They are fundamentally a market for attention. Attention is humanity's ultimate scarce resource. Our goal should be to allocate it as efficiently as possible.
2
2
51