Daniel Han
@danielhanchen
Followers: 29K · Following: 7K · Media: 350 · Statuses: 3K
Building @UnslothAI. Faster RL / training. LLMs bug hunter. OSS package https://t.co/aRyAAgKOR7. YC S24. Prev ML at NVIDIA. Hyperlearn used by NASA.
San Francisco
Joined April 2016
We managed to fit Llama 3.1 8B in under 15GB with GRPO! Experience the R1 "aha moment" for free on Colab! Phi-4 14B also works with @UnslothAI, & vLLM is now integrated, allowing 20x faster inference! LoRA with GRPO also just works! 1. We removed double memory usage during vLLM serving
unsloth.ai
You can now reproduce your own DeepSeek-R1 reasoning model with Unsloth 100% locally. Using GRPO. Open-source, free and beginner friendly.
You can now reproduce DeepSeek-R1's reasoning on your own local device! Experience the "Aha" moment with just 7GB VRAM. Unsloth reduces GRPO training memory use by 80%. 15GB VRAM can transform Llama-3.1 (8B) & Phi-4 (14B) into reasoning models. Blog: https://t.co/pjvgXOeHZQ
59
290
2K
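A minimal sketch of the GRPO + vLLM setup described above, pieced together from Unsloth's public notebooks; the model name, LoRA rank, toy reward, and tiny dataset are illustrative assumptions, not the exact notebook code:

```python
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

# Load a 4-bit base model; fast_inference=True routes generation
# through the integrated vLLM backend mentioned above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
    fast_inference=True,
    max_lora_rank=32,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=32,
)

def short_is_better(completions, **kwargs):
    # Toy reward: prefer shorter completions; swap in a real verifier.
    return [-len(c) / 100.0 for c in completions]

dataset = Dataset.from_list([{"prompt": "Solve: 12*7 = ?"}] * 16)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[short_is_better],
    args=GRPOConfig(num_generations=4, max_steps=50, output_dir="outputs"),
    train_dataset=dataset,
)
trainer.train()
```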
You can now run Qwen3-VL locally with Unsloth AI. 👇Fine-tune & RL via free notebooks.
You can now run Qwen3-VL locally! 💜 Run the 235B variant for SOTA vision/OCR on 128GB unified memory (dynamic 4-bit). Includes our chat template fixes. Qwen3-VL-2B runs at ~40 t/s on 4GB RAM. Fine-tune & RL via Unsloth free notebooks & export to GGUF. https://t.co/L5sOjsgYhm
12
14
163
“With Mays contributing glittering heroics each day with his basket catches, his rubber arm, booming bat, and, most important, his contagious, irrepressible zest..." - Sportswriter Grantland Rice, June 1954
2
1
73
@giffmana Re: SWA: Gemma 3 uses 5 SWA layers + 1 global attention layer, and it works well: 32K extended to 128K. They did use different RoPE scales though: 10K for SWA & 1M for global. gpt-oss is 1 SWA + 1 sink token + 1 global, and possibly also RoPE-extended. My guess is Gemma 3 + sink attention will do well :)
2
7
44
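To make the interleaving concrete, a tiny illustrative helper (not from either model's codebase) that writes out the two layer patterns:

```python
def layer_pattern(n_layers: int, swa_per_global: int) -> list[str]:
    # Every (swa_per_global + 1)-th layer is global attention;
    # the rest are sliding-window attention (SWA).
    return [
        "global" if (i + 1) % (swa_per_global + 1) == 0 else "swa"
        for i in range(n_layers)
    ]

print(layer_pattern(12, 5))  # Gemma-3-style: 5 SWA layers per global layer
print(layer_pattern(12, 1))  # gpt-oss-style: alternating SWA / global
```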
Huge thanks to everyone who joined our Unsloth × Mistral × NVIDIA event at YC! 🦥🧡 Was amazing meeting you all and hope you loved the merch! Sorry to those who couldn’t get in (we had 2,800 sign-ups); we’ll plan better next time. My PyTorch Conf slides: https://t.co/McHQfJruqZ
4
4
56
Agentic RL tutorial from @UnslothAI! 🙏 @danielhanchen covers the biggest hurdle in your experiments: having a solid environment to train your agents. The notebook teaches you how to train an LLM to interact with any OpenEnv: https://t.co/C2x8yWkBhT
3
32
293
Excited to partner with @PyTorch on bringing over 2,000 RL environments from OpenEnv to Unsloth! All Atari games + tons of others now work out of the box in Unsloth! Check out our free Colab notebook on RLing 2048:
OpenEnvs for Reinforcement Learning! 🙏 We are launching a universal RL Environment interface today, teaming up with @huggingface and @UnslothAI Let’s take a trip down memory lane: It’s 2016, you read some papers. RL looks promising. But the reality? Cartpole is best we
5
11
91
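The OpenEnv client API itself isn't reproduced here; as a stand-in, the classic reset/step loop that universal RL environment interfaces standardize, shown with gymnasium:

```python
import gymnasium as gym

# CartPole for a dependency-free demo; Atari ids like "ALE/Breakout-v5"
# work the same way once ale-py is installed.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # stand-in for the trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```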
Catch us at the @PyTorch Conference Poster Session with the TorchAO team later today at 6PM! Stand 18, Generative & Large Models Exhibit :)
You can now quantize LLMs to 4-bit and recover ~70% of the lost accuracy via Quantization-Aware Training. We teamed up with @PyTorch to show how QAT enables: • 4x less VRAM with no inference overhead • 1-3% increase in raw accuracy (GPQA, MMLU Pro) Notebook & Blog: https://t.co/2OP1KgvQDN
0
3
58
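Not the torchao API the notebook uses, but the core idea behind QAT in a few lines: fake-quantize weights in the forward pass with a straight-through estimator so training can compensate for the quantization error.

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int4 fake-quantization: round-trip the weights
    # through the quantized grid so the loss "feels" the quantization error.
    scale = w.abs().amax().clamp(min=1e-8) / 7.0       # int4 range [-8, 7]
    q = torch.clamp(torch.round(w / scale), -8, 7) * scale
    # Straight-through estimator: forward uses q, gradients flow to w as-is.
    return w + (q - w).detach()
```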
Good things come to those who wait. It's been a long wait, we know, but it's finally here. We are releasing STRATO Mercata V2 to the world.
2
15
40
A packed event today with ~300 people, hosted by the amazing @UnslothAI team at the Y Combinator office. Very impressive to see what they have accomplished in such a short time!
3
4
54
Thank you to everyone who supported us - we super appreciate it! Especially thanks to the entire HF team and all our partners, and those who tried our dynamic quants and uploads for RL and finetuning! See you all at our NVIDIA x Mistral event at YC tonight!
We just hit 100 million lifetime downloads on Hugging Face! 🦥🤗 Huge thanks to all of you! The amazing community, model creators, and HF team. 💖
8
5
83
Huge thanks to @UnslothAI for enabling free, easy fine-tuning of Qwen3-VL (8B)! 🙌
You can now fine-tune Qwen3-VL (8B) for free with our notebook! Unsloth trains VLMs 1.7x faster with 60% less VRAM and 8x longer context - no accuracy loss. GitHub: https://t.co/aZWYAt9MMh Qwen3-VL GRPO Colab: https://t.co/HkjYydXDnR Qwen3-VL Colab:
15
40
611
I’ll be at @PyTorch Conference next week to give talks and more! 🔥 - PyTorch Conference Talks: Oct 21 & 23 - AMD × PyTorch Hackathon: Oct 18-20 - PyTorch Poster Session with TorchAO: Oct 22 P.S. We're having our @UnslothAI × Mistral × NVIDIA event @ Y Combinator Tues, Oct 21!
3
16
109
I'll be giving an RL talk and hosting an Unsloth workshop at @AMD DevDay on Oct 20 in SF! I’ll cover lots of RL, efficient AMD training and more. The workshop will cover RL tips & reward hacking. We also have a PyTorch x AMD virtual AI agents hackathon: https://t.co/mK92X59wST
2
0
37
OpenAI showcased Unsloth for RL on gpt-oss during DevDay! The reinforcement learning example counteracts reward hacking and cheating, and creates isolated Python execution environments to solve 2048. Thanks to @dkundel and @BarathAnandan7 for the awesome collab!
OpenAI shows how gpt-oss can autonomously beat 2048 using reinforcement learning (RL). Training was done locally with Unsloth on NVIDIA DGX Spark. You can also do it free on Colab. 🦥 OpenAI DevDay notebook: https://t.co/ptsilrBosy
3
14
132
My talk from OpenAI DevDay 2025 is live! Learn more about gpt-oss, how it fits into the broader OpenAI ecosystem, how to combine it with GPT-5, or use reinforcement fine-tuning with @UnslothAI. All wrapped up with a guest appearance of the @NVIDIAAIDev DGX Spark!
11
12
90
Come join the GPU Mode Hackathon on Oct 24 in SF! I'll be mentoring, so ask anything! There will be B200s & super incredible speakers + mentors from Deepmind, Meta, Periodic Labs, PyTorch, Thinking Machines, NVIDIA, Mobius + more! Come compile with us: https://t.co/cVpaiXH9KJ
2
9
79
OpenAI DevDay was a blast! Reinforcement learning with @UnslothAI on gpt-oss on a DGX Spark got demoed live today! Thanks a lot to @dkundel, Barath and @jayrodge15 from NVIDIA for the collab!
3
7
101
The misconception that LoRA is worse than full finetuning in RL just got dispelled in a @thinkymachines post! Even rank=1 works! Glad to have helped in reviewing the blog! @UnslothAI offers the most memory-efficient & fastest LoRA for RL: GRPO uses 60% less VRAM vs. all other implementations!
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
5
23
279
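For scale, this is all a rank-1 adapter amounts to in peft; the model name and target modules below are assumptions for illustration, not the post's exact setup:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
cfg = LoraConfig(r=1, lora_alpha=2, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, cfg)
model.print_trainable_parameters()  # rank-1 adapters: a tiny trainable slice
```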
Sparse top-K attention doesn't do O(N) multiplies during causal decoding; instead it's O(K). Similar to sliding window attention, which is limited to rotating over the last window size, but more like a "dynamic" window. I'm still reading the paper, so I'm probably wrong, but it's top-K selection first, then QK^T - need to read more
3
6
62
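In code, that reading would look something like this sketch (single head, single decode step; names and shapes are illustrative, not the paper's implementation):

```python
import torch

def topk_decode_attention(q, keys, values, idx_scores, k=64):
    # q: (d,); keys, values: (N, d); idx_scores: (N,) from a cheap indexer.
    # The O(N) part is only the lightweight indexer scan; the full
    # QK^T / softmax / V runs over just the K selected positions.
    top = torch.topk(idx_scores, min(k, keys.shape[0])).indices
    logits = (keys[top] @ q) / (keys.shape[-1] ** 0.5)   # (K,)
    attn = torch.softmax(logits, dim=-1)
    return attn @ values[top]                            # (d,)
```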
DeepSeek V3.2 breakdown:
1. Sparse attention via lightning indexer + top_k attention
2. Uses V3.1 Terminus + 1T continued pretraining tokens
3. 5 specialized models (coding, math etc.) via RL, then distillation for the final ckpt
4. GRPO. Reward functions for length penalty, language
15
161
1K
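On point 4, a guessed illustration of what a length-penalty reward can look like; DeepSeek's actual reward shaping isn't public in this form, so every constant here is an assumption:

```python
def length_penalized_reward(correct: bool, n_tokens: int,
                            budget: int = 2048, alpha: float = 1e-4) -> float:
    # Base reward for correctness, minus a penalty that kicks in only
    # once the completion exceeds the token budget.
    return (1.0 if correct else 0.0) - alpha * max(0, n_tokens - budget)
```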
Countering reward hacking in RL is now in an @UnslothAI notebook for gpt-oss!
1. Goal: make faster matrix mult kernels in pure Python
2. Stopping laziness: RL learns to import optimized libs
3. Stopping RL from cheating via global vars
4. Forcing RL not to use the cache
Details
You can now train OpenAI gpt-oss with Reinforcement Learning in our free notebook! This notebook automatically creates faster kernels via RL. Unsloth RL achieves the fastest inference & lowest VRAM vs. any setup - 0 accuracy loss gpt-oss-20b GRPO Colab: https://t.co/ahKRprp8Gd
6
53
362
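A hedged sketch of points 3-4 above, i.e. one way to get isolated Python execution so a policy can't cheat via shared globals or a surviving cache; the actual notebook's sandbox surely differs, and the 64x64 matmul harness here is an assumption:

```python
import subprocess, sys, tempfile, textwrap

HARNESS = textwrap.dedent("""
    import random, time
    A = [[random.random() for _ in range(64)] for _ in range(64)]
    t0 = time.perf_counter()
    matmul(A, A)                      # candidate must define matmul()
    print(time.perf_counter() - t0)   # wall time feeds the reward signal
""")

def run_isolated(candidate_code: str, timeout: float = 5.0) -> str:
    # A fresh interpreter per rollout: no globals smuggled between
    # attempts, no cached results to replay. A real training loop would
    # catch subprocess.TimeoutExpired and assign zero reward.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + HARNESS)
        path = f.name
    out = subprocess.run([sys.executable, path], capture_output=True,
                         text=True, timeout=timeout)
    return out.stdout.strip()
```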