Rémi Ouazan
@remi_or_
Followers
407
Following
106
Media
5
Statuses
26
Crafting cutting-edge GPU kernels at Hugging Face 🤗
Joined October 2024
Why are vLLM and transformers so damn fast? ⚡ Continuous batching. That's the secret sauce 🔥 Never heard of it? We just dropped a blog post building it up from first principles 🤗 See what happens inside the minds of the engineers pushing inference to the edge 🧠
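The idea behind continuous batching can be shown in a toy scheduler loop. This is an illustrative sketch only, not the vLLM or transformers implementation; the request representation (remaining decode steps as integers) is a made-up simplification.

```python
# Toy sketch of continuous batching: requests join and leave the running
# batch at every decode step, instead of waiting for the whole batch to drain.

def continuous_batching(requests, max_batch_size=2):
    """Each request is its number of remaining decode steps.
    Returns the batch size observed at each step."""
    queue = list(requests)
    active = []
    trace = []
    while queue or active:
        # Admit new requests as soon as a slot frees up -- the key
        # difference from static batching, which only refills once
        # every sequence in the batch has finished.
        while queue and len(active) < max_batch_size:
            active.append(queue.pop(0))
        trace.append(len(active))
        # One decode step: finished requests (0 steps left) leave the batch.
        active = [steps - 1 for steps in active if steps - 1 > 0]
    return trace
```

With requests needing 3, 1 and 2 steps and room for 2 at a time, the batch stays full the whole run: the 1-step request finishes after the first step and the 2-step request immediately takes its slot.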
Building high-performance, reproducible kernels for AMD ROCm just got a lot easier. I've put together a guide on building and sharing ROCm-compatible kernels using Hugging Face, so you can focus on optimizing performance rather than on setup. Link in the thread.
(1/4) Good design pays itself forward. Transformers hosts hundreds of models and the maintenance load was steadily increasing. In Spring 2024 we shipped modular: define a model as a composition, and auto-generate the runnable modeling_*.py.
Why is your KV so small? 🤏 In continuous batching, if you increase the max number of tokens per batch, you must decrease the memory allocated for your cache. In transformers, we make sure they are perfectly balanced (as all things should be). No matter how big your model is🦠🐋
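The tradeoff above is simple arithmetic: every cached token costs a fixed number of bytes, so a memory budget buys you either more cache slots or more in-flight tokens, never both. A back-of-envelope sketch, with illustrative model dimensions that are assumptions, not transformers defaults:

```python
# KV-cache sizing sketch: each token stores a key and a value vector
# per layer and per KV head, at the cache's element width (fp16 here).

def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_el=2):
    # 2x for keys AND values.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_el * num_tokens

def max_cacheable_tokens(budget_bytes, num_layers, num_kv_heads, head_dim, bytes_per_el=2):
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_el
    return budget_bytes // per_token

# Example: a 32-layer model with 8 KV heads of dim 128 in fp16
# costs 128 KiB of cache per token, so an 8 GiB budget holds 65,536 tokens.
```

Raise the max tokens per batch and the per-sequence share of those 65,536 slots shrinks, which is exactly the balancing act the tweet describes.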
You have no idea what attention looks like 🤥 Many talk about attention like it's simple, but few know how it actually works. Even basic stuff like shapes and prefill / decode are not that easy to grasp. Good thing HF is cooking a blogpost to help you out 🫂
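The "basic stuff like shapes and prefill / decode" is easy to see with toy tensors. A minimal numpy sketch (toy dimensions, my own illustration rather than anything from the blog post): during prefill the whole prompt attends to itself, giving a square score matrix; during decode a single new token attends to everything cached so far, giving one row.

```python
import numpy as np

batch, heads, head_dim = 1, 4, 8

def attention_score_shape(q_len, kv_len):
    """Shape of the attention score matrix for q_len query tokens
    attending over kv_len cached key/value tokens."""
    q = np.zeros((batch, heads, q_len, head_dim))
    k = np.zeros((batch, heads, kv_len, head_dim))
    scores = q @ k.transpose(0, 1, 3, 2)  # (batch, heads, q_len, kv_len)
    return scores.shape

# Prefill a 16-token prompt: 16 queries x 16 keys -> square matrix.
# Decode the next token: 1 query x 17 cached keys -> a single row.
```

The asymmetry is why prefill is compute-bound while decode is memory-bound: decode does a sliver of math per step but still has to read the whole KV cache.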
What happened to continuous batching in transformers?? 🫣 We just made it faster, cleaner and available for more models. Just use generate_batch instead of generate to enter a world of fast evals, high throughput and next-level performance on any GPU⏩ check it out now.
🚀 Just updated lighteval’s readme—can’t believe we’ve grown to cover ~7,000 tasks 😳 with top-tier multilingual support 🌍 llm as judge 🤖 multiturn evals 🗣️ coding benchmarks 🧑‍💻
🚀 Big news! @huggingface x @AMD collaboration is powering open-source AI on cutting-edge GPUs! ✅ Daily transformers CI running on MI325 GPUs 🔧 Internal dashboard now PUBLIC Perfect for tracking CI performance on AMD hardware! 💪
🍷FineWeb now sits at 18.5T tokens, up 3.5T in just over a year. A few years ago, SOTA models like GPT-3 and Gopher were trained on <300B tokens, on data only big labs could access. Today, anyone can download high-quality datasets many times that size and train their own.
Update: 🍷FineWeb and 📚 FineWeb-Edu now include English data from this year's CommonCrawl snapshots, covering Jan-Jun 2025. 🍷FineWeb now has 18.5 trillion tokens. We'll keep publishing timely updates to ensure your models have the latest world knowledge.
We optimized LLM inference kernels for AMD’s MI300X GPUs (192GB 😮) using ROCm/HIP — and it’s all open source. 🔧 Tuned GEMM and fused kernels 📊 Benchmarked vs other GPUs 🚀 Big perf gains 🤝 Open-sourced everything Full write-up: https://t.co/8FSoNhuXi9
#LLM #AI #AMD #MI300X
Today, we are open-sourcing nanoVLM, a pure PyTorch library to train a Vision-Language Model from scratch in 750 lines of code. Training on one H100 for 6h, we get 35.3% on MMStar, matching SmolVLM-256M which was trained with 100x more GPU hours. 👀 Even in a FREE Google Colab,