Rémi Ouazan
@remi_or_
Followers
407
Following
106
Media
5
Statuses
26
Crafting cutting-edge GPU kernels at Hugging Face 🤗
Joined October 2024
Why are vLLM and transformers so damn fast? ⚡ Continuous batching. That's the secret sauce 🔥 Never heard of it? We just dropped a blog post building it up from first principles 🤗 See what happens inside the minds of the engineers pushing inference to the edge 🧠
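The idea behind continuous batching can be shown in a toy scheduler loop. This is an illustrative sketch only, not the vLLM or transformers implementation; the request representation (remaining decode steps as integers) is a made-up simplification.

```python
# Toy sketch of continuous batching: requests join and leave the running
# batch at every decode step, instead of waiting for the whole batch to drain.

def continuous_batching(requests, max_batch_size=2):
    """Each request is its number of remaining decode steps.
    Returns the batch size observed at each step."""
    queue = list(requests)
    active = []
    trace = []
    while queue or active:
        # Admit new requests as soon as a slot frees up -- the key
        # difference from static batching, which only refills once
        # every sequence in the batch has finished.
        while queue and len(active) < max_batch_size:
            active.append(queue.pop(0))
        trace.append(len(active))
        # One decode step: finished requests (0 steps left) leave the batch.
        active = [steps - 1 for steps in active if steps - 1 > 0]
    return trace
```

With requests needing 3, 1 and 2 steps and room for 2 at a time, the batch stays full the whole run: the 1-step request finishes after the first step and the 2-step request immediately takes its slot.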
Building high-performance, reproducible kernels for AMD ROCm just got a lot easier. I've put together a guide on building and sharing ROCm-compatible kernels using Hugging Face, so you can focus on optimizing performance rather than on setup. Link in the thread.
(1/4) Good design pays itself forward. Transformers hosts hundreds of models and the maintenance load was steadily increasing. In Spring 2024 we shipped modular: define a model as a composition, and auto-generate the runnable modeling_*.py.
Why is your KV so small? 🤏 In continuous batching, if you increase the max number of tokens per batch, you must decrease the memory allocated for your cache. In transformers, we make sure they are perfectly balanced (as all things should be). No matter how big your model is🦠🐋
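The tradeoff above is simple arithmetic: every cached token costs a fixed number of bytes, so a memory budget buys you either more cache slots or more in-flight tokens, never both. A back-of-envelope sketch, with illustrative model dimensions that are assumptions, not transformers defaults:

```python
# KV-cache sizing sketch: each token stores a key and a value vector
# per layer and per KV head, at the cache's element width (fp16 here).

def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_el=2):
    # 2x for keys AND values.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_el * num_tokens

def max_cacheable_tokens(budget_bytes, num_layers, num_kv_heads, head_dim, bytes_per_el=2):
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_el
    return budget_bytes // per_token

# Example: a 32-layer model with 8 KV heads of dim 128 in fp16
# costs 128 KiB of cache per token, so an 8 GiB budget holds 65,536 tokens.
```

Raise the max tokens per batch and the per-sequence share of those 65,536 slots shrinks, which is exactly the balancing act the tweet describes.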
You have no idea what attention looks like 🤥 Many talk about attention like it's simple, but few know how it actually works. Even basic stuff like shapes and prefill / decode are not that easy to grasp. Good thing HF is cooking a blogpost to help you out 🫂
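The "basic stuff like shapes and prefill / decode" is easy to see with toy tensors. A minimal numpy sketch (toy dimensions, my own illustration rather than anything from the blog post): during prefill the whole prompt attends to itself, giving a square score matrix; during decode a single new token attends to everything cached so far, giving one row.

```python
import numpy as np

batch, heads, head_dim = 1, 4, 8

def attention_score_shape(q_len, kv_len):
    """Shape of the attention score matrix for q_len query tokens
    attending over kv_len cached key/value tokens."""
    q = np.zeros((batch, heads, q_len, head_dim))
    k = np.zeros((batch, heads, kv_len, head_dim))
    scores = q @ k.transpose(0, 1, 3, 2)  # (batch, heads, q_len, kv_len)
    return scores.shape

# Prefill a 16-token prompt: 16 queries x 16 keys -> square matrix.
# Decode the next token: 1 query x 17 cached keys -> a single row.
```

The asymmetry is why prefill is compute-bound while decode is memory-bound: decode does a sliver of math per step but still has to read the whole KV cache.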
What happened to continuous batching in transformers?? 🫣 We just made it faster, cleaner and available for more models. Just use generate_batch instead of generate to enter a world of fast evals, high throughput and next-level performance on any GPU⏩ check it out now.
🚀 Just updated lighteval’s readme—can’t believe we’ve grown to cover ~7,000 tasks 😳 with top-tier multilingual support 🌍 llm as judge 🤖 multiturn evals 🗣️ coding benchmarks 🧑‍💻
🚀 Big news! @huggingface x @AMD collaboration is powering open-source AI on cutting-edge GPUs! ✅ Daily transformers CI running on MI325 GPUs 🔧 Internal dashboard now PUBLIC Perfect for tracking CI performance on AMD hardware! 💪
🍷FineWeb now sits at 18.5T tokens, up 3.5T in just over a year. A few years ago, SOTA models like GPT-3 and Gopher were trained on <300B tokens, on data only big labs could access. Today, anyone can download high-quality datasets many times that size and train their own.
Update: 🍷FineWeb and 📚 FineWeb-Edu now include English data from this year's CommonCrawl snapshots, covering Jan-Jun 2025. 🍷FineWeb now has 18.5 trillion tokens. We'll keep publishing timely updates to ensure your models have the latest world knowledge.
We optimized LLM inference kernels for AMD’s MI300X GPUs (192GB 😮) using ROCm/HIP — and it’s all open source. 🔧 Tuned GEMM and fused kernels 📊 Benchmarked vs other GPUs 🚀 Big perf gains 🤝 Open-sourced everything Full write-up: https://t.co/8FSoNhuXi9
#LLM #AI #AMD #MI300X
Today, we are open-sourcing nanoVLM, a pure PyTorch library to train a Vision-Language Model from scratch in 750 lines of code. Training on one H100 for 6h, we get 35.3% on MMStar, matching SmolVLM-256M which was trained with 100x more GPU hours. 👀 Even in a FREE Google Colab,