Hugo Larcher
@hugoch
477 Followers · 456 Following · 113 Media · 1K Statuses
ML infra/software engineer @huggingface 🤗. Making GPUs go "brrr".
Bordeaux, France
Joined August 2007
OpenAI just released GPT-OSS: an open-source language model on Hugging Face.
Open source meaning:
💸 Free
🔒 Private
🔧 Customizable
Replies: 15 · Reposts: 37 · Likes: 215
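A minimal sketch of pulling the released weights from the Hub with transformers, assuming a recent transformers version with gpt-oss support and suitable hardware; "openai/gpt-oss-20b" is the smaller of the two released variants:

```python
# Load the open GPT-OSS weights from the Hugging Face Hub and chat with them.
# Sketch only: the 20B model still needs a sizeable GPU, and the exact
# pipeline output format depends on your transformers version.
from transformers import pipeline

generator = pipeline("text-generation", model="openai/gpt-oss-20b")
messages = [{"role": "user", "content": "Why run models locally?"}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```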
OMG, the U.S. just downloaded more than 5PB of DeepSeek-R1 from @huggingface in the last few days! Feeling late FOMO in Silicon Valley? 🤔
Replies: 2 · Reposts: 4 · Likes: 22
🧵(2/2) With inference-benchmarker you can:
🧪 Simulate real workloads (chat, code-gen...)
📊 Measure throughput, time-to-first-token, inter-token latency
⚖️ Compare performance across backends & infra
👉 Check it out:
github.com · huggingface/inference-benchmarker: Inference server benchmarking tool.
Replies: 0 · Reposts: 1 · Likes: 8
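A minimal sketch of the metrics the thread mentions, assuming an OpenAI-compatible streaming endpoint at an illustrative localhost URL; inference-benchmarker itself is a standalone tool with far richer workload simulation:

```python
# Measure time-to-first-token (TTFT), mean inter-token latency (ITL), and
# rough token throughput against a streaming completions endpoint.
import time

import requests

BASE_URL = "http://localhost:8080/v1/completions"  # assumed endpoint

def measure(prompt: str, max_tokens: int = 64):
    t0, stamps = time.perf_counter(), []
    payload = {"prompt": prompt, "max_tokens": max_tokens, "stream": True}
    with requests.post(BASE_URL, json=payload, stream=True, timeout=120) as r:
        for line in r.iter_lines():
            # each server-sent event "data: {...}" carries one token chunk
            if line.startswith(b"data: ") and line != b"data: [DONE]":
                stamps.append(time.perf_counter())
    assert stamps, "no tokens received"
    ttft = stamps[0] - t0
    itl = [b - a for a, b in zip(stamps, stamps[1:])]
    return ttft, sum(itl) / max(len(itl), 1), len(stamps) / (stamps[-1] - t0)

ttft, mean_itl, tps = measure("Write a haiku about GPUs.")
print(f"TTFT {ttft * 1000:.0f} ms · ITL {mean_itl * 1000:.1f} ms · {tps:.1f} tok/s")
```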
🧠 LLM inference isn't just about latency: it's about consistency under load. Different workloads, configs, and hardware = very different real-world performance. At Hugging Face 🤗 we built inference-benchmarker, a simple tool to stress-test LLM inference servers. 🧵 (1/2)
Replies: 2 · Reposts: 13 · Likes: 39
@huggingface GPU-fryer helps us detect silent throttling failures: one GPU slows down and every other unit ends up waiting, creating a bottleneck 🚦. Check it out:
github.com · huggingface/gpu-fryer: Where GPUs get cooked 👩‍🍳🔥.
Replies: 0 · Reposts: 4 · Likes: 45
At @huggingface we rely on GPU-fryer 🍳 to load-test our 768 H100 GPU cluster. It runs matrix multiplications and monitors TFLOPs outliers to catch any software or hardware throttling, often a sign of cooling issues that need a hardware fix ❄️🔧. 🧵 1/2
Replies: 5 · Reposts: 29 · Likes: 254
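A toy PyTorch sketch of the same idea (GPU-fryer itself is a separate tool, and the matrix size, iteration count, and 10% outlier threshold here are arbitrary): hammer every GPU with half-precision matmuls, compute sustained TFLOPs, and flag stragglers:

```python
# Stress each GPU with fp16 matmuls and flag TFLOPs outliers, which often
# indicate thermal throttling or a failing unit.
import time

import torch

def measure_tflops(device: str, n: int = 8192, iters: int = 50) -> float:
    a = torch.randn(n, n, device=device, dtype=torch.float16)
    b = torch.randn(n, n, device=device, dtype=torch.float16)
    a @ b  # warmup: trigger kernel selection before timing
    torch.cuda.synchronize(device)
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize(device)
    # each n x n by n x n matmul is 2 * n^3 floating-point operations
    return iters * 2 * n ** 3 / (time.perf_counter() - t0) / 1e12

scores = {i: measure_tflops(f"cuda:{i}") for i in range(torch.cuda.device_count())}
mean = sum(scores.values()) / len(scores)
for i, tf in scores.items():
    flag = "  <-- possible throttling" if tf < 0.9 * mean else ""
    print(f"cuda:{i}: {tf:.1f} TFLOPs{flag}")
```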
This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron, and TPU). We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤗! https://t.co/eGpEvqVM8L
huggingface.co
Replies: 0 · Reposts: 1 · Likes: 8
We are introducing multi-backend support in @huggingface Text Generation Inference! With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and available hardware.
Replies: 2 · Reposts: 9 · Likes: 59
Just 10 days after o1's public debut, we're thrilled to unveil the open-source version of the groundbreaking technique behind its success: scaling test-time compute 🧠💡 By giving models more "time to think," LLaMA 1B outperforms LLaMA 8B in math, beating a model 8x its size.
Replies: 115 · Reposts: 621 · Likes: 5K
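The simplest instance of "more time to think" is best-of-N sampling with a verifier; a sketch under that assumption (the actual recipe uses a process reward model and smarter search strategies, and the stand-in functions below are toys):

```python
# Best-of-N: spend extra inference compute by sampling N candidate solutions
# and keeping the one a verifier scores highest.
import random
from typing import Callable

def best_of_n(prompt: str,
              generate_fn: Callable[[str], str],
              score_fn: Callable[[str, str], float],
              n: int = 16) -> str:
    candidates = [generate_fn(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score_fn(prompt, c))

# Toy stand-ins so the sketch runs; real versions call an LLM and a reward model.
random.seed(0)
answer = best_of_n("1+1=?",
                   generate_fn=lambda p: str(random.randint(0, 3)),
                   score_fn=lambda p, c: -abs(int(c) - 2))
print(answer)  # almost surely "2" once any of the 16 samples hits it
```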
We're turning @huggingface Hub's files into content-defined chunks to speed up your workflows! ⚡️ This means:
- 🧠 We store your file as deduplicated chunks
- ⏩ You only upload changed chunks when iterating!
- 🔄 Pulling changes? Only download changed chunks!
Replies: 3 · Reposts: 17 · Likes: 53
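A toy sketch of content-defined chunking, the trick behind those bullets: chunk boundaries come from a rolling hash of the bytes themselves, so a local edit only moves nearby boundaries and every untouched chunk keeps its hash for dedup. The Hub's production system is more sophisticated; all constants here are illustrative:

```python
# Gear-style content-defined chunking: ~8 KiB average chunks, boundary wherever
# the rolling hash's low bits are all zero.
import hashlib
import random

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # one random value per byte
MASK = (1 << 13) - 1      # boundary test -> ~8 KiB average chunk size
MAX_CHUNK = 64 * 1024     # hard cap so pathological data still chunks

def chunks(data: bytes):
    h, start = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & (2 ** 64 - 1)  # rolling hash update
        if (h & MASK) == 0 or (i - start + 1) >= MAX_CHUNK:
            yield data[start:i + 1]                  # content-defined boundary
            h, start = 0, i + 1
    if start < len(data):
        yield data[start:]                           # trailing partial chunk

data = bytes(random.getrandbits(8) for _ in range(200_000))
edited = data[:1000] + b"new bytes" + data[1000:]    # small local edit
before = {hashlib.sha256(c).hexdigest() for c in chunks(data)}
after = {hashlib.sha256(c).hexdigest() for c in chunks(edited)}
print(f"{len(after - before)} new chunks out of {len(after)}")  # only a few
```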
An easy way to understand Pipeline Parallelism with a self-contained implementation. Check it out!
Interested in 4D parallelism but feeling overwhelmed by the Megatron-LM codebase? We are currently cooking something with @Haojun_Zhao14 and @xariusrke 👀 In the meantime, here is a self-contained script that implements Pipeline Parallelism (AFAB + 1F1B) in 200 LOC 🧵👇
Replies: 1 · Reposts: 1 · Likes: 11
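A rough sketch of the per-stage operation order those two schedules produce, assuming the usual PipeDream-flush formulation where stage s warms up with (stages - s - 1) forwards; F/B denote the forward/backward of one microbatch:

```python
# Emit the op order for AFAB vs 1F1B pipeline-parallel schedules.
def afab(stage: int, stages: int, mb: int) -> list[str]:
    # All-Forward-All-Backward: all forwards, then all backwards, so every
    # microbatch's activations are live at the peak.
    return [f"F{i}" for i in range(mb)] + [f"B{i}" for i in range(mb)]

def one_f_one_b(stage: int, stages: int, mb: int) -> list[str]:
    # 1F1B: warm up, alternate one forward / one backward in steady state,
    # then drain. Peak live activations drop to O(stages) instead of O(mb).
    warmup = min(stages - stage - 1, mb)
    ops = [f"F{i}" for i in range(warmup)]
    f, b = warmup, 0
    while f < mb:
        ops += [f"F{f}", f"B{b}"]
        f, b = f + 1, b + 1
    ops += [f"B{i}" for i in range(b, mb)]
    return ops

for s in range(4):
    print(f"stage {s}: {' '.join(one_f_one_b(s, stages=4, mb=8))}")
```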
New feature on the Hub! Carbon emissions from training now show up on the model card! (requires model authors to fill in that info first) Hopefully it will prompt more people to share the carbon emissions of their model training! 🌍 Thanks a lot to the team who pushed…
Replies: 1 · Reposts: 7 · Likes: 28
We passed 5 million users. 🥳 That's 5 million of you who have signed up on the Hub 🙌 Thank you for contributing to the ecosystem and making open Machine Learning happen! We're just getting started 🤗
Replies: 254 · Reposts: 242 · Likes: 2K
Starting today, open source is leading the way. Introducing Llama 3.1: our most capable models yet. Today we're releasing a collection of new Llama 3.1 models, including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K-token context…
Replies: 263 · Reposts: 1K · Likes: 6K
I am mind-blown by this new technology! AI is now embodied. And we are open-sourcing it all. Listen to @HaixuanT casually chatting with this cute robot at the @linuxfoundation:
🎙 What's your name?
> I am Reachy, a robot from @pollenrobotics, I have two arms.
🎙 What do you…
Replies: 3 · Reposts: 25 · Likes: 108
Llama 3 released! 🚨 @AIatMeta just released their best open LLM! Llama 3 is the next iteration of Llama with a ~10% relative improvement over its predecessor! 🤯 Llama 3 comes in two sizes, 8B and 70B, with a new extended tokenizer and a commercially permissive license!
Replies: 6 · Reposts: 60 · Likes: 254
Introducing: Zephyr 141B-A35B 🔥
🔥 Mixtral-8x22B fine-tune
🤯 Using DORPO: new alignment algorithm (no SFT, open)
📊 With 7k instances of (open) data
Very strong IFEval, BBH, AGIEval... Enjoy! 🤗 https://t.co/MVxTJorIGc
huggingface.co
Replies: 15 · Reposts: 131 · Likes: 711
This 30-min-read blog post on how to craft and generate a 25B+ token synthetic text dataset distills more information and alpha than a typical NeurIPS best paper.
Replies: 4 · Reposts: 104 · Likes: 738
Huge satellite imagery dataset released by @ESA_EO and @huggingface 🛰️ so much to build on!
.@esa's Φ-lab has released, in partnership with @huggingface, the 1st dataset of Major TOM (Terrestrial Observation Metaset), the largest, community-oriented, ML-ready collection of @CopernicusEU #Sentinel2 images ever published, covering over 50% of Earth: https://t.co/IZS4K6YZaC
Replies: 0 · Reposts: 0 · Likes: 11
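A minimal sketch of streaming it from the Hub with the datasets library; "Major-TOM/Core-S2L2A" is assumed here as one of the published Major TOM Sentinel-2 collections, and streaming avoids downloading the multi-terabyte set up front:

```python
# Stream one Major TOM Sentinel-2 sample from the Hub without a full download.
from datasets import load_dataset

ds = load_dataset("Major-TOM/Core-S2L2A", split="train", streaming=True)
print(next(iter(ds)))  # one sample with imagery bands and metadata
```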