Steven Liu

@stevhliu

902 Followers · 3K Following · 23 Media · 181 Statuses

docs @huggingface 🤗 | sucking at something is the first step towards being sorta good at something

Bay Area
Joined January 2019
@stevhliu
Steven Liu
4 years
Super excited to start my first day as a technical writer @huggingface! Feels like the first day of school all over again 🤗
11
9
186
@lvwerra
Leandro von Werra
2 days
The Jagged AI Frontier is a Data Frontier. Wrote up my thoughts on how coding and math succeeded through data abundance and verification, why science lags behind, and how distillation equalizes the field while RL may just be another data generator. https://t.co/0bNRnIwk66
huggingface.co
10
39
300
@mishig25
Mishig Davaadorj
6 days
Using AI to do more AI at HF. We added a chatbot on every HF doc page so that one can get answers faster. We are using open source embedding models & LLMs through HuggingChat and one of our inference providers to serve answers
1
4
27
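The post describes a retrieval-augmented setup: embed the docs, embed the question, retrieve the closest chunk, and let an LLM answer with it. Below is a minimal sketch of the retrieval half, assuming a sentence-transformers embedding model; the model choice and chunk data are illustrative, not the actual HF docs chatbot stack.

```python
from sentence_transformers import SentenceTransformer

# Any open source embedding model works here; this choice is an assumption.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

doc_chunks = [
    "Pipelines are a simple way to run inference with transformers.",
    "Use from_pretrained() to load a model checkpoint from the Hub.",
    "Tokenizers convert raw text into model-ready input IDs.",
]

# Embed the docs once, then embed each incoming question.
doc_embeddings = model.encode(doc_chunks, normalize_embeddings=True)
question = "How do I load a checkpoint?"
q_embedding = model.encode(question, normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors.
scores = doc_embeddings @ q_embedding
best_chunk = doc_chunks[scores.argmax()]

# The retrieved chunk would then be passed to an LLM (e.g. via an
# inference provider) as context for the final answer.
print(best_chunk)
```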
@LysandreJik
Lysandre
7 days
🪦text-generation-inference is now in maintenance mode. Going forward, we will accept pull requests for minor bug fixes, documentation improvements and lightweight maintenance tasks. TGI has initiated the movement for optimized inference engines to rely on a transformers
13
6
133
@stevhliu
Steven Liu
16 days
i'm printing out some commemorative stamps to celebrate transformers 5, let me know if you want some!
@LysandreJik
Lysandre
17 days
Transformers v5's first release candidate is out 🔥 The biggest release of my life. It's been five years since the last major (v4). From 20 architectures to 400, 20k daily downloads to 3 million. The release is huge, w/ tokenization (no slow tokenizers!), modeling & processing.
2
1
9
@stevhliu
Steven Liu
20 days
found @huggingface at @stripe city 🤗
1
0
12
@art_zucker
Arthur Zucker
21 days
for anyone who had issues with tokenizers with a lot of added tokens, my gift is 10x faster loading :)
github.com
8-9x speedup on special tokens and about 4x on non-special cases for deserialization. Fixes #1635 and supersedes #1782
0
1
22
@remi_or_
Rémi Ouazan
24 days
Why are vLLM and transformers so damn fast? ⚡ Continuous batching. That's the secret sauce 🔥 Never heard of it? We just dropped a blog post building it up from first principles 🤗 See what happens inside the minds of the engineers pushing inference to the edge 🧠
4
34
192
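For context, here is a toy scheduling loop showing the core idea of continuous batching: requests join and leave the batch every decode step instead of waiting for a whole static batch to finish. This is an illustrative sketch, not vLLM's or transformers' actual implementation.

```python
import random
from collections import deque

# Toy continuous batching scheduler. Each "step" decodes one token for
# every active request; a finished request frees its slot immediately,
# so the GPU never idles on padding while stragglers finish.
MAX_BATCH = 4

waiting = deque(f"req-{i}" for i in range(8))
active = {}  # request id -> tokens still to generate

step = 0
while waiting or active:
    # Admit new requests into any free slots. This is the key trick:
    # admission happens every step, not once per batch.
    while waiting and len(active) < MAX_BATCH:
        req = waiting.popleft()
        active[req] = random.randint(2, 6)  # fake generation length

    # One decode step for the whole batch.
    for req in list(active):
        active[req] -= 1
        if active[req] == 0:
            del active[req]  # slot freed mid-"batch"

    step += 1
    print(f"step {step}: active={list(active)}")
```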
@stevhliu
Steven Liu
29 days
you good bro?
0
0
0
@LoubnaBenAllal1
Loubna Ben Allal
2 months
After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure. We'll help you
36
163
1K
@ClementDelangue
clem 🤗
2 months
The main breakthrough of GPT-5 was to route your messages between a couple of different models to give you the best, cheapest & fastest answer possible. This is cool but imagine if you could do this not only for a couple of models but hundreds of them, big and small, fast and
120
158
2K
@mishig25
Mishig Davaadorj
2 months
Likewise, claude code fetches Hugging Face’s docs markdown instead of html — reducing token usage by about an order of magnitude
@bunjavascript
Bun
3 months
When Claude Code fetches Bun’s docs, Bun’s docs now send markdown instead of HTML by default This shrinks token usage for our docs by about 10x
0
1
12
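One plausible way to implement this is sketched below, under the assumption that the server keys off the Accept header; neither post says exactly how HF or Bun detect an LLM client, so the detection logic here is hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content-negotiation sketch. Markdown carries the same
# documentation content in roughly 10x fewer tokens than rendered HTML.
MARKDOWN = "# Pipelines\n\nPipelines are a simple way to run inference.\n"
HTML = "<html><body><h1>Pipelines</h1><p>Pipelines are ...</p></body></html>"

class DocsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        accept = self.headers.get("Accept", "")
        # Clients that ask for markdown (e.g. an LLM agent) get markdown;
        # everyone else gets the usual HTML page.
        if "text/markdown" in accept:
            body, ctype = MARKDOWN.encode(), "text/markdown"
        else:
            body, ctype = HTML.encode(), "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), DocsHandler).serve_forever()
```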
@stevhliu
Steven Liu
3 months
read the docs to learn more! https://t.co/oRqqF0eWb9
huggingface.co
0
0
0
@stevhliu
Steven Liu
3 months
ring attention fixes this by passing K/V pairs from each GPU around in a ring, overlapping communication and computation. eventually, each GPU will have collected all the K/V shards it needs for attention. a single GPU doesn't store the full K/V because they share the load
1
0
1
@stevhliu
Steven Liu
3 months
we have something cool coming your way in diffusers! context parallelism splits sequence length across GPUs, making it faster to generate images/videos. but splitting the attention module has issues of its own because each token needs to see the K/V pairs from all other tokens 🤔
1
0
1
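A single-process numpy simulation of the ring described in these two posts: each simulated GPU keeps only its own Q shard, sees each K/V shard exactly once as the shards rotate, and merges partial results with an online softmax so the full K/V is never materialized on one rank. Illustrative only; the actual diffusers context-parallel code may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
world, n_local, d = 4, 8, 16  # 4 simulated GPUs, 8 tokens each
q = [rng.normal(size=(n_local, d)) for _ in range(world)]
k = [rng.normal(size=(n_local, d)) for _ in range(world)]
v = [rng.normal(size=(n_local, d)) for _ in range(world)]

def ring_attention(rank):
    m = np.full(n_local, -np.inf)   # running max (numerical stability)
    l = np.zeros(n_local)           # running softmax denominator
    acc = np.zeros((n_local, d))    # running numerator
    for step in range(world):
        src = (rank - step) % world  # K/V shard held at this ring step
        s = q[rank] @ k[src].T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)   # rescale previous partial results
        acc = acc * scale[:, None] + p @ v[src]
        l = l * scale + p.sum(axis=-1)
        m = m_new
    return acc / l[:, None]

# Sanity check against full (non-sharded) attention for rank 0.
K, V = np.concatenate(k), np.concatenate(v)
s = q[0] @ K.T / np.sqrt(d)
ref = np.exp(s - s.max(-1, keepdims=True))
ref = ref / ref.sum(-1, keepdims=True) @ V
assert np.allclose(ring_attention(0), ref)
```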
@LucSGeorges
Luc Georges
3 months
🔧 We’re working on refactoring transformers’ benchmarking suite to make it prettier, more meaningful, and more extendable. We’ll be measuring the standard metrics you want to see when talking about inference: - TTFT: time to first token, or the time it takes for the first
1
3
21
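As a hedged example of the metric being described, TTFT can be measured with transformers' streaming API: run generation on a background thread and time how long it takes for the first decoded token to arrive. The model choice below is an assumption, and this sketch is not the refactored benchmarking suite itself.

```python
import time
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "gpt2"  # illustrative choice
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("Continuous batching is", return_tensors="pt")
streamer = TextIteratorStreamer(tok, skip_prompt=True)

start = time.perf_counter()
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=32))
thread.start()

# TTFT: elapsed time until the first decoded token is available.
first_token = next(iter(streamer))
ttft = time.perf_counter() - start
print(f"TTFT: {ttft * 1000:.1f} ms (first token: {first_token!r})")
thread.join()
```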
@charmcli
Charm
3 months
If you love open models, you’ll love this: Crush now runs with @huggingface Inference Providers 🤗✨
14
15
159
@stevhliu
Steven Liu
3 months
↳ a dispatch function routes your call to the correct backend
↳ if you don't select a backend, it defaults to PyTorch native scaled dot product attention, which selects the best backend based on inputs/hardware https://t.co/txu9eV99cd
huggingface.co
0
0
0
@stevhliu
Steven Liu
3 months
it has never been easier to use different attention backends (FlashAttention, SageAttention, etc.) in Diffusers
↳ offers a single interface for multiple backends
↳ each backend is recorded in a registry (implementation, constraints, supported params)
1
0
3
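A minimal sketch of the registry-plus-dispatch pattern described in these two posts, defaulting to PyTorch's native SDPA when no backend is selected. All names here are illustrative, not the actual diffusers internals.

```python
import torch
import torch.nn.functional as F

_BACKENDS = {}  # registry: backend name -> attention implementation

def register_backend(name):
    def wrap(fn):
        _BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("sdpa")
def sdpa_attention(q, k, v):
    # PyTorch native SDPA picks the best kernel for the inputs/hardware.
    return F.scaled_dot_product_attention(q, k, v)

@register_backend("naive")
def naive_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

def dispatch_attention(q, k, v, backend=None):
    # No backend selected -> default to PyTorch native SDPA.
    return _BACKENDS[backend or "sdpa"](q, k, v)

q = k = v = torch.randn(1, 4, 8, 16)  # (batch, heads, seq, head_dim)
out = dispatch_attention(q, k, v)                 # default path
out_naive = dispatch_attention(q, k, v, "naive")  # explicit backend
assert torch.allclose(out, out_naive, atol=1e-5)
```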
@ClementDelangue
clem 🤗
3 months
Xet by Hugging Face is the most important AI technology that nobody is talking about! Under the hood, it now powers 5M Xet-enabled AI models & datasets on HF which see hundreds of terabytes of uploads and downloads every single day. What makes it super powerful is that it
17
42
345