Steven Liu

@stevhliu

902 Followers · 3K Following · 23 Media · 181 Statuses

docs @huggingface 🤗 | sucking at something is the first step towards being sorta good at something

Bay Area
Joined January 2019
@stevhliu
Steven Liu
4 years
Super excited to start my first day as a technical writer @huggingface! Feels like the first day of school all over again 🤗
11
9
186
@lvwerra
Leandro von Werra
2 days
The Jagged AI Frontier is a Data Frontier. Wrote up my thoughts on how coding and math succeeded through data abundance and verification, why science lags behind, and how distillation equalizes the field while RL may just be another data generator. https://t.co/0bNRnIwk66
huggingface.co
10
39
300
@mishig25
Mishig Davaadorj
6 days
Using AI to do more AI at HF. We added a chatbot on every HF doc page so that one can get answers faster. We are using open source embedding models & LLMs through HuggingChat and one of our inference providers to serve answers
1
4
27
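The post describes a retrieval-augmented setup: embed the docs, embed the question, retrieve the closest chunk, and let an LLM answer with it. Below is a minimal sketch of the retrieval half, assuming a sentence-transformers embedding model; the model choice and chunk data are illustrative, not the actual HF docs chatbot stack.

```python
from sentence_transformers import SentenceTransformer

# Any open source embedding model works here; this choice is an assumption.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

doc_chunks = [
    "Pipelines are a simple way to run inference with transformers.",
    "Use from_pretrained() to load a model checkpoint from the Hub.",
    "Tokenizers convert raw text into model-ready input IDs.",
]

# Embed the docs once, then embed each incoming question.
doc_embeddings = model.encode(doc_chunks, normalize_embeddings=True)
question = "How do I load a checkpoint?"
q_embedding = model.encode(question, normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors.
scores = doc_embeddings @ q_embedding
best_chunk = doc_chunks[scores.argmax()]

# The retrieved chunk would then be passed to an LLM (e.g. via an
# inference provider) as context for the final answer.
print(best_chunk)
```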
@LysandreJik
Lysandre
7 days
🪦text-generation-inference is now in maintenance mode. Going forward, we will accept pull requests for minor bug fixes, documentation improvements and lightweight maintenance tasks. TGI has initiated the movement for optimized inference engines to rely on a transformers
13
6
133
@stevhliu
Steven Liu
16 days
i'm printing out some commemorative stamps to celebrate transformers 5, let me know if you want some!
@LysandreJik
Lysandre
17 days
Transformers v5's first release candidate is out 🔥 The biggest release of my life. It's been five years since the last major (v4). From 20 architectures to 400, 20k daily downloads to 3 million. The release is huge, w/ tokenization (no slow tokenizers!), modeling & processing.
2
1
9
@stevhliu
Steven Liu
20 days
found @huggingface at @stripe city 🤗
1
0
12
@art_zucker
Arthur Zucker
21 days
for anyone who had issues with tokenizers with a lot of added tokens, my gift is 10x faster loading :)
github.com
8-9x speedup on special tokens and about 4x on non-special cases for deserialization. Fixes #1635 and supersedes #1782
0
1
22
@remi_or_
Rémi Ouazan
24 days
Why are vLLM and transformers so damn fast? ⚡ Continuous batching. That's the secret sauce 🔥 Never heard of it? We just dropped a blog post building it up from first principles 🤗 See what happens inside the minds of the engineers pushing inference to the edge 🧠
4
34
192
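For context, here is a toy scheduling loop showing the core idea of continuous batching: requests join and leave the batch every decode step instead of waiting for a whole static batch to finish. This is an illustrative sketch, not vLLM's or transformers' actual implementation.

```python
import random
from collections import deque

# Toy continuous batching scheduler. Each "step" decodes one token for
# every active request; a finished request frees its slot immediately,
# so the GPU never idles on padding while stragglers finish.
MAX_BATCH = 4

waiting = deque(f"req-{i}" for i in range(8))
active = {}  # request id -> tokens still to generate

step = 0
while waiting or active:
    # Admit new requests into any free slots. This is the key trick:
    # admission happens every step, not once per batch.
    while waiting and len(active) < MAX_BATCH:
        req = waiting.popleft()
        active[req] = random.randint(2, 6)  # fake generation length

    # One decode step for the whole batch.
    for req in list(active):
        active[req] -= 1
        if active[req] == 0:
            del active[req]  # slot freed mid-"batch"

    step += 1
    print(f"step {step}: active={list(active)}")
```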
@stevhliu
Steven Liu
29 days
you good bro?
0
0
0
@LoubnaBenAllal1
Loubna Ben Allal
2 months
After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure. We'll help you
36
163
1K
@ClementDelangue
clem 🤗
2 months
The main breakthrough of GPT-5 was to route your messages between a couple of different models to give you the best, cheapest & fastest answer possible. This is cool but imagine if you could do this not only for a couple of models but hundreds of them, big and small, fast and
120
158
2K
@mishig25
Mishig Davaadorj
2 months
Likewise, claude code fetches Hugging Face’s docs markdown instead of html — reducing token usage by about an order of magnitude
@bunjavascript
Bun
3 months
When Claude Code fetches Bun’s docs, Bun’s docs now send markdown instead of HTML by default This shrinks token usage for our docs by about 10x
0
1
12
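One plausible way to implement this is sketched below, under the assumption that the server keys off the Accept header; neither post says exactly how HF or Bun detect an LLM client, so the detection logic here is hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content-negotiation sketch. Markdown carries the same
# documentation content in roughly 10x fewer tokens than rendered HTML.
MARKDOWN = "# Pipelines\n\nPipelines are a simple way to run inference.\n"
HTML = "<html><body><h1>Pipelines</h1><p>Pipelines are ...</p></body></html>"

class DocsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        accept = self.headers.get("Accept", "")
        # Clients that ask for markdown (e.g. an LLM agent) get markdown;
        # everyone else gets the usual HTML page.
        if "text/markdown" in accept:
            body, ctype = MARKDOWN.encode(), "text/markdown"
        else:
            body, ctype = HTML.encode(), "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), DocsHandler).serve_forever()
```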
@stevhliu
Steven Liu
3 months
read the docs to learn more! https://t.co/oRqqF0eWb9
huggingface.co
0
0
0
@stevhliu
Steven Liu
3 months
ring attention fixes this by passing K/V pairs from each GPU around in a ring, overlapping communication and computation. eventually, each GPU will have collected all the K/V shards it needs for attention. a single GPU doesn't store the full K/V because they share the load
1
0
1
@stevhliu
Steven Liu
3 months
we have something cool coming your way in diffusers! context parallelism splits sequence length across GPUs, making it faster to generate images/videos. but splitting the attention module has issues of its own because each token needs to see the K/V pairs from all other tokens 🤔
1
0
1
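A single-process numpy simulation of the ring described in these two posts: each simulated GPU keeps only its own Q shard, sees each K/V shard exactly once as the shards rotate, and merges partial results with an online softmax so the full K/V is never materialized on one rank. Illustrative only; the actual diffusers context-parallel code may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
world, n_local, d = 4, 8, 16  # 4 simulated GPUs, 8 tokens each
q = [rng.normal(size=(n_local, d)) for _ in range(world)]
k = [rng.normal(size=(n_local, d)) for _ in range(world)]
v = [rng.normal(size=(n_local, d)) for _ in range(world)]

def ring_attention(rank):
    m = np.full(n_local, -np.inf)   # running max (numerical stability)
    l = np.zeros(n_local)           # running softmax denominator
    acc = np.zeros((n_local, d))    # running numerator
    for step in range(world):
        src = (rank - step) % world  # K/V shard held at this ring step
        s = q[rank] @ k[src].T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)   # rescale previous partial results
        acc = acc * scale[:, None] + p @ v[src]
        l = l * scale + p.sum(axis=-1)
        m = m_new
    return acc / l[:, None]

# Sanity check against full (non-sharded) attention for rank 0.
K, V = np.concatenate(k), np.concatenate(v)
s = q[0] @ K.T / np.sqrt(d)
ref = np.exp(s - s.max(-1, keepdims=True))
ref = ref / ref.sum(-1, keepdims=True) @ V
assert np.allclose(ring_attention(0), ref)
```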
@LucSGeorges
Luc Georges
3 months
🔧 We’re working on refactoring transformers’ benchmarking suite to make it prettier, more meaningful, and more extendable. We’ll be measuring the standard metrics you want to see when talking about inference: - TTFT: time to first token, or the time it takes for the first
1
3
21
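As a hedged example of the metric being described, TTFT can be measured with transformers' streaming API: run generation on a background thread and time how long it takes for the first decoded token to arrive. The model choice below is an assumption, and this sketch is not the refactored benchmarking suite itself.

```python
import time
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "gpt2"  # illustrative choice
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("Continuous batching is", return_tensors="pt")
streamer = TextIteratorStreamer(tok, skip_prompt=True)

start = time.perf_counter()
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=32))
thread.start()

# TTFT: elapsed time until the first decoded token is available.
first_token = next(iter(streamer))
ttft = time.perf_counter() - start
print(f"TTFT: {ttft * 1000:.1f} ms (first token: {first_token!r})")
thread.join()
```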
@charmcli
Charm
3 months
If you love open models, you’ll love this: Crush now runs with @huggingface Inference Providers 🤗✨
14
15
159
@stevhliu
Steven Liu
3 months
↳ a dispatch function routes your call to the correct backend
↳ if you don't select a backend, it defaults to PyTorch native scaled dot product attention, which selects the best backend based on inputs/hardware https://t.co/txu9eV99cd
huggingface.co
0
0
0
@stevhliu
Steven Liu
3 months
it has never been easier to use different attention backends (FlashAttention, SageAttention, etc.) in Diffusers
↳ offers a single interface for multiple backends
↳ each backend is recorded in a registry (implementation, constraints, supported params)
1
0
3
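A minimal sketch of the registry-plus-dispatch pattern described in these two posts, defaulting to PyTorch's native SDPA when no backend is selected. All names here are illustrative, not the actual diffusers internals.

```python
import torch
import torch.nn.functional as F

_BACKENDS = {}  # registry: backend name -> attention implementation

def register_backend(name):
    def wrap(fn):
        _BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("sdpa")
def sdpa_attention(q, k, v):
    # PyTorch native SDPA picks the best kernel for the inputs/hardware.
    return F.scaled_dot_product_attention(q, k, v)

@register_backend("naive")
def naive_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

def dispatch_attention(q, k, v, backend=None):
    # No backend selected -> default to PyTorch native SDPA.
    return _BACKENDS[backend or "sdpa"](q, k, v)

q = k = v = torch.randn(1, 4, 8, 16)  # (batch, heads, seq, head_dim)
out = dispatch_attention(q, k, v)                 # default path
out_naive = dispatch_attention(q, k, v, "naive")  # explicit backend
assert torch.allclose(out, out_naive, atol=1e-5)
```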
@ClementDelangue
clem 🤗
3 months
Xet by Hugging Face is the most important AI technology that nobody is talking about! Under the hood, it now powers 5M Xet-enabled AI models & datasets on HF which see hundreds of terabytes of uploads and downloads every single day. What makes it super powerful is that it
17
42
345