Jared Sulzdorf

@j_sulz

Followers
386
Following
379
Media
54
Statuses
2K

I like pretty things, functional things, funny things, food things, and computer things. Not necessarily in that order. Making things go fast @huggingface

Seattle, WA
Joined September 2008
@reach_vb
Vaibhav (VB) Srivastav
8 days
Any git power users in my mutuals/timeline? We have a new and faster git experience coming up on Hugging Face and we'd love to get feedback from you! Comment or DM and I'll hook you up!
1
5
29
@reach_vb
Vaibhav (VB) Srivastav
20 days
The Hugging Face Hub team is on a tear recently: > You can create custom apps with domains on spaces > Edit GGUF metadata on the Fly > 100% of the Hub is powered by Xet - faster, efficient > Responses API support for ALL Inference Providers > MCP-UI support for HF MCP Server >
1
6
46
@ggerganov
Georgi Gerganov
23 days
HuggingFace just shipped in-browser GGUF editing It allows you to edit GGUF metadata in the comfort of your browser, without having to even download the full model. This feature is enabled via the Xet technology that makes partial file updates possible.
6
52
378
@j_sulz
Jared Sulzdorf
26 days
Today, we've finalized this first phase of migrating the Hub to a new, modern storage system. One that's built to scale with AI builders of today and tomorrow. https://t.co/qYmxxfGh7t There's still a lot of work to do, but we're excited for what's next. 💪
huggingface.co
0
0
1
@j_sulz
Jared Sulzdorf
26 days
The Hub is 100% on Xet. 🚀 A little over a year ago, @huggingface acquired @xetdata to unlock the next phase of growth in models and datasets. https://t.co/DvfXrLRAnM In April, there were 1,000 Hugging Face repos on Xet. Now every repo (over 6M) on the Hub is on Xet.
3
1
9
@lhoestq
Quentin Lhoest 🤗
3 months
Let me explain why Hugging Face Datasets storage is faster than S3 + why today's release changes everything 🧵
11
61
571
@lhoestq
Quentin Lhoest 🤗
3 months
New blog post 🚨 Every data engineer should read it @kszucs_ (@ApacheArrow PMC) announces how to drastically speed up Parquet files uploads and downloads. Yes, it can easily outspeed S3. Best part: the feature enabling this is open source Link in 🧵
1
3
25
@lhoestq
Quentin Lhoest 🤗
3 months
A new Pandas feature landed 3 days ago and no one noticed. Upload ONLY THE NEW DATA to dedupe-based storage like @huggingface (Xet). Data that already exist in other files don't need to be uploaded. Possible thanks to the recent addition of Content Defined Chunking for Parquet.
3
12
49
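The idea behind uploading only the new data can be illustrated with a toy content-defined chunker. This is a minimal sketch using only the Python standard library, not the chunker used by Xet or by the Parquet writer: the gear table, the 32-byte rolling window, the chunk-size parameter, and the function names are all assumptions made for illustration. Cut points are chosen from the content itself, so inserting a few bytes near the start only changes the chunks around the edit, and everything else matches what is already stored.

```python
import hashlib
import random

# Hypothetical gear table: one pseudo-random 32-bit value per byte value.
# Seeded so the example is repeatable.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes, bits: int = 10) -> list[bytes]:
    """Cut after every byte where the top `bits` bits of a rolling hash are zero.

    The 32-bit hash only depends on the previous 32 bytes, so cut points
    follow the content rather than absolute file offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


def digest(piece: bytes) -> str:
    return hashlib.sha256(piece).hexdigest()


# Pretend `original` is already stored, then insert 8 bytes near the start.
original = bytes(_rng.getrandbits(8) for _ in range(200_000))
edited = original[:1000] + b"NEW ROWS" + original[1000:]

stored = {digest(c) for c in chunk(original)}
edited_chunks = chunk(edited)
to_upload = [c for c in edited_chunks if digest(c) not in stored]

print(f"{len(to_upload)} of {len(edited_chunks)} chunks need uploading")
```

With fixed-size blocks the same 8-byte insert would shift every later block and force a near-complete re-upload, which is the case content-defined chunking is meant to avoid.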
@j_sulz
Jared Sulzdorf
4 months
A sneaky part of making this all work is our backward compatibility with Git LFS. This allows us to roll out a significant protocol change without forcing workflow changes. We call this the Git LFS Bridge internally and, like our migration process, its power is in its simplicity.
0
0
2
@j_sulz
Jared Sulzdorf
4 months
You can see over the past few months some of the biggest migrations show up in our cluster throughput. Each spike corresponds to a significant migration (where we download from LFS and upload to Xet) with the baseline steadily increasing to just shy of 100 Gb/s
1
0
2
@j_sulz
Jared Sulzdorf
4 months
The engine behind moving from Git LFS to Xet is our migration process. It's simple, powerful, and has moved well over a dozen PB just by itself. Here's a high-level view of how it works
1
0
2
@j_sulz
Jared Sulzdorf
4 months
We've moved the first 20PB from Git LFS to Xet on @huggingface without any interruptions, now we're migrating the rest of the Hub. We got this far by focusing on the community first. Here's a deep dive on the infra making this possible and what's next:
huggingface.co
1
2
28
@j_sulz
Jared Sulzdorf
4 months
These are hard numbers to put into context, but let's try. The latest run of Common Crawl from @CommonCrawl was 471 TB. We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours. 🤯🤯🤯
0
0
0
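The Common Crawl comparison can be checked with a little arithmetic, using the figures quoted elsewhere in this thread (471 TB per crawl, a 577Gb/s peak, 15PB stored). This is a rough sketch that assumes decimal units and reads "Gb/s" as gigabits per second; the variable names are illustrative.

```python
# Figures quoted in the thread. Assumption: decimal units (1 TB = 10**12 bytes)
# and "Gb/s" meaning gigabits per second.
CRAWL_TB = 471     # size of the latest Common Crawl run
PEAK_GBPS = 577    # peak upload speed reported for June
STORED_PB = 15     # data held in Xet at the time

crawl_bits = CRAWL_TB * 10**12 * 8
seconds = crawl_bits / (PEAK_GBPS * 10**9)
hours = seconds / 3600
crawls_stored = STORED_PB * 1000 / CRAWL_TB

print(f"{hours:.2f} hours to move one crawl at peak speed")
print(f"{crawls_stored:.1f} crawls fit in {STORED_PB} PB")
```

Under these assumptions the transfer takes a little under two hours and 15PB holds just under 32 crawls, which is consistent with the "about two hours" and "~32 crawls" in the tweet.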
@j_sulz
Jared Sulzdorf
4 months
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
1
0
1
@j_sulz
Jared Sulzdorf
4 months
It's been a bit since I took a step back and looked at our progress to migrate @huggingface from Git LFS to Xet, but every time I do it's mind boggling. A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today? 🤗 700,000 users/orgs 📈 350,000 repos 🚀 15PB
3
1
12
@lhoestq
Quentin Lhoest 🤗
5 months
Xet is now the default storage for new builders on @huggingface ! What it means for 🤗Datasets: - Deduplicated downloads and uploads for speed⚡ - Works with the new Parquet CDC writer, robust to insert/delete/edits 💪 @ApacheParquet has a bright future on HF :)
3
7
42
@j_sulz
Jared Sulzdorf
5 months
To migrate your existing repos to Xet, sign up here https://t.co/XoHLSCblZJ And we'll take care of the rest 🤗
huggingface.co
0
0
3
@j_sulz
Jared Sulzdorf
5 months
New users and organizations can say goodbye to LFS on @huggingface; Xet is now the default storage for new builders on the Hub 🚀🚀🚀 Just sign up for an account, create a new repo, pip install huggingface_hub and you're off! https://t.co/qbIlyhlecH
huggingface.co
4
7
27
@j_sulz
Jared Sulzdorf
5 months
So many folks to thank, but a special 🤗♥️ to @bartowski1182 @UnslothAI @PrunaAI @nomic_ai @Alibaba_Qwen @tomaarsen @ngxson and everyone that has jumped on the Xet waitlist organically. If you want to join the fun, the waitlist is over here
huggingface.co
0
0
1
@j_sulz
Jared Sulzdorf
5 months
Continuing to move all the LFS bytes into Xet storage on Hugging Face! Currently up to: 🤗 5,500 users and orgs with Xet access 🚀 150,000 Xet-backed models and datasets 🤯 4+ PB managed by Xet How much more to go? If the Hub's top storage users are any indication: many bytes
1
3
6