Jared Sulzdorf
@j_sulz
Followers: 386
Following: 379
Media: 54
Statuses: 2K
I like pretty things, functional things, funny things, food things, and computer things. Not necessarily in that order. Making things go fast @huggingface
Seattle, WA
Joined September 2008
Any git power users in my mutuals/timeline? We have a new and faster git experience coming up on Hugging Face and we'd love to get feedback from you! Comment or DM and I'll hook you up!
The Hugging Face Hub team is on a tear recently:
> You can create custom apps with domains on Spaces
> Edit GGUF metadata on the fly
> 100% of the Hub is powered by Xet - faster, efficient
> Responses API support for ALL Inference Providers
> MCP-UI support for HF MCP Server
>
Hugging Face just shipped in-browser GGUF editing. It allows you to edit GGUF metadata in the comfort of your browser, without even having to download the full model. This feature is enabled via the Xet technology that makes partial file updates possible.
Today, we've finalized this first phase of migrating the Hub to a new, modern storage system. One that's built to scale with AI builders of today and tomorrow. https://t.co/qYmxxfGh7t There's still a lot of work to do, but we're excited for what's next. 💪
huggingface.co
The Hub is 100% on Xet. A little over a year ago, @huggingface acquired @xetdata to unlock the next phase of growth in models and datasets. https://t.co/DvfXrLRAnM In April, there were 1,000 Hugging Face repos on Xet. Now every repo (over 6M) on the Hub is on Xet.
Let me explain why Hugging Face Datasets storage is faster than S3 + why today's release changes everything 🧵
New blog post 🚨 Every data engineer should read it. @kszucs_ (@ApacheArrow PMC) announces how to drastically speed up Parquet file uploads and downloads. Yes, it can easily outspeed S3. Best part: the feature enabling this is open source. Link in 🧵
A new Pandas feature landed 3 days ago and no one noticed. Upload ONLY THE NEW DATA to dedupe-based storage like @huggingface (Xet). Data that already exist in other files don't need to be uploaded. Possible thanks to the recent addition of Content Defined Chunking for Parquet.
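Content-defined chunking is what lets a dedupe-based store skip data it already has. As a rough illustration only (this is a toy gear-style rolling hash, not the chunker used by Xet, Parquet, or pandas, and every name and parameter in it is made up for the example), the sketch below splits bytes at content-determined cut points and counts how many bytes of an edited file land in chunks that were not already stored:

```python
import hashlib
import random

# Toy "gear" table: one pseudo-random 32-bit value per possible byte value.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, bits: int = 6) -> list[bytes]:
    """Cut after every byte where the top `bits` bits of the rolling hash are zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # The left shift pushes old bytes out of the 32-bit hash, so the
        # cut decision depends only on the most recent 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def bytes_to_upload(old: bytes, new: bytes) -> int:
    """Total size of the chunks of `new` that are not already stored for `old`."""
    stored = {digest(c) for c in cdc_chunks(old)}
    return sum(len(c) for c in cdc_chunks(new) if digest(c) not in stored)


# Hypothetical data: 8 KiB of pseudo-random bytes, then a 10-byte insert mid-file.
data_rng = random.Random(0)
original = bytes(data_rng.getrandbits(8) for _ in range(8192))
edited = original[:4000] + b"ten bytes!" + original[4000:]

new_bytes = bytes_to_upload(original, edited)
print(f"{new_bytes} of {len(edited)} bytes need uploading")
```

Because cut points depend on the bytes around them rather than on fixed offsets, inserting ten bytes in the middle only changes the chunks near the edit; everything else hashes to chunks that are already stored.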
A sneaky part of making this all work is our backward compatibility with Git LFS. This allows us to roll out a significant protocol change without forcing workflow changes. We call this the Git LFS Bridge internally and, like our migration process, its power is in its simplicity.
You can see over the past few months some of the biggest migrations show up in our cluster throughput. Each spike corresponds to a significant migration (where we download from LFS and upload to Xet) with the baseline steadily increasing to just shy of 100 Gb/s
The engine behind moving from Git LFS to Xet is our migration process. It's simple, powerful, and has moved well over a dozen PB just by itself. Here's a high-level view of how it works.
We've moved the first 20PB from Git LFS to Xet on @huggingface without any interruptions. Now we're migrating the rest of the Hub. We got this far by focusing on the community first. Here's a deep dive on the infra making this possible and what's next:
huggingface.co
These are hard numbers to put into context, but let's try. The latest run of Common Crawl from @CommonCrawl was 471 TB. We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours. 🤯🤯🤯
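A quick back-of-the-envelope check of those figures, assuming decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits) and borrowing the 577 Gb/s peak and the 15PB total quoted in other posts in this timeline:

```python
TB = 10**12  # bytes, decimal terabyte (assumption)
PB = 10**15  # bytes, decimal petabyte (assumption)

crawl_bytes = 471 * TB                 # latest Common Crawl run
peak_bits_per_second = 577 * 10**9     # 577 Gb/s, read as gigabits per second
stored_bytes = 15 * PB                 # total managed by Xet at the time

# Time to move one crawl at the peak rate: bytes -> bits, then divide by rate.
seconds = crawl_bytes * 8 / peak_bits_per_second
hours = seconds / 3600

# How many crawls of that size fit in the stored total.
crawls_stored = stored_bytes / crawl_bytes

print(f"{hours:.2f} hours per crawl at peak")      # 1.81 hours per crawl at peak
print(f"{crawls_stored:.1f} crawls in 15 PB")      # 31.8 crawls in 15 PB
```

Both results line up with the post: roughly two hours per crawl and about 32 crawls' worth of data.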
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
It's been a bit since I took a step back and looked at our progress to migrate @huggingface from Git LFS to Xet, but every time I do it's mind-boggling. A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today? 700,000 users/orgs, 350,000 repos, 15PB.
Xet is now the default storage for new builders on @huggingface! What it means for 🤗 Datasets:
- Deduplicated downloads and uploads for speed ⚡
- Works with the new Parquet CDC writer, robust to insert/delete/edits 💪
@ApacheParquet has a bright future on HF :)
To migrate your existing repos to Xet, sign up here: https://t.co/XoHLSCblZJ And we'll take care of the rest 🤗
huggingface.co
New users and organizations can say goodbye to LFS on @huggingface; Xet is now the default storage for new builders on the Hub. Just sign up for an account, create a new repo, pip install huggingface_hub and you're off! https://t.co/qbIlyhlecH
huggingface.co
So many folks to thank, but a special 🤗♥️ to @bartowski1182 @UnslothAI @PrunaAI @nomic_ai @Alibaba_Qwen @tomaarsen @ngxson and everyone that has jumped on the Xet waitlist organically. If you want to join the fun, the waitlist is over here
huggingface.co
Continuing to move all the LFS bytes into Xet storage on Hugging Face! Currently up to:
- 5,500 users and orgs with Xet access
- 150,000 Xet-backed models and datasets
- 🤯 4+ PB managed by Xet
How much more to go? If the Hub's top storage users are any indication: many bytes