Andi Marafioti

@andimarafioti

Followers: 6K · Following: 6K · Media: 269 · Statuses: 2K

cooking multimodal models @huggingface

Bern, Switzerland
Joined April 2022
@andimarafioti
Andi Marafioti
2 months
Fuck it. Today, we open source FineVision: the finest curation of datasets for VLMs, over 200 sources!
> 20% improvement across 10 benchmarks
> 17M unique images
> 10B answer tokens
> New capabilities: GUI navigation, pointing, counting
FineVision 10x’s open-source VLMs.
23
115
927
@ClementDelangue
clem 🤗
10 hours
Robots are going trick-or-treating!
11
19
170
@andimarafioti
Andi Marafioti
3 hours
The kids loved talking to Reachy Mini! I first set it to speak Swiss German for one group, then Greek for a single child. It made me realize how robots like this could be great language partners for children growing up with few native speakers around.
1
0
11
@andimarafioti
Andi Marafioti
12 hours
The 🤗 science team put out an incredible resource for anyone interested in training LLMs from scratch to SOTA. Their dedication to democratising ML is inspiring!
@LoubnaBenAllal1
Loubna Ben Allal
1 day
After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure. We'll help you
0
1
19
@andimarafioti
Andi Marafioti
13 hours
This does it, I’m bringing a reachy mini beta to my Halloween party tonight! Let’s hope the kids like him 🤗
@ClementDelangue
clem 🤗
1 day
Happy Halloween from Reachy Mini! You'll be able to 3D print these skins at home thanks to open-source
0
1
7
@donwinslow
Don Winslow
8 hours
Have you watched... Trump and the Death of the American Farmer?
62
744
1K
@_lewtun
Lewis Tunstall
1 day
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover:
🧭 Strategy on whether to train your own LLM and burn all your VC money
🪨 Pretraining,
15
74
389
@eliebakouch
elie
1 day
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably https://t.co/iN2JtWhn23
92
580
4K
@_lewtun
Lewis Tunstall
2 days
🌟 Introducing General On-Policy Logit Distillation 🌟 Inspired by the latest from @thinkymachines, we extend on-policy distillation to enable ANY teacher to be distilled into ANY student, even if their tokenizers differ! We've added this to TRL so you can now take any pair of
16
62
431
@andimarafioti
Andi Marafioti
2 days
my slurm script
1
0
7
@andimarafioti
Andi Marafioti
2 days
New ML interview question: I have a training job that fails 1/5 times when I launch with 64 parallel jobs. This is the error. What is happening? (Feel free to check the files in nanoVLM, this is not a drill)
17
1
126
@andimarafioti
Andi Marafioti
2 days
Finished building my Reachy mini beta! I'm trying out the conversation demo, and it switched automatically to Spanish, is my accent that strong?😅
4
1
29
@orr_zohar
Orr Zohar
3 days
🚨Huge for multimodal/vision AI: Datasets hit 100s of TB, making on-prem storage a nightmare. 🤗Now stream them directly from Hugging Face to GPUs - unlocking scalable training of everything from VLMs to world models. 🚀 I've battled storage limits for years; thrilled to move
@andimarafioti
Andi Marafioti
4 days
You can now train SOTA models without any storage!🌩️ We completely revamped the Hub’s backend to enable streaming at scale. We streamed TBs of data to 100s of H100s to train SOTA VLMs and saw serious speed-ups. But how?
1
10
68
@andimarafioti
Andi Marafioti
3 days
I finally got a Reachy Mini beta! Sitting down to build it now 🤗
5
2
35
@andimarafioti
Andi Marafioti
4 days
Want to train your next SOTA model without the data-loading nightmare? Read the blog by me, @lhoestq, @ben_burtenshaw, @pcuenq, and @mervenoyann: https://t.co/A7zGv6BHq6 Get started today: pip install --upgrade datasets huggingface_hub
huggingface.co
0
1
18
@andimarafioti
Andi Marafioti
4 days
We're already using this to train our next-gen models with nanoVLM. Streaming directly from the Hub is now as fast as our cluster's local SSDs, but without the 3-hour wait to download and prep the data. See the implementation:
github.com
The simplest, fastest repository for training/finetuning small-sized VLMs. - huggingface/nanoVLM
1
0
15
@andimarafioti
Andi Marafioti
4 days
How? We rebuilt the backend for massive concurrency.
⚡️ Persistent Data Files Cache: only the first worker resolves the data files; no more traffic jam of requests.
🏎️ Parquet Prefetching: we fetch data while the GPU is busy, eliminating I/O bottlenecks.
Blog:
huggingface.co
1
3
25
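The prefetching idea can be illustrated with a toy producer/consumer sketch (not the Hub's actual implementation): a background thread fetches the next batches while the consumer, standing in for the busy GPU, drains the queue:

```python
import queue
import threading

def prefetch(batches, depth=2):
    """Yield items from `batches` while a background thread fetches ahead,
    keeping up to `depth` items buffered so I/O overlaps with compute."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for b in batches:
            q.put(b)  # blocks once `depth` items are already buffered
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not sentinel:
        yield item

print(list(prefetch(range(5))))  # [0, 1, 2, 3, 4]
```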
@andimarafioti
Andi Marafioti
4 days
You can now train SOTA models without any storage!🌩️ We completely revamped the Hub’s backend to enable streaming at scale. We streamed TBs of data to 100s of H100s to train SOTA VLMs and saw serious speed-ups. But how?
15
31
222
@andimarafioti
Andi Marafioti
7 days
How do you adapt the learning rate when changing the batch size? I usually take # tokens per batch as batch size and do: LR_large = LR_small * sqrt(Bs_large / Bs_small) using Adam as the optimizer. Does it make sense?
3
0
22
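The square-root rule from the tweet, as a quick sketch (the function and variable names are mine, not from the thread):

```python
import math

def scale_lr(lr_small: float, bs_small: int, bs_large: int) -> float:
    """Square-root scaling for Adam: multiply the LR by the square root
    of the batch-size ratio, with batch size counted in tokens."""
    return lr_small * math.sqrt(bs_large / bs_small)

# Quadrupling the tokens per batch doubles the learning rate:
print(scale_lr(3e-4, 500_000, 2_000_000))  # 0.0006
```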
@mervenoyann
merve
8 days
we just updated the model comparison on our blog for you 🫡 added Chandra, OlmOCR-2, Qwen3-VL and their averaged OlmOCR score!
10
51
381
@Shekswess
Shekswess
1 month
Tiny Reasoning Language Model (trlm-135) ⚡ A 135M parameter experiment to see if small models can learn structured reasoning with the right data + training strategy. 💳 Model Card:
huggingface.co
26
96
691
@ErikKaum
Erik Kaunismäki
9 days
Deploy your favorite OCR models in a few clicks directly from Hugging Face 🔥 📷 We've added the latest bleeding-edge OCR models to the Inference Endpoints catalog to make it easy for you to get started! links 👇
8
31
236