Andi Marafioti

@andimarafioti

Followers: 6K · Following: 6K · Media: 269 · Statuses: 2K

cooking multimodal models @huggingface

Bern, Switzerland
Joined April 2022
@andimarafioti
Andi Marafioti
2 months
Fuck it. Today, we open source FineVision: the finest curation of datasets for VLMs, over 200 sources!
> 20% improvement across 10 benchmarks
> 17M unique images
> 10B answer tokens
> New capabilities: GUI navigation, pointing, counting
FineVision 10x’s open-source VLMs.
23
115
927
@ClementDelangue
clem 🤗
10 hours
Robots are going trick-or-treating!
11
19
170
@andimarafioti
Andi Marafioti
3 hours
The kids loved talking to Reachy Mini! I first set it to speak Swiss German for one group, then Greek for a single child. It made me realize how robots like this could be great language partners for children growing up with few native speakers around.
1
0
11
@andimarafioti
Andi Marafioti
12 hours
The 🤗 science team put out an incredible resource for anyone interested in training LLMs from scratch to SOTA. Their dedication to democratising ML is inspiring!
@LoubnaBenAllal1
Loubna Ben Allal
1 day
After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure. We'll help you
0
1
19
@andimarafioti
Andi Marafioti
13 hours
This does it, I’m bringing a reachy mini beta to my Halloween party tonight! Let’s hope the kids like him 🤗
@ClementDelangue
clem 🤗
1 day
Happy Halloween from Reachy Mini! You'll be able to 3D print these skins at home thanks to open-source
0
1
7
@donwinslow
Don Winslow
8 hours
Have you watched... Trump and the Death of the American Farmer?
62
744
1K
@_lewtun
Lewis Tunstall
1 day
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover:
🧭 Strategy on whether to train your own LLM and burn all your VC money
🪨 Pretraining,
15
74
389
@eliebakouch
elie
1 day
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably https://t.co/iN2JtWhn23
92
580
4K
@_lewtun
Lewis Tunstall
2 days
🌟 Introducing General On-Policy Logit Distillation 🌟 Inspired by the latest from @thinkymachines, we extend on-policy distillation to enable ANY teacher to be distilled into ANY student, even if their tokenizers differ! We've added this to TRL so you can now take any pair of
16
62
431
@andimarafioti
Andi Marafioti
2 days
my slurm script
1
0
7
@andimarafioti
Andi Marafioti
2 days
New ML interview question: I have a training job that fails 1/5 times when I launch with 64 parallel jobs. This is the error. What is happening? (Feel free to check the files in nanoVLM, this is not a drill)
17
1
126
@andimarafioti
Andi Marafioti
2 days
Finished building my Reachy mini beta! I'm trying out the conversation demo, and it switched automatically to Spanish, is my accent that strong?😅
4
1
29
@orr_zohar
Orr Zohar
3 days
🚨Huge for multimodal/vision AI: Datasets hit 100s of TB, making on-prem storage a nightmare. 🤗Now stream them directly from Hugging Face to GPUs - unlocking scalable training of everything from VLMs to world models. 🚀 I've battled storage limits for years; thrilled to move
@andimarafioti
Andi Marafioti
4 days
You can now train SOTA models without any storage!🌩️ We completely revamped the Hub’s backend to enable streaming at scale. We streamed TBs of data to 100s of H100s to train SOTA VLMs and saw serious speed-ups. But how?
1
10
68
@andimarafioti
Andi Marafioti
3 days
I finally got a Reachy Mini beta! Sitting down to build it now 🤗
5
2
35
@andimarafioti
Andi Marafioti
4 days
Want to train your next SOTA model without the data-loading nightmare? Read the blog by me, @lhoestq, @ben_burtenshaw, @pcuenq, and @mervenoyann: https://t.co/A7zGv6BHq6 Get started today: pip install --upgrade datasets huggingface_hub
huggingface.co
0
1
18
@andimarafioti
Andi Marafioti
4 days
We're already using this to train our next-gen models with nanoVLM. Streaming directly from the Hub is now as fast as our cluster's local SSDs, but without the 3-hour wait to download and prep the data. See the implementation:
github.com
The simplest, fastest repository for training/finetuning small-sized VLMs. - huggingface/nanoVLM
1
0
15
@andimarafioti
Andi Marafioti
4 days
How? We rebuilt the backend for massive concurrency.
⚡️ Persistent Data Files Cache: only the first worker resolves the data files; no more traffic jam of requests.
🏎️ Parquet Prefetching: we fetch data while the GPU is busy, eliminating I/O bottlenecks.
Blog:
huggingface.co
1
3
25
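The prefetching idea can be illustrated with a toy producer/consumer sketch (not the Hub's actual implementation): a background thread fetches the next batches while the consumer, standing in for the busy GPU, drains the queue:

```python
import queue
import threading

def prefetch(batches, depth=2):
    """Yield items from `batches` while a background thread fetches ahead,
    keeping up to `depth` items buffered so I/O overlaps with compute."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for b in batches:
            q.put(b)  # blocks once `depth` items are already buffered
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not sentinel:
        yield item

print(list(prefetch(range(5))))  # [0, 1, 2, 3, 4]
```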
@andimarafioti
Andi Marafioti
4 days
You can now train SOTA models without any storage!🌩️ We completely revamped the Hub’s backend to enable streaming at scale. We streamed TBs of data to 100s of H100s to train SOTA VLMs and saw serious speed-ups. But how?
15
31
222
@andimarafioti
Andi Marafioti
7 days
How do you adapt the learning rate when changing the batch size? I usually take # tokens per batch as batch size and do: LR_large = LR_small * sqrt(Bs_large / Bs_small) using Adam as the optimizer. Does it make sense?
3
0
22
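The square-root rule from the tweet, as a quick sketch (the function and variable names are mine, not from the thread):

```python
import math

def scale_lr(lr_small: float, bs_small: int, bs_large: int) -> float:
    """Square-root scaling for Adam: multiply the LR by the square root
    of the batch-size ratio, with batch size counted in tokens."""
    return lr_small * math.sqrt(bs_large / bs_small)

# Quadrupling the tokens per batch doubles the learning rate:
print(scale_lr(3e-4, 500_000, 2_000_000))  # 0.0006
```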
@mervenoyann
merve
8 days
we just updated the model comparison on our blog for you 🫡 added Chandra, OlmOCR-2, Qwen3-VL and their averaged OlmOCR score!
10
51
381
@Shekswess
Shekswess
1 month
Tiny Reasoning Language Model (trlm-135) ⚡ A 135M parameter experiment to see if small models can learn structured reasoning with the right data + training strategy. 💳 Model Card:
huggingface.co
26
96
691
@ErikKaum
Erik Kaunismäki
9 days
Deploy your favorite OCR models in a few clicks directly from Hugging Face 🔥 📷 We've added the latest bleeding-edge OCR models to the Inference Endpoints catalog to make it easy for you to get started! links 👇
8
31
236