 
            
Andi Marafioti
@andimarafioti
Followers: 6K · Following: 6K · Media: 269 · Statuses: 2K
cooking multimodal models @huggingface
Bern, Switzerland
Joined April 2022
            
Fuck it. Today, we open source FineVision: the finest curation of datasets for VLMs, over 200 sources!
> 20% improvement across 10 benchmarks
> 17M unique images
> 10B answer tokens
> New capabilities: GUI navigation, pointing, counting
FineVision 10x’s open-source VLMs.
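If you want to poke at the release, a minimal streaming sketch; note the repo id below is my assumption, so check the Hub for the exact name and configs:

```python
# Minimal sketch: stream a few FineVision samples without a full download.
# NOTE: the repo id is an assumption; check the Hub for the exact name/configs.
from datasets import load_dataset

ds = load_dataset("HuggingFaceM4/FineVision", split="train", streaming=True)
for sample in ds.take(3):
    print(sample.keys())
```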
          
                
             The kids loved talking to Reachy Mini! I first set it to speak Swiss German for one group, then Greek for a single child. It made me realize how robots like this could be great language partners for children growing up with few native speakers around. 
          
                
             The 🤗 science team put out an incredible resource for anyone interested in training LLMs from scratch to SOTA. Their dedication to democratising ML is inspiring! 
After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook. We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure. We'll help you…
            
                
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover:
🧭 Strategy on whether to train your own LLM and burn all your VC money
🪨 Pretraining, …
          
                
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably  https://t.co/iN2JtWhn23
          
          
                
🌟 Introducing General On-Policy Logit Distillation 🌟 Inspired by the latest from @thinkymachines, we extend on-policy distillation to enable ANY teacher to be distilled into ANY student, even if their tokenizers differ! We've added this to TRL so you can now take any pair of…
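As a rough mental model, one on-policy distillation step looks like the sketch below. This is conceptual only, not the TRL API, and it assumes teacher and student share a tokenizer, which is exactly the constraint the announced feature removes.

```python
# Conceptual sketch of one on-policy distillation step (NOT the TRL API).
# Assumes a shared tokenizer; the announced TRL feature lifts that limit.
import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, prompt_ids):
    # 1) Sample a completion from the *student* -> on-policy data.
    with torch.no_grad():
        seq = student.generate(prompt_ids, max_new_tokens=64, do_sample=True)
    # 2) Score every sampled position with both models.
    s_logp = F.log_softmax(student(seq).logits[:, :-1], dim=-1)
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(seq).logits[:, :-1], dim=-1)
    # 3) Reverse KL: KL(student || teacher) on the student's own tokens.
    #    kl_div(input, target, log_target=True) computes KL(target || input).
    return F.kl_div(t_logp, s_logp, log_target=True, reduction="batchmean")
```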
          
                
             New ML interview question: I have a training job that fails 1/5 times when I launch with 64 parallel jobs. This is the error. What is happening? (Feel free to check the files in nanoVLM, this is not a drill) 
          
                
Finished building my Reachy mini beta! I'm trying out the conversation demo, and it switched automatically to Spanish. Is my accent that strong? 😅
          
                
🚨Huge for multimodal/vision AI: Datasets hit 100s of TB, making on-prem storage a nightmare. 🤗Now stream them directly from Hugging Face to GPUs - unlocking scalable training of everything from VLMs to world models. 🚀 I've battled storage limits for years; thrilled to move…
           You can now train SOTA models without any storage!🌩️ We completely revamped the Hub’s backend to enable streaming at scale. We streamed TBs of data to 100s of H100s to train SOTA VLMs and saw serious speed-ups. But how? 
            
                
I finally got a reachy mini beta! Sitting down to build it now 🤗
          
                
             Want to train your next SOTA model without the data-loading nightmare? Read the blog by me, @lhoestq, @ben_burtenshaw, @pcuenq, and @mervenoyann:  https://t.co/A7zGv6BHq6  Get started today: pip install --upgrade datasets huggingface_hub 
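In practice the setup is roughly the sketch below ("user/large-dataset" is a placeholder, not a real repo):

```python
# Minimal sketch: stream a Hub dataset straight into a training loop.
# "user/large-dataset" is a placeholder repo id.
from datasets import load_dataset
from torch.utils.data import DataLoader

ds = load_dataset("user/large-dataset", split="train", streaming=True)
ds = ds.shuffle(buffer_size=10_000, seed=42)  # buffered shuffle for streams
ds = ds.with_format("torch")

loader = DataLoader(ds, batch_size=32, num_workers=4)  # workers split shards
for batch in loader:
    ...  # forward/backward as usual
```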
          
            
              
             We're already using this to train our next-gen models with nanoVLM. Streaming directly from the Hub is now as fast as our cluster's local SSDs, but without the 3-hour wait to download and prep the data. See the implementation: 
          
            
github.com: huggingface/nanoVLM, the simplest, fastest repository for training/finetuning small-sized VLMs.
            
                
How? We rebuilt the backend for massive concurrency.
⚡️ Persistent Data Files Cache: Only the first worker resolves the data. No more traffic jam of requests.
🏎️ Parquet Prefetching: We fetch data while the GPU is busy, eliminating I/O bottlenecks.
Blog:
          
            
            huggingface.co
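The prefetching idea, in its generic form (an illustrative pattern only, not the actual Hub/datasets implementation):

```python
# Illustrative double-buffering: fetch chunk i+1 in the background while
# the GPU consumes chunk i. Generic pattern, not the Hub's implementation.
import queue
import threading

def prefetched(fetch_chunk, num_chunks, depth=2):
    q = queue.Queue(maxsize=depth)

    def producer():
        for i in range(num_chunks):
            q.put(fetch_chunk(i))  # blocks once `depth` chunks are queued
        q.put(None)                # sentinel: stream exhausted

    threading.Thread(target=producer, daemon=True).start()
    while (chunk := q.get()) is not None:
        yield chunk
```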
            
                
              
              
             How do you adapt the learning rate when changing the batch size? I usually take # tokens per batch as batch size and do: LR_large = LR_small * sqrt(Bs_large / Bs_small) using Adam as the optimizer. Does it make sense? 
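Concretely, the rule in the tweet works out like this (the numbers are made up for illustration):

```python
# Square-root LR scaling from the tweet; the numbers are illustrative.
import math

lr_small = 3e-4        # LR tuned at the small batch size
bs_small = 1_000_000   # tokens per batch, small run
bs_large = 4_000_000   # tokens per batch, large run

lr_large = lr_small * math.sqrt(bs_large / bs_small)
print(f"{lr_large:.1e}")  # 6.0e-04 -> 4x the batch, 2x the LR
```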
          
                
              
             we just updated the model comparison on our blog for you 🫡 added Chandra, OlmOCR-2, Qwen3-VL and their averaged OlmOCR score! 
          
                
              
             Tiny Reasoning Language Model (trlm-135) ⚡ A 135M parameter experiment to see if small models can learn structured reasoning with the right data + training strategy. 💳 Model Card: 
          
            
            huggingface.co
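A quick way to try it (the repo id below is a placeholder; the model card linked above has the real one):

```python
# Minimal sketch: sample from the model. "org/trlm-135" is a placeholder;
# use the repo id from the model card above.
from transformers import pipeline

pipe = pipeline("text-generation", model="org/trlm-135")
out = pipe("Q: I have 3 apples and eat one. How many are left?\nA:",
           max_new_tokens=64)
print(out[0]["generated_text"])
```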
            
                
              
Deploy your favorite OCR models with a few clicks directly from Hugging Face 🔥 📷 we've added the latest bleeding-edge OCR models to the Inference Endpoints catalog to make it easy for you to get started! links 👇
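The catalog itself is click-through, but the same deploy can be scripted with `huggingface_hub`. A hedged sketch: the model id and the instance arguments below are illustrative, so check the Inference Endpoints docs for valid values.

```python
# Hedged sketch: programmatic deploy via huggingface_hub. The model id and
# instance arguments are illustrative; check the Endpoints docs for values.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-ocr-endpoint",
    repository="org/ocr-model",   # placeholder model id
    framework="pytorch",
    task="image-to-text",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-a10g",
)
endpoint.wait()  # block until the endpoint is running
print(endpoint.url)
```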
          
                
              
              