 
            
Mathias Lechner (@mlech26l)
Cofounder/CTO at Liquid AI and Research Affiliate MIT
Bay Area · Joined December 2017
Followers: 830 · Following: 891 · Media: 25 · Statuses: 123
            
           I wrote an article summarizing how we designed our tokenizer back in early 2024: 
          
                
We recently ported LFM2 by @LiquidAI_ to Cactus (YC S25); the 350m-i8 runs at 188 tokens/sec on M4 CPU-ONLY. Gemma3 270m-i8 runs at 170 tokens/sec for reference. On an old iPhone 13 Pro, it should reach nearly 100 tokens/sec, no NPU or GPU! It’s officially one of our recommended models
          
                
             Meet LFM2-VL-3B, our latest on-device VLM. Top scores in multi-modal instruction following 
          
                
We have a new nano LFM that is on par with GPT-5 on data extraction with 350M parameters. Introducing LFM2-350M-PII-Extract-JP 🇯🇵 Extracts personally identifiable information (PII) from Japanese text → returns structured JSON for on-device masking of sensitive data. Delivers
          
                
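A hedged sketch of how such structured output could drive on-device masking; the example text and JSON schema below are invented for illustration and are not the model's actual output format:

import json

text = "田中太郎さんの電話番号は090-1234-5678です。"
# Hypothetical extraction result; the real LFM2-350M-PII-Extract-JP schema may differ.
extracted = json.loads(
    '{"entities": [{"type": "NAME", "value": "田中太郎"},'
    ' {"type": "PHONE", "value": "090-1234-5678"}]}'
)

masked = text
for entity in extracted["entities"]:
    # Replace each extracted PII span with a type placeholder.
    masked = masked.replace(entity["value"], f"[{entity['type']}]")
print(masked)  # [NAME]さんの電話番号は[PHONE]です。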
             Day 1 of the @LiquidAI_ fine-tuning hackathon in Tokyo this weekend. Jointly organized with @weights_biases and @LambdaAPI
          
          
                
             It's a good model sir. Very proud of the team, we worked very hard to be on the Pareto frontier of quality and efficiency. Even had the chance to write a CPU-optimized kernel for MoE to squeeze everything from the hardware, and that gave us those sweet throughput results. 
           Meet LFM2-8B-A1B, our first on-device Mixture-of-Experts (MoE)! 🐘 > LFM2-8B-A1B is the best on-device MoE in terms of both quality and speed. > Performance of a 3B-4B model class, with up to 5x faster inference profile on CPUs and GPUs. > Quantized variants fit comfortably on 
            
                
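For context on what an MoE layer computes, a minimal sketch of token-level top-k routing; this is generic illustration code with made-up sizes (8 experts, top-2), not the LFM2-8B-A1B architecture or its CPU-optimized kernel:

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        # Each token picks its top_k experts; only those experts run for that token,
        # which is how a large total parameter count keeps a small active profile.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k, None] * self.experts[int(e)](x[mask])
        return out

print(TinyMoE()(torch.randn(5, 64)).shape)  # torch.Size([5, 64])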
             LFM2-8B-A1B: Our MoE model that runs on a phone. This is just the start, much more to come ... 
          
                
             We achieved strong LLM performance + blazing fast edge inference with just: - Grouped Query Attention (global sequence mixer) - Double Gated short convolutions (local sequence mixer) - No linear attention/SSMs needed 
          
                
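A rough sketch of what a double-gated short-convolution mixer along these lines could look like; the gate placement, kernel size, and projections are assumptions for illustration, not the exact LFM2 block:

import torch
import torch.nn as nn

class DoubleGatedShortConv(nn.Module):
    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        self.in_gate = nn.Linear(d_model, d_model)   # input gate
        self.out_gate = nn.Linear(d_model, d_model)  # output gate
        self.value = nn.Linear(d_model, d_model)
        # Depthwise causal short convolution over the sequence dimension.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              groups=d_model, padding=kernel_size - 1)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        gated_in = torch.sigmoid(self.in_gate(x)) * self.value(x)
        # Conv1d expects (batch, channels, seq_len); trim the extra causal padding.
        h = self.conv(gated_in.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(torch.sigmoid(self.out_gate(x)) * h)

print(DoubleGatedShortConv(64)(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])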
             Just wrote my first Substack post: "Flipping the Script: Why Short Convolutions Don't Need Linear Attention" TL;DR: Everyone's asking why linear attention needs short convolutions. We asked the opposite: do short convolutions need linear attention? LFM2 proves they don't 🎯 
          
                
Cool release by @LiquidAI_: LFM2-Audio-1.5B. It’s a pretty cool omni-architecture that enables prediction of both text and audio tokens, meaning it can handle multi-turn S2S, ASR, and TTS (with voice description) within a single model. Great to see, once again this year, a model
          
                
             Our @LiquidAI_ LFM2-Audio-1.5 in a nutshell: - both text and audio in - both text and audio out - 1.5B -> runs locally - open-weight license 
          
                
             I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it’s not tokenizer-free at all. I also talk about why people hate tokenizers so much! 
          
                
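One way to make that point concrete (my illustration, not necessarily the post's own example): a byte-level "tokenizer-free" model still maps text onto a fixed vocabulary, namely the 256 possible byte values:

text = "tokenizers 🚀"
byte_ids = list(text.encode("utf-8"))  # bytes are token IDs from a fixed 256-entry vocabulary
print(byte_ids[:4])   # [116, 111, 107, 101]  ('t', 'o', 'k', 'e')
print(len(byte_ids))  # 15 byte "tokens" for 12 characters (the emoji costs 4 bytes)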
             We continue to scale our @LiquidAI_ LFM2 series of super efficient language models with LFM2-2.6B 
          
                
The secret sauce most definitely is in the data, given that the architecture is fairly standard: Qwen3 backbone + NaViT SigLip2 (i.e. it uses packed vision sequences). They use patch_size=16 and pixel_shuffle_scale_factor=2 in order to use fewer image tokens. A 256x256 image will
           1/ Introducing Isaac 0.1 — our first perceptive-language model. 2B params, open weights. Matches or beats models significantly larger on core perception. We are pushing the efficient frontier for physical AI.  https://t.co/dJ1Wjh2ARK 
            
            
                
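Back-of-the-envelope token count for the setup described above (patch_size=16, pixel_shuffle_scale_factor=2), using standard ViT patching arithmetic purely as an illustration:

def image_tokens(height, width, patch_size=16, shuffle=2):
    patches = (height // patch_size) * (width // patch_size)  # raw ViT patches
    return patches // (shuffle * shuffle)  # pixel shuffle merges shuffle x shuffle patches into one token

print(image_tokens(256, 256))  # 256 patches -> 64 image tokens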
             We trained our @LiquidAI_ LFM2-350M model 1400x beyond "compute optimal" > Chinchilla scaling laws: ~20 tokens per param > LFM2-350M: ~28,000 tokens per param (1400x more) Why? Because Chinchilla only concerns training compute, while we care about inference cost 
          
                
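The rough arithmetic behind the 1400x figure:

params = 350e6
chinchilla_tokens = 20 * params     # compute-optimal budget: ~7e9 training tokens
lfm2_tokens = 28_000 * params       # actual budget: ~9.8e12 training tokens
print(lfm2_tokens / chinchilla_tokens)  # 1400.0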
Very proud of our team at Liquid AI Japan! We’ve just released our first Japanese task-specific SLM in the model library (https://t.co/3CFVqrAF01), with many more to come. It’s a small 350M model (i.e. reaches 200 tok/s prefill, 40 tok/s decode on a Raspi5), so you may notice a
Can we get a 350M parameter model to perform as well as GPT-4o on specialized tasks? Today, we release an instance of our LFM2-350M, fine-tuned to perform competitively with GPT-4o on real-time general bi-directional Japanese <> English translation of short to medium context.
            
                
             I wrote a short article about LFM-2's (by @LiquidAI_ ) hybrid architecture w/ illustration + simple pytorch impl. 
          
                