Imanol Schlag
@ImanolSchlag
Followers: 117 · Following: 19 · Media: 0 · Statuses: 8
Apertus Lead. AI Research Scientist at the ETH AI Center.
Switzerland
Joined December 2015
            
           Developing Apertus, I learned how difficult evals can be. So it's always good to have a second opinion. This recent work evaluated our fully transparent and compliant Apertus 8B model. Here, Apertus is third, beating Llama, Mistral, Olmo, and others! 
          
            
            arxiv.org
              We present Llama-GENBA-10B, a trilingual foundation model addressing English-centric bias in large language models. Built on Llama 3.1-8B and scaled to 10B parameters, Llama-GENBA-10B is...
            
                
Would you like to know the details? Well, you can! Today, we published the first official version of our technical report: 119 pages covering all sorts of details you will find important. Which part is your favorite or least favorite?  https://t.co/iKF6bdXoVU
          
          
            
            arxiv.org
              We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual...
            
                
             Our Apertus 8B model performs very well, outperforming popular big-tech models, such as Llama 3.1-8B or GPT-OSS-20B, on our benchmarks. Furthermore, our 70B model is among the largest developed by a public institution and competitive with open-yet-obscure models of similar size. 
          
                
             Can we develop AI responsibly? Yes, and we prove it by example. Two weeks ago, we released our Apertus models, which set a new standard in transparency, inclusivity, and compliance while achieving competitive performance. 🧵 
          
                
We released our first work on the "compliance gap" when pre-training LLMs. We find that AI opt-outs have a relatively small effect on performance. 👇
🚨 AI is in legal hot water. Lawsuits over copyrighted training data are mounting, and content owners are pulling out fast.
Top opt-outs?
📰 News & Media
🔬 Science & Tech
🏥 Health Info
But here's the thing: how much do those datasets actually matter for model performance? 🧵👇
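For illustration only (not from the thread): a minimal Python sketch of what filtering a crawled corpus against an opt-out domain list might look like. The domain list, document schema, and function names here are hypothetical, not the pipeline used in the paper.

# Hypothetical sketch: drop crawled documents whose source domain has opted out.
from urllib.parse import urlparse

OPTED_OUT_DOMAINS = {"example-news.com", "example-journal.org"}  # illustrative list

def is_compliant(doc: dict) -> bool:
    """Keep a document only if its source domain has not opted out."""
    domain = urlparse(doc["url"]).netloc.lower().removeprefix("www.")
    return domain not in OPTED_OUT_DOMAINS

corpus = [
    {"url": "https://example-news.com/article/1", "text": "..."},
    {"url": "https://open-blog.net/post/2", "text": "..."},
]
compliant_corpus = [doc for doc in corpus if is_compliant(doc)]
# The "compliance gap" question is how much a model loses when trained only on
# the compliant subset rather than the full crawl.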
            
                
Wrote a post about Highway networks, ResNets, and the subtleties of architecture comparisons.
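As a quick reminder of the two blocks being compared, here is a rough Python/numpy sketch (mine, not from the post; the transform f and the shapes are placeholders):

# Rough sketch: residual vs. highway connections.
import numpy as np

def f(x, w):
    """Placeholder transform, e.g. one dense layer with tanh."""
    return np.tanh(x @ w)

def residual_block(x, w):
    # ResNet: y = x + F(x); the skip path is always fully open.
    return x + f(x, w)

def highway_block(x, w_h, w_t):
    # Highway: y = T(x) * H(x) + (1 - T(x)) * x, with a learned sigmoid gate T.
    t = 1.0 / (1.0 + np.exp(-(x @ w_t)))
    return t * f(x, w_h) + (1.0 - t) * x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = 0.1 * rng.normal(size=(8, 8))
print(residual_block(x, w).shape, highway_block(x, w, w).shape)

With a constant gate of 0.5 the highway block reduces to a scaled residual block, which hints at the kind of subtlety such architecture comparisons have to control for.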
          
                
Come visit our poster "MoEUT: Mixture-of-Experts Universal Transformers" on Friday at 4:30 pm in East Exhibit Hall A-C #1907 at #NeurIPS2024. With Kazuki Irie, @SchmidhuberAI, @ChrisGPotts, and @chrmanning.
          
                
MoEUT: Mixture-of-Experts Universal Transformers
Their UT model, for the first time, slightly outperforms standard Transformers on LM tasks such as BLiMP and PIQA, while using significantly less compute and memory.
repo:  https://t.co/QudGYNLDBb
abs:  https://t.co/CmvJNRBtcT
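A toy Python/numpy sketch of the core idea (not the authors' implementation; attention, normalization, and the actual routing scheme are omitted, and all sizes are made up): a universal transformer reuses one block across depth, and MoEUT makes that shared block a mixture of experts so capacity does not have to come from per-layer parameters.

# Toy sketch: one shared mixture-of-experts feed-forward block reused at every depth step.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k, n_steps = 16, 32, 4, 2, 6

# A single shared set of expert weights, reused at every depth step.
w_in = 0.05 * rng.normal(size=(n_experts, d_model, d_ff))
w_out = 0.05 * rng.normal(size=(n_experts, d_ff, d_model))
w_router = 0.05 * rng.normal(size=(d_model, n_experts))

def moe_ffn(x):
    """Route each token to its top-k experts and sum their gated outputs."""
    scores = x @ w_router                       # (tokens, experts)
    top = np.argsort(scores, axis=-1)[:, -top_k:]
    out = np.zeros_like(x)
    for i, experts in enumerate(top):           # per-token loop for clarity, not speed
        gates = np.exp(scores[i, experts])
        gates /= gates.sum()
        for g, e in zip(gates, experts):
            out[i] += g * (np.tanh(x[i] @ w_in[e]) @ w_out[e])
    return out

x = rng.normal(size=(8, d_model))               # 8 tokens
for _ in range(n_steps):                        # same shared block at every step
    x = x + moe_ffn(x)
print(x.shape)

The point of the sketch is only that depth comes from reusing one set of parameters rather than stacking new layers.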
          
          
                