Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱 (@JJitsev)
CLIP Interrogator infers: "Arbeitsrat für Kunst, AI Researcher, meet the actor behind the scenes, with curls" they/them. Co-founder & scientific lead LAION e.V.
3K followers · 5K following · 241 media · 2K statuses · Joined September 2022
Our new work uses scaling law derivation to enable robust model and dataset comparison, a step towards guided, reproducible progress in open foundation model research. Following the comparison, we also release openMaMMUT-L-14 with 80.34% zero-shot IN1K. https://t.co/1IoTd8NTNO 1/n
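A minimal sketch of the idea (not the paper's code, and with made-up numbers): fit a saturating power law per model from a handful of (compute, downstream error) measurements, then solve for the compute where the two fitted curves cross.

```python
# Minimal sketch (not the paper's code): fit a saturating power law
# err(C) = E + a * (C / 1e9)**(-b) per model from a few (compute, error)
# points, then locate the compute where the two fitted curves cross.
import numpy as np
from scipy.optimize import curve_fit, brentq

def power_law(C, E, a, b):
    return E + a * (C / 1e9) ** (-b)   # compute normalized for better conditioning

# Toy measurements (total FLOPS, downstream zero-shot error) -- illustrative only.
C = np.array([1e9, 1e10, 1e11, 1e12])
err_clip   = np.array([0.62, 0.45, 0.34, 0.27])
err_mammut = np.array([0.66, 0.47, 0.33, 0.24])

p_clip, _   = curve_fit(power_law, C, err_clip,   p0=(0.1, 0.5, 0.2), maxfev=20000)
p_mammut, _ = curve_fit(power_law, C, err_mammut, p0=(0.1, 0.5, 0.2), maxfev=20000)

# Crossing point: where the predicted errors of the two fits are equal.
diff = lambda logC: power_law(10**logC, *p_clip) - power_law(10**logC, *p_mammut)
log_cross = brentq(diff, 9.0, 12.0)
print(f"predicted crossing at ~10^{log_cross:.1f} FLOPS")
```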
          
                
As usual for @laion_ai work, the entire research pipeline is open and reproducible. https://t.co/DiuR6g8pil Top work by @marnezhurina @tomerporian @gpuccetti92 @tommiekerssies @mehdidc and the open-Ψ (open-sci) collective, using compute grants on Leonardo (EuroHPC) and JUWELS Booster (GCS).
github.com · LAION-AI/scaling-laws-for-comparison
                
Scaling laws, though, tell the full story and provide the correct comparison across scales, predicting that one should prefer MaMMUT at higher ones. We test the predictions and obtain openMaMMUT L/14 with 80.3% IN1K zero-shot. Get it here: https://t.co/tyA99iaApl 5/n
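For context, a hedged sketch of how such a checkpoint is typically used for zero-shot classification with open_clip, assuming the release is loadable via open_clip's hf-hub interface like other LAION models; the hub id below is a placeholder, not the actual release name.

```python
# Zero-shot classification sketch with open_clip. The hub id is a placeholder --
# check the linked release for the actual model/checkpoint name.
import torch
import open_clip
from PIL import Image

repo = "hf-hub:laion/openMaMMUT-ViT-L-14"  # placeholder identifier
model, _, preprocess = open_clip.create_model_and_transforms(repo)
tokenizer = open_clip.get_tokenizer(repo)

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
texts = tokenizer([f"a photo of a {c}" for c in ["cat", "dog", "car"]])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(texts)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(probs)  # class probabilities over the prompt list
```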
          
                
DFN-1.4B - clear line crossing again, with the crossing point in a similar region. Scaling laws across various datasets give a consistent comparison: MaMMUT takes over CLIP at larger compute scales. Imagine measuring at single scales < 10^11 FLOPS: you would hardly choose MaMMUT. 4/n
          
                
The same story repeats when deriving scaling laws for CLIP and MaMMUT on Re-LAION: clear line crossing, with a similar crossing point close to 10^11 GFLOPS. 3/n
          
                
E.g., model comparison of CLIP vs MaMMUT using derived scaling laws, with zero-shot IN1K and MSCOCO retrieval as downstream tasks. On DataComp-1.4B, MaMMUT takes over CLIP at larger compute scales while underperforming it at smaller ones - a line crossing on the scaling law plot. 2/n
          
                
You fancy crossing lines on scaling law plots? Enjoy those in our work https://t.co/snYXA7AcRg accepted at #NeurIPS2025, where we show how robust model and dataset comparison can be done by scaling law derivation. Single-scale comparison can mislead - let's go beyond that. 1/n
arxiv.org · In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law...
                
Very important work & open pipeline - tapping public PDFs as a further data source besides Common Crawl HTML text. Preliminary evidence that a FinePDFs + FineWeb-Edu + DCLM mix might give a stronger dataset than Nemotron-CC v2 (tested so far only at one fixed scale: 1.7B params, 36B token budget).
We are releasing FinePDFs: the largest PDF dataset, spanning over half a billion documents! - Long context: documents are 2x longer than web text - 3T tokens from high-demand domains like legal and science. - Heavily improves over SoTA when mixed with FW-EDU & DCLM web corpora.
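A hedged sketch of inspecting the release with the HF datasets library; the repo id, config handling and field names are assumptions to check against the dataset card.

```python
# Sketch: stream a few FinePDFs documents via the HF datasets library.
# The repo id and the "text" field name are assumptions -- check the dataset card;
# a language/subset config name may also be required.
from datasets import load_dataset

ds = load_dataset("HuggingFaceFW/finepdfs", split="train", streaming=True)

for i, doc in enumerate(ds):
    print(sorted(doc.keys()))      # inspect the available fields first
    print(doc["text"][:300])       # assumed field holding the extracted text
    if i == 2:
        break
```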
            
                
Marin shows impressively how science on foundation models can be done in the open.
github.com · We evaluate a suite of optimizers on Transformer-style language models (130M–1.2B parameters) trained on up to 16× Chinchilla-optimal data. The goal is to quantify real speedups under...
                
Thorough and needed study of various optimizers, again emphasizing how important it is to perform proper hyperparameter tuning when doing comparisons. When scaling up, the Muon advantage is still there, but shrinks to 1.1x over standard AdamW if both are tuned. AdamW tuning seems easier.
           We did a very careful study of 10 optimizers with no horse in the race. Despite all the excitement about Muon, Mars, Kron, Soap, etc., at the end of the day, if you tune the hyperparameters rigorously and scale up, the speedup over AdamW diminishes to only 10% :-( Experiments 
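To make the "tune both before comparing" point concrete, a toy sketch (not the study's protocol): sweep the learning rate per optimizer on the same small model and data, and only compare the best runs. Muon is not in torch.optim, so plain SGD stands in as the second optimizer here.

```python
# Toy illustration of fair optimizer comparison: tune each optimizer's learning
# rate on identical model/data before quoting any speedup. Not the study's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_loss(opt_name, lr, steps=200, wd=0.1):
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
    if opt_name == "adamw":
        opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    else:  # Muon is not in torch.optim; SGD stands in as the second optimizer
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=wd)
    x, y = torch.randn(1024, 64), torch.randn(1024, 64)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for name in ["adamw", "sgd"]:
    best_loss, best_lr = min((train_loss(name, lr), lr) for lr in [1e-4, 3e-4, 1e-3, 3e-3, 1e-2])
    print(f"{name}: best loss {best_loss:.4f} at lr {best_lr}")
```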
          
                
This was the first, but surely not the last, collaboration between open-sci, @laion_ai and @openEuroLLM. Establishing baselines and good starting grounds for experiments to create strong open foundation models is important, and I am happy to see it worked out so well.
          
                
If you are looking for a reference baseline to compare against your own training attempt and don't find it among the models trained for Pythia, DCLM, OLMo or DataDecide, have a look at HF:
huggingface.co
                
We release open-sci-ref-0.01 - a research reference base model family trained on 8 open datasets (CommonCorpus, Pile, C4, SlimPajama, HPLT-2.0, FineWeb-Edu-1.4T, DCLM-baseline, Nemotron-CC-HQ), up to 1.7B model size and 1T token scale. https://t.co/ZFDDzHMJN7
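A hedged sketch of loading one of the reference baselines with transformers; the repo id below is a placeholder guess, not the actual model name on the Hub.

```python
# Sketch: load a reference baseline with transformers and sample a continuation.
# The repo id is a placeholder -- check the HF collection linked above.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "open-sci/open-sci-ref-0.01-1.7b"  # placeholder id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto")

inputs = tok("Scaling laws allow", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```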
          
          
                
The manifesto reads great, this is the right spirit, and we need the same not only for startup/founder culture, but also for academia/basic research. Again, an example for the EU of how to proceed to come out of its current misery. https://t.co/AYSOAsrcXn
          
          
                
This is awesome. Hope such spots will pop up more and more all around the world. E.g., the EU would do well to support such entities with all it has, instead of pumping funds into mostly old, dysfunctional structures which just stifle everything inspiring and novel.
I'm just back from two days in this totally crazy place in Finland called @shipfr8. It's a former hotel where they now bring cracked teenagers from all over the world (a big part from the US, but also Europe and Asia) to spend 3 months 100% focused on an idea they want to build. The
            
                
Very nice work - re-processing CC math and code links from existing open datasets with a better parser, rendering HTML with Lynx and converting to LaTeX via an LLM. Impressive 133B-token scale and clear gains on math and code; good to have it as an open dataset to experiment with.
We just released Nemotron-CC-Math. Equations on the web aren't just LaTeX - they're in MathML, <pre> tags, inline, even images. Code shows up in just as many ways. Most parsers drop it. Nemotron-CC-Math (133B tokens) reprocesses CommonCrawl math pages to capture math equations + code reliably.
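A rough sketch of the extraction stage as described in the tweets, not the released pipeline's actual code: render the HTML with Lynx so that <pre> blocks and equation markup survive as text, then hand the dump to an LLM for LaTeX normalization (left as a placeholder here).

```python
# Sketch: render a Common Crawl HTML page with Lynx to plain text, then pass the
# dump to an LLM for LaTeX normalization (placeholder). Not the released pipeline.
import subprocess

def render_with_lynx(html_path: str) -> str:
    # -dump: render the page to plain text; -nolist: drop the trailing link list
    return subprocess.run(
        ["lynx", "-dump", "-nolist", html_path],
        capture_output=True, text=True, check=True,
    ).stdout

def to_latex_with_llm(text: str) -> str:
    # Placeholder: in the described pipeline an LLM rewrites equations/code to LaTeX.
    raise NotImplementedError("plug in your LLM call here")

page_text = render_with_lynx("page.html")
print(page_text[:500])
```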
          
                
Game Reasoning Arena - a collaborative LAION project using various gameplay scenarios to study reasoning in language models.
New research: How do LLMs think strategically? We built Game Reasoning Arena to find out. LLMs play strategic games while we capture their reasoning traces. Paper:
          
                
ii) No evals as an encoder in VLMs, as in Cambrian or OpenVision. DINOv2 was not doing well as a component in VLMs; a DINOv3 comparison would be insightful, as it is important to know model behavior as an encoder for generalist vision-language learning, not only standalone.
          
                
It is good to see progress in SSL on images only via DINOv3. I think there are some unfortunate bits in the training/evaluation that make it harder to see where the model stands. i) DINOv3 uses ImageNet-1k and -22k for pretraining, then does part of the evals again on ImageNet-1k. 1/2
          
                
Debunking yet another of many studies that claim benefits of "brain-inspired" mechanisms without doing proper controls - comparing to a reference transformer of the same size. Doing reference comparisons is the way to keep "brain-inspired" from sliding further towards being a red flag for scams.
Finding #1: The "hierarchical" architecture had minimal performance impact when compared to a similarly sized transformer. A drop-in transformer comes within a few points without any hyperparameter optimization. See our full post: https://t.co/PXDvDY020D
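A small sketch of what the "similarly sized" control amounts to in practice: instantiate a plain transformer, count its parameters, and adjust width/depth until it matches the proposed model before comparing. The sizes below are illustrative, not taken from the study.

```python
# Sketch of the parameter-matched baseline check before crediting a new mechanism.
import torch.nn as nn

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

baseline = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, dim_feedforward=2048,
                               batch_first=True),
    num_layers=8,
)
print(f"baseline transformer params: {count_params(baseline) / 1e6:.1f}M")
# Adjust d_model / num_layers until this matches the proposed model's count.
```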
            
            
                
Bibi files - must watch. Israel is a bunch that has survived worse things than Bibi. Had he ended up in jail years ago, as he was supposed to, many things would probably have been much easier. https://t.co/Y8KkCd4mEj
          
          
                