Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱 Profile
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱

@JJitsev

Followers
3K
Following
5K
Media
241
Statuses
2K

CLIP Interrogator infers: "Arbeitsrat für Kunst, AI Researcher, meet the actor behind the scenes, with curls" they/them. Co-founder & scientific lead LAION e.V.

Joined September 2022
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
5 months
Our new work uses scaling law derivation to enable robust model and dataset comparison, a step towards guided, reproducible progress in open foundation model research. Following the comparison, we also release openMaMMUT-L-14 with 0-shot IN1K 80.34%. https://t.co/1IoTd8NTNO 1/n
2
11
58
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
As usual for @laion_ai work - the entire research pipeline is open and reproducible. https://t.co/DiuR6g8pil Top work by @marnezhurina @tomerporian @gpuccetti92 @tommiekerssies @mehdidc and the open-Ψ (open-sci) collective. Using grants on Leonardo (EuroHPC) and JUWELS Booster (GCS)
github.com
Contribute to LAION-AI/scaling-laws-for-comparison development by creating an account on GitHub.
0
0
2
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
Scaling laws, though, tell the full story and provide the correct comparison across scales, predicting that one should prefer MaMMUT at higher ones. We test the predictions and obtain openMaMMUT L/14 with 80.3% IN1K zero-shot. Get it here: https://t.co/tyA99iaApl 5/n
1
0
1
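For readers who want to reproduce the kind of zero-shot IN1K number quoted above, here is a minimal sketch of the standard recipe (class-prompt text embeddings, cosine similarity with the image embedding) using the open_clip library. The pretrained tag, image path and three-class list below are stand-in assumptions for illustration; substitute the released openMaMMUT L/14 checkpoint and the full ImageNet-1k class/prompt set for a real evaluation.

```python
# Minimal sketch of zero-shot classification with a CLIP-style model via
# open_clip: embed class prompts, embed the image, rank by cosine similarity.
# The pretrained tag, image path and three-class list are stand-ins only.
import torch
import open_clip
from PIL import Image

model_name = "ViT-L-14"
pretrained = "laion2b_s32b_b82k"   # stand-in weights, not the openMaMMUT release
model, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=pretrained)
tokenizer = open_clip.get_tokenizer(model_name)
model.eval()

class_names = ["tabby cat", "golden retriever", "container ship"]  # tiny stand-in class list
prompts = [f"a photo of a {c}" for c in class_names]

with torch.no_grad():
    text_feat = model.encode_text(tokenizer(prompts))
    text_feat /= text_feat.norm(dim=-1, keepdim=True)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # any test image
    img_feat = model.encode_image(image)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)

    probs = (100.0 * img_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(class_names, probs[0].tolist())))
```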
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
DFN 1.4B - clear line crossing again, with the crossing point in a similar region. Scaling laws across various datasets give a consistent comparison - MaMMUT takes over CLIP at larger compute scales. Imagine measuring at single scales < 10^11 FLOPS: you would hardly choose MaMMUT. 4/n
1
0
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
The same story repeats when deriving scaling laws for CLIP and MaMMUT on Re-LAION. Clear line crossing, with a similar crossing point close to 10^11 GFLOPS. 3/n
1
0
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
E.g., model comparison of CLIP vs MaMMUT using derived scaling laws, with zero-shot IN1K and MSCOCO retrieval as downstream tasks. On DataComp-1.4B, MaMMUT takes over CLIP at larger compute scales while underperforming it at smaller ones - a line crossing on the scaling law plot. 2/n
1
0
1
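The "line crossing" logic in this thread can be illustrated with a toy fit: derive a power law error = a·C^b for each model from (compute, downstream error) measurements, then solve for the budget where the two fitted curves intersect. All numbers below are invented for illustration and are not the paper's measurements; the paper's actual fitting procedure is more involved.

```python
# Toy illustration: fit error = a * C^b per model, then find the compute budget
# where the two fitted scaling laws cross. Invented numbers, not the paper's data.
import numpy as np

def fit_power_law(compute, error):
    # linear fit in log-log space: log(error) = log(a) + b * log(C)
    b, log_a = np.polyfit(np.log(compute), np.log(error), deg=1)
    return np.exp(log_a), b

C = np.array([1e9, 1e10, 1e11, 1e12])             # compute budgets (FLOPS), illustrative
err_clip   = np.array([0.60, 0.42, 0.30, 0.22])   # hypothetical CLIP downstream error
err_mammut = np.array([0.66, 0.44, 0.29, 0.20])   # hypothetical MaMMUT downstream error

a1, b1 = fit_power_law(C, err_clip)
a2, b2 = fit_power_law(C, err_mammut)

# crossing point: a1 * C^b1 == a2 * C^b2  =>  C* = (a2 / a1) ** (1 / (b1 - b2))
c_star = (a2 / a1) ** (1.0 / (b1 - b2))
print(f"fitted exponents: CLIP b={b1:.3f}, MaMMUT b={b2:.3f}")
if b2 < b1:  # MaMMUT error falls faster with compute in this toy fit
    print(f"predicted takeover at ~{c_star:.2e} FLOPS: prefer MaMMUT beyond this budget")
```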
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
You fancy crossing lines on scaling law plots? Enjoy those in our work https://t.co/snYXA7AcRg accepted at #NeurIPS2025, where we show how robust model and dataset comparison can be done by scaling law derivation. Single-scale comparison can mislead - let's go beyond that. 1/n
arxiv.org
In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law...
1
3
9
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Very important work & open pipeline - tapping public PDFs as a further data source besides Common Crawl HTML text. Preliminary evidence that a FinePDFs + FineWeb-Edu + DCLM mix might give a stronger dataset than Nemotron-CC v2 (tested so far only at one fixed scale: 1.7B params, 36B token budget)
@HKydlicek
Hynek Kydlíček
2 months
We are releasing 📄 FinePDFs: the largest PDF dataset, spanning over half a billion documents!
- Long context: documents are 2x longer than web text
- 3T tokens from high-demand domains like legal and science
- Heavily improves over SoTA when mixed with the FW-Edu & DCLM web corpora.
0
1
7
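A rough sketch of how such a mix could be assembled for one's own experiments with the Hugging Face `datasets` streaming API. The hub repo ids and the mixing probabilities below are placeholder assumptions for illustration, not the recipe used in the quoted evaluation.

```python
# Rough sketch of mixing PDF-derived and web-text corpora via streaming.
# Repo ids and probabilities are assumptions; adjust to the actual releases.
from datasets import load_dataset, interleave_datasets

finepdfs    = load_dataset("HuggingFaceFW/finepdfs", split="train", streaming=True)          # assumed id
fineweb_edu = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)       # assumed id
dclm        = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True) # assumed id

mix = interleave_datasets(
    [finepdfs, fineweb_edu, dclm],
    probabilities=[0.2, 0.4, 0.4],  # illustrative mixing weights only
    seed=0,
)

# Peek at a few mixed examples; column names may differ per dataset.
for i, example in enumerate(mix):
    if i >= 3:
        break
    print(example.get("text", "")[:80])
```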
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Marin shows impressively how science on foundation models can be done in the open
github.com
Description: We evaluate a suite of optimizers on Transformer-style language models (130M–1.2B parameters) trained on up to 16× Chinchilla-optimal data. The goal is to quantify real speedups under...
0
0
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Thorough and needed study of various optimizers, again emphasizing how important it is to perform proper hyperparameter tuning when doing comparisons. When scaling up, the Muon advantage is still there, but shrinks to ~1.1x over standard AdamW if both are tuned. AdamW tuning seems easier.
@percyliang
Percy Liang
2 months
We did a very careful study of 10 optimizers with no horse in the race. Despite all the excitement about Muon, Mars, Kron, Soap, etc., at the end of the day, if you tune the hyperparameters rigorously and scale up, the speedup over AdamW diminishes to only 10% :-( Experiments
1
0
4
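The point about tuning before comparing can be made concrete with a tiny harness: sweep the key hyperparameter(s) for each optimizer on the same task and compare best-vs-best. This toy sketch only uses optimizers shipped with PyTorch; Muon and the other optimizers from the quoted study are not in torch and would have to be plugged into `optimizers` separately.

```python
# Tiny illustration of "tune before you compare": per-optimizer learning-rate
# sweep on the same toy task, then compare best-vs-best final loss.
import torch
import torch.nn as nn

def run_trial(make_opt, lr, steps=200, seed=0):
    torch.manual_seed(seed)                       # same data and init for every trial
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = make_opt(model.parameters(), lr)
    x, y = torch.randn(512, 32), torch.randn(512, 1)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

optimizers = {
    "adamw": lambda p, lr: torch.optim.AdamW(p, lr=lr, weight_decay=0.01),
    "sgd_momentum": lambda p, lr: torch.optim.SGD(p, lr=lr, momentum=0.9),
}
learning_rates = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]

for name, make_opt in optimizers.items():
    best_loss, best_lr = min((run_trial(make_opt, lr), lr) for lr in learning_rates)
    print(f"{name}: best final loss {best_loss:.4f} at lr={best_lr:.0e}")
```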
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
This was the first, but surely not the last, collaboration between open-sci, @laion_ai and @openEuroLLM. Establishing baselines and good starting grounds for experiments to create strong open foundation models is important, and I am happy to see it worked out so well.
0
1
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
If you are looking for a reference baseline to compare against your own training attempt and don't find it among the models trained for Pythia, DCLM, OLMo or DataDecide - have a look at HF:
huggingface.co
1
0
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
We release open-sci-ref-0.01 - a research reference base model family trained on 8 open datasets (CommonCorpus, Pile, C4, SlimPajama, HPLT-2.0, FineWeb-Edu-1.4T, DCLM-baseline, Nemotron-CC-HQ), up to 1.7B model size and 1T token scale. https://t.co/ZFDDzHMJN7
1
3
13
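Using such a reference model as a comparison baseline is a plain `transformers` load. The repo id below is a hypothetical placeholder for illustration; check the release link above for the actual checkpoint names.

```python
# Sketch of pulling one of these reference baselines from the HF hub.
# The repo id is a hypothetical placeholder, not a confirmed checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "open-sci/open-sci-ref-0.01-1.7b"   # hypothetical id, for illustration only

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Scaling laws let us", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```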
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
The manifesto reads great; this is the right spirit, and we need the same not only for startup/founder culture, but also for academia/basic research. Again, an example for the EU of how to proceed to get out of its current misery. https://t.co/AYSOAsrcXn
0
0
0
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
This is awesome. Hope such spots will pop up more and more all around the world. E.g., the EU would do well to support such entities with all it has, instead of pumping funds into mostly old, dysfunctional structures which just stifle everything inspiring and novel.
@Thom_Wolf
Thomas Wolf
2 months
I'm just back from two days in this totally crazy place in Finland called @shipfr8. It's a former hotel where they now bring cracked teenagers from all over the world (a big part from the US, but also Europe and Asia) to spend 3 months 100% focused on an idea they want to build. The
1
1
3
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Very nice work - re-processing CC math and code links from existing open datasets with a better parser, rendering HTML with Lynx and converting to LaTeX via an LLM. Impressive 133B-token scale and clear gains on math and code; good to have it as an open dataset to experiment with.
@KarimiRabeeh
Rabeeh Karimi
2 months
We just released Nemotron-CC-Math 🚀 Equations on the web aren't just LaTeX - they're in MathML, <pre> tags, inline, even images. Code shows up in just as many ways. Most parsers drop it. Nemotron-CC-Math (133B tokens) reprocesses CommonCrawl math pages to capture math equations + code reliably
0
0
3
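A sketch of the first stage of such a pipeline: render an HTML page to layout-preserving plain text with the Lynx browser (`lynx -dump`) before any LLM-based LaTeX conversion. The `lynx -dump` invocation is standard; the LLM conversion step is left as a stub, since the exact prompting used for Nemotron-CC-Math is not shown here.

```python
# Sketch: render HTML to plain text with Lynx, which keeps math/code spans more
# intact than naive tag stripping; the LLM-based LaTeX rewrite is a stub.
import subprocess

def render_with_lynx(html_path: str, width: int = 120) -> str:
    # -dump prints the rendered page to stdout; -nolist drops the link footnotes
    result = subprocess.run(
        ["lynx", "-dump", "-nolist", f"-width={width}", html_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def convert_math_to_latex(rendered_text: str) -> str:
    # Placeholder for the LLM rewriting step (equation-bearing spans -> LaTeX).
    return rendered_text

text = render_with_lynx("page.html")
print(convert_math_to_latex(text)[:500])
```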
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Game Reasoning Arena - a collaborative LAION project using various game play scenarios to study reasoning in language models.
@LuciaCKun
Lucia Cipolina Kun
3 months
📷 New research: How do LLMs think strategically? We built Game Reasoning Arena to find out. LLMs play strategic games while we capture their reasoning traces. Paper:
0
0
4
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
ii) No evals as an encoder in VLMs, like in Cambrian or OpenVision. Dino v2 was not doing well as a component in VLMs; a Dino v3 comparison would be insightful, as it is important to know model behavior as an encoder for generalist vision-language learning, not only standalone.
0
0
3
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
It is good to see progress in SSL on images only via Dino v3. I think there are some unfortunate bits in training/evaluation that make it harder to see where the model stands. i) Dino v3 uses ImageNet-1k and -22k for pretraining, then does part of the evals again on ImageNet-1k. 1/2
1
0
3
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
3 months
Debunking yet another of the many studies that claim benefits of "brain-inspired" mechanisms without doing proper controls - comparing to a reference transformer of the same size. Doing reference comparisons is the way to keep "brain-inspired" from sliding further towards being a red flag for scams.
@arcprize
ARC Prize
3 months
Finding #1: The "hierarchical" architecture had minimal performance impact when compared to a similarly sized transformer. A drop-in transformer comes within a few points without any hyperparameter optimization. See our full post: https://t.co/PXDvDY020D
5
8
82
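The control being asked for is simple to set up: size-match a plain transformer to the "brain-inspired" model before comparing. Below is a sketch of the parameter-count check; the sizes are illustrative and do not reproduce the HRM/ARC setups.

```python
# Sketch of the basic control: build a plain transformer with a matched
# parameter count before attributing gains to a new architecture.
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def plain_transformer(d_model: int, n_layers: int, n_heads: int) -> nn.Module:
    layer = nn.TransformerEncoderLayer(
        d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model, batch_first=True
    )
    return nn.TransformerEncoder(layer, num_layers=n_layers)

target = 27_000_000  # e.g. the parameter count of the model under test (illustrative)
baseline = plain_transformer(d_model=512, n_layers=8, n_heads=8)
print(f"baseline params: {count_params(baseline):,} (target ~{target:,})")
# Adjust d_model / n_layers until the counts roughly match, then train both
# models on identical data and compute budget before comparing.
```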
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
3 months
Bibi files, must watch. Israel is a bunch which has survived through worse things than Bibi. Had he landed in jail years before, as he was supposed to, many things would probably have been much easier. https://t.co/Y8KkCd4mEj
1
0
0