Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱 Profile
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱

@JJitsev

Followers
3K
Following
5K
Media
241
Statuses
2K

CLIP Interrogator infers: "Arbeitsrat für Kunst, AI Researcher, meet the actor behind the scenes, with curls" they/them. Co-founder & scientific lead LAION e.V.

Joined September 2022
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
5 months
Our new work uses scaling law derivation to enable robust model and dataset comparison, a step towards guided, reproducible progress in open foundation model research. Following the comparison, we also release openMaMMUT-L-14 with 0-shot IN1K 80.34%. https://t.co/1IoTd8NTNO 1/n
2
11
58
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
As usual for @laion_ai work - the entire research pipeline is open and reproducible. https://t.co/DiuR6g8pil Top work by @marnezhurina @tomerporian @gpuccetti92 @tommiekerssies @mehdidc and the open-Ψ (open-sci) collective. Using grants on Leonardo (EuroHPC) and JUWELS Booster (GCS)
github.com
Contribute to LAION-AI/scaling-laws-for-comparison development by creating an account on GitHub.
0
0
2
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
Scaling laws, though, tell the full story and provide the correct comparison across scales, predicting that one should prefer MaMMUT at higher ones. We test the predictions and obtain openMaMMUT L/14 with 80.3% IN1K zero-shot. Get it here: https://t.co/tyA99iaApl 5/n
1
0
1
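For readers who want to reproduce the kind of zero-shot IN1K number quoted above, here is a minimal sketch of the standard recipe (class-prompt text embeddings, cosine similarity with the image embedding) using the open_clip library. The pretrained tag, image path and three-class list below are stand-in assumptions for illustration; substitute the released openMaMMUT L/14 checkpoint and the full ImageNet-1k class/prompt set for a real evaluation.

```python
# Minimal sketch of zero-shot classification with a CLIP-style model via
# open_clip: embed class prompts, embed the image, rank by cosine similarity.
# The pretrained tag, image path and three-class list are stand-ins only.
import torch
import open_clip
from PIL import Image

model_name = "ViT-L-14"
pretrained = "laion2b_s32b_b82k"   # stand-in weights, not the openMaMMUT release
model, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=pretrained)
tokenizer = open_clip.get_tokenizer(model_name)
model.eval()

class_names = ["tabby cat", "golden retriever", "container ship"]  # tiny stand-in class list
prompts = [f"a photo of a {c}" for c in class_names]

with torch.no_grad():
    text_feat = model.encode_text(tokenizer(prompts))
    text_feat /= text_feat.norm(dim=-1, keepdim=True)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # any test image
    img_feat = model.encode_image(image)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)

    probs = (100.0 * img_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(class_names, probs[0].tolist())))
```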
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
DFN 1.4B - clear line crossing again, with the crossing point in a similar region. Scaling laws across various datasets give a consistent comparison - MaMMUT takes over CLIP at larger compute scales. Imagine measuring at single scales < 10^11 FLOPS: you would hardly choose MaMMUT. 4/n
1
0
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
The same story repeats when deriving scaling laws for CLIP and MaMMUT on Re-LAION. Clear line crossing, with a similar crossing point close to 10^11 GFLOPS. 3/n
1
0
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
E.g., model comparison of CLIP vs MaMMUT using derived scaling laws, with zero-shot IN1K and MSCOCO retrieval as downstream tasks. On DataComp-1.4B, MaMMUT takes over CLIP at larger compute scales while underperforming it at smaller ones - a line crossing on the scaling law plot. 2/n
1
0
1
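The "line crossing" logic in this thread can be illustrated with a toy fit: derive a power law error = a·C^b for each model from (compute, downstream error) measurements, then solve for the budget where the two fitted curves intersect. All numbers below are invented for illustration and are not the paper's measurements; the paper's actual fitting procedure is more involved.

```python
# Toy illustration: fit error = a * C^b per model, then find the compute budget
# where the two fitted scaling laws cross. Invented numbers, not the paper's data.
import numpy as np

def fit_power_law(compute, error):
    # linear fit in log-log space: log(error) = log(a) + b * log(C)
    b, log_a = np.polyfit(np.log(compute), np.log(error), deg=1)
    return np.exp(log_a), b

C = np.array([1e9, 1e10, 1e11, 1e12])             # compute budgets (FLOPS), illustrative
err_clip   = np.array([0.60, 0.42, 0.30, 0.22])   # hypothetical CLIP downstream error
err_mammut = np.array([0.66, 0.44, 0.29, 0.20])   # hypothetical MaMMUT downstream error

a1, b1 = fit_power_law(C, err_clip)
a2, b2 = fit_power_law(C, err_mammut)

# crossing point: a1 * C^b1 == a2 * C^b2  =>  C* = (a2 / a1) ** (1 / (b1 - b2))
c_star = (a2 / a1) ** (1.0 / (b1 - b2))
print(f"fitted exponents: CLIP b={b1:.3f}, MaMMUT b={b2:.3f}")
if b2 < b1:  # MaMMUT error falls faster with compute in this toy fit
    print(f"predicted takeover at ~{c_star:.2e} FLOPS: prefer MaMMUT beyond this budget")
```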
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
1 month
You fancy crossing lines on scaling law plots? Enjoy those in our work https://t.co/snYXA7AcRg accepted at #NeurIPS2025, where we show how robust model and dataset comparison can be done by scaling law derivation. Single-scale comparison can mislead - let's go beyond that. 1/n
arxiv.org
In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law...
1
3
9
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Very important work & open pipeline - tapping public PDFs as a further data source besides Common Crawl HTML text. Preliminary evidence that a FinePDFs + FineWeb-Edu + DCLM mix might give a stronger dataset than Nemotron-CC v2 (tested so far only at one fixed scale: 1.7B params, 36B token budget)
@HKydlicek
Hynek Kydlíček
2 months
We are releasing 📄 FinePDFs: the largest PDF dataset, spanning over half a billion documents!
- Long context: documents are 2x longer than web text
- 3T tokens from high-demand domains like legal and science
- Heavily improves over SoTA when mixed with the FW-Edu & DCLM web corpora.
0
1
7
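A rough sketch of how such a mix could be assembled for one's own experiments with the Hugging Face `datasets` streaming API. The hub repo ids and the mixing probabilities below are placeholder assumptions for illustration, not the recipe used in the quoted evaluation.

```python
# Rough sketch of mixing PDF-derived and web-text corpora via streaming.
# Repo ids and probabilities are assumptions; adjust to the actual releases.
from datasets import load_dataset, interleave_datasets

finepdfs    = load_dataset("HuggingFaceFW/finepdfs", split="train", streaming=True)          # assumed id
fineweb_edu = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)       # assumed id
dclm        = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True) # assumed id

mix = interleave_datasets(
    [finepdfs, fineweb_edu, dclm],
    probabilities=[0.2, 0.4, 0.4],  # illustrative mixing weights only
    seed=0,
)

# Peek at a few mixed examples; column names may differ per dataset.
for i, example in enumerate(mix):
    if i >= 3:
        break
    print(example.get("text", "")[:80])
```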
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Marin shows impressively how science on foundation models can be done in the open
github.com
Description: We evaluate a suite of optimizers on Transformer-style language models (130M–1.2B parameters) trained on up to 16× Chinchilla-optimal data. The goal is to quantify real speedups under...
0
0
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Thorough and needed study of various optimizers, again emphasizing how important it is to perform proper hyperparameter tuning when doing comparisons. When scaling up, the Muon advantage is still there, but shrinks to ~1.1x over standard AdamW if both are tuned. AdamW tuning seems easier.
@percyliang
Percy Liang
2 months
We did a very careful study of 10 optimizers with no horse in the race. Despite all the excitement about Muon, Mars, Kron, Soap, etc., at the end of the day, if you tune the hyperparameters rigorously and scale up, the speedup over AdamW diminishes to only 10% :-( Experiments
1
0
4
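The point about tuning before comparing can be made concrete with a tiny harness: sweep the key hyperparameter(s) for each optimizer on the same task and compare best-vs-best. This toy sketch only uses optimizers shipped with PyTorch; Muon and the other optimizers from the quoted study are not in torch and would have to be plugged into `optimizers` separately.

```python
# Tiny illustration of "tune before you compare": per-optimizer learning-rate
# sweep on the same toy task, then compare best-vs-best final loss.
import torch
import torch.nn as nn

def run_trial(make_opt, lr, steps=200, seed=0):
    torch.manual_seed(seed)                       # same data and init for every trial
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = make_opt(model.parameters(), lr)
    x, y = torch.randn(512, 32), torch.randn(512, 1)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

optimizers = {
    "adamw": lambda p, lr: torch.optim.AdamW(p, lr=lr, weight_decay=0.01),
    "sgd_momentum": lambda p, lr: torch.optim.SGD(p, lr=lr, momentum=0.9),
}
learning_rates = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]

for name, make_opt in optimizers.items():
    best_loss, best_lr = min((run_trial(make_opt, lr), lr) for lr in learning_rates)
    print(f"{name}: best final loss {best_loss:.4f} at lr={best_lr:.0e}")
```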
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
This was the first, but surely not the last, collaboration between open-sci, @laion_ai and @openEuroLLM. Establishing baselines and good starting grounds for experiments to create strong open foundation models is important, and I am happy to see it worked out so well.
0
1
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
If you are looking for a reference baseline to compare against your own training attempt and don't find it among the models trained for Pythia, DCLM, OLMo or DataDecide - have a look at HF:
huggingface.co
1
0
1
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
We release open-sci-ref-0.01 - a research reference base model family trained on 8 open datasets (CommonCorpus, Pile, C4, SlimPajama, HPLT-2.0, FineWeb-Edu-1.4T, DCLM-baseline, Nemotron-CC-HQ), up to 1.7B model size and 1T token scale. https://t.co/ZFDDzHMJN7
1
3
13
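Using such a reference model as a comparison baseline is a plain `transformers` load. The repo id below is a hypothetical placeholder for illustration; check the release link above for the actual checkpoint names.

```python
# Sketch of pulling one of these reference baselines from the HF hub.
# The repo id is a hypothetical placeholder, not a confirmed checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "open-sci/open-sci-ref-0.01-1.7b"   # hypothetical id, for illustration only

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Scaling laws let us", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```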
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
The manifesto reads great; this is the right spirit, and we need the same not only for startup/founder culture, but also for academia/basic research. Again, an example for the EU of how to proceed to get out of its current misery. https://t.co/AYSOAsrcXn
0
0
0
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
This is awesome. Hope such spots will pop up more and more all around the world. E.g., the EU would do well to support such entities with all it has, instead of pumping funds into mostly old, dysfunctional structures which just stifle everything inspiring and novel.
@Thom_Wolf
Thomas Wolf
2 months
I'm just back from two days in this totally crazy place in Finland called @shipfr8. It's a former hotel where they now bring cracked teenagers from all over the world (a big part from the US, but also Europe and Asia) to spend 3 months 100% focused on an idea they want to build. The
1
1
3
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Very nice work - re-processing CC math and code links from existing open datasets with a better parser, rendering HTML with Lynx and converting to LaTeX via an LLM. Impressive 133B-token scale and clear gains on math and code; good to have it as an open dataset to experiment with.
@KarimiRabeeh
Rabeeh Karimi
2 months
We just released Nemotron-CC-Math 🚀 Equations on the web aren't just LaTeX - they're in MathML, <pre> tags, inline, even images. Code shows up in just as many ways. Most parsers drop it. Nemotron-CC-Math (133B tokens) reprocesses CommonCrawl math pages to capture math equations + code reliably
0
0
3
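A sketch of the first stage of such a pipeline: render an HTML page to layout-preserving plain text with the Lynx browser (`lynx -dump`) before any LLM-based LaTeX conversion. The `lynx -dump` invocation is standard; the LLM conversion step is left as a stub, since the exact prompting used for Nemotron-CC-Math is not shown here.

```python
# Sketch: render HTML to plain text with Lynx, which keeps math/code spans more
# intact than naive tag stripping; the LLM-based LaTeX rewrite is a stub.
import subprocess

def render_with_lynx(html_path: str, width: int = 120) -> str:
    # -dump prints the rendered page to stdout; -nolist drops the link footnotes
    result = subprocess.run(
        ["lynx", "-dump", "-nolist", f"-width={width}", html_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def convert_math_to_latex(rendered_text: str) -> str:
    # Placeholder for the LLM rewriting step (equation-bearing spans -> LaTeX).
    return rendered_text

text = render_with_lynx("page.html")
print(convert_math_to_latex(text)[:500])
```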
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
Game Reasoning Arena - a collaborative LAION project using various game play scenarios to study reasoning in language models.
@LuciaCKun
Lucia Cipolina Kun
3 months
📷 New research: How do LLMs think strategically? We built Game Reasoning Arena to find out. LLMs play strategic games while we capture their reasoning traces. Paper:
0
0
4
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
ii) No evals as an encoder in VLMs, like in Cambrian or OpenVision. Dino v2 was not doing well as a component in VLMs; a Dino v3 comparison would be insightful, as it is important to know model behavior as an encoder for generalist vision-language learning, not only standalone.
0
0
3
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
2 months
It is good to see progress in SSL on images only via Dino v3. I think there are some unfortunate bits in training/evaluation that make it harder to see where the model stands. i) Dino v3 uses ImageNet-1k and -22k for pretraining, then does part of the evals again on ImageNet-1k. 1/2
1
0
3
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
3 months
Debunking yet another of the many studies that claim benefits of "brain-inspired" mechanisms without doing proper controls - comparing to a reference transformer of the same size. Doing reference comparisons is the way to keep "brain-inspired" from sliding further towards being a red flag for scams.
@arcprize
ARC Prize
3 months
Finding #1: The "hierarchical" architecture had minimal performance impact when compared to a similarly sized transformer. A drop-in transformer comes within a few points without any hyperparameter optimization. See our full post: https://t.co/PXDvDY020D
5
8
82
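The control being asked for is simple to set up: size-match a plain transformer to the "brain-inspired" model before comparing. Below is a sketch of the parameter-count check; the sizes are illustrative and do not reproduce the HRM/ARC setups.

```python
# Sketch of the basic control: build a plain transformer with a matched
# parameter count before attributing gains to a new architecture.
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def plain_transformer(d_model: int, n_layers: int, n_heads: int) -> nn.Module:
    layer = nn.TransformerEncoderLayer(
        d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model, batch_first=True
    )
    return nn.TransformerEncoder(layer, num_layers=n_layers)

target = 27_000_000  # e.g. the parameter count of the model under test (illustrative)
baseline = plain_transformer(d_model=512, n_layers=8, n_heads=8)
print(f"baseline params: {count_params(baseline):,} (target ~{target:,})")
# Adjust d_model / n_layers until the counts roughly match, then train both
# models on identical data and compute budget before comparing.
```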
@JJitsev
Jenia Jitsev 🏳️‍🌈 🇺🇦 🇮🇱
3 months
Bibi files, must watch. Israel is a bunch which has survived through worse things than Bibi. Had he landed in jail years before, as he was supposed to, many things would probably have been much easier. https://t.co/Y8KkCd4mEj
1
0
0