Nicolas Boizard

@N1colAIs

Followers: 166 · Following: 156 · Media: 16 · Statuses: 86

NLP - AI | 25y PhD student @CentraleSupelec | EuroBERT 🇪🇺 & Universal Logit Distillation Loss ⚗️ & CroissantLLM 🥐

Paris, France
Joined December 2023
@N1colAIs
Nicolas Boizard
4 months
🇪🇺 One month after the AI Action Summit 2025 in Paris, I am thrilled to announce EuroBERT, a family of multilingual encoders exhibiting the strongest multilingual performance on tasks such as retrieval, classification, and regression across 15 languages, mathematics, and code. ⬇️ 1/6
15
48
189
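For readers who want to try it, here is a minimal sketch of pulling sentence embeddings from EuroBERT with Hugging Face transformers. The checkpoint id EuroBERT/EuroBERT-210m, the trust_remote_code flag, and the mean-pooling step are my assumptions rather than details from this tweet, so check the model card for exact usage.

```python
# Minimal sketch (assumed checkpoint id, trust_remote_code flag, and mean pooling).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"  # assumed checkpoint name; see the EuroBERT model cards
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

sentences = [
    "EuroBERT est un encodeur multilingue.",
    "EuroBERT is a multilingual encoder.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, hidden_dim)

# Mean-pool over non-padding tokens to get one embedding per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                              # (2, hidden_dim)
```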
@N1colAIs
Nicolas Boizard
9 days
EuroBERT is going to @COLM_conf 2025! Can’t wait to be in Montreal with @gisship and @DuarteMRAlves to see all the great research everyone’s bringing!
0
4
22
@N1colAIs
Nicolas Boizard
14 days
RT @bclavie: This is excellent science confirming the common wisdom that bidirectional attention is superior for encoder tasks, but that ML….
0
2
0
@N1colAIs
Nicolas Boizard
15 days
RT @ManuelFaysse: 🚨Should We Still Pretrain Encoders with Masked Language Modeling? We have recently seen massively trained causal decoders….
0
37
0
@N1colAIs
Nicolas Boizard
16 days
RT @gisship: 🚨 New paper drop: Should We Still Pretrain Encoders with Masked Language Modeling?.We revisit a foundational question in NLP:….
0
4
0
@N1colAIs
Nicolas Boizard
16 days
More in the paper, go check it out! You can also check out our blog post: Huge thanks to the dream team 💥 @gisship, @ManuelFaysse, @DuarteMRAlves, Emmanuel Malherbe, @andre_t_martins, Céline Hudelot, @PierreColombo6 🙌 7/7 🧵
0
0
0
@N1colAIs
Nicolas Boizard
16 days
Finally, we show that CLM is more data-efficient — it converges faster early in training — which can be extremely useful in low-resource domains and provides greater stability during downstream fine-tuning. 6/7 🧵
1
0
0
@N1colAIs
Nicolas Boizard
16 days
Motivated by strong hybrid training results, we explore a continuous pretraining (CPT) setup. MLM-adapted CLM models consistently outperform. Excitingly, starting from a SOTA decoder can yield even better encoders—unlocking the best of both worlds. 5/7 🧵
1
0
0
@N1colAIs
Nicolas Boizard
16 days
First, we study the effect of different pretraining strategies that combine causal language modeling (CLM) and masked language modeling (MLM) objectives. Setups that start with CLM-only training and transition to MLM-only consistently outperform pure MLM across downstream tasks. 4/7 🧵
1
0
0
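For readers following along, here is an illustrative PyTorch sketch of the two objectives being combined: causal next-token prediction versus BERT-style masked-token prediction. This is not the paper's training code, and the 15% masking rate and 80/10/10 corruption split are common-practice assumptions, not numbers taken from the thread.

```python
# Illustrative sketch of the two pretraining objectives (not the paper's code).
import torch
import torch.nn.functional as F

def clm_loss(logits, input_ids):
    """Causal LM objective: position t predicts token t+1."""
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )

def mlm_corrupt(input_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """BERT-style corruption: pick ~15% of tokens; 80% -> [MASK], 10% -> random, 10% unchanged."""
    labels = input_ids.clone()
    picked = torch.rand_like(input_ids, dtype=torch.float) < mask_prob
    labels[~picked] = -100                      # compute loss only on picked positions
    corrupted = input_ids.clone()
    roll = torch.rand_like(input_ids, dtype=torch.float)
    corrupted[picked & (roll < 0.8)] = mask_token_id
    swap = picked & (roll >= 0.8) & (roll < 0.9)
    corrupted[swap] = torch.randint_like(input_ids, vocab_size)[swap]
    return corrupted, labels                    # remaining picked tokens stay unchanged

def mlm_loss(logits, labels):
    """Masked LM objective: cross-entropy on the corrupted positions only."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), ignore_index=-100
    )

# A two-phase run would optimize clm_loss for the first part of training, then switch the
# data pipeline to mlm_corrupt + mlm_loss (with bidirectional attention) for the rest.
```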
@N1colAIs
Nicolas Boizard
16 days
We ran a controlled study with identical model sizes, data, and diverse tasks across two scenarios: pretraining from scratch & continued pretraining. With 30+ models and 15k finetuning runs, results show MLM alone isn’t optimal and starting with CLM boosts downstream perf. 3/7 🧵
1
0
0
@N1colAIs
Nicolas Boizard
16 days
While encoders usually rely on MLM pretraining, recent work shows decoders trained with causal LM (CLM) can also work well as encoder models. We investigate if CLM’s benefits come from the objective itself or confounding factors like model size and training scale. 2/7 🧵
1
0
0
@N1colAIs
Nicolas Boizard
16 days
🚨 Should you only pretrain encoder models with Masked Language Modeling (MLM)? Spoiler: definitely not! Let’s revisit a foundational NLP question: Is MLM still the best way to pretrain encoder models for text representations? 📄: x @gisship 1/7 🧵
1
2
3
@N1colAIs
Nicolas Boizard
3 months
So nice to see new distillation libraries emerging around the ULD Loss and other cross-tokenizer distillation methods. Big kudos to @vitransformer and the whole DistillKitPlus team for their work.
@vitransformer
Vision Transformers
3 months
It's been an exciting few weeks finding my footing @lossfunk 🥳 and working on DistillKitPlus (with @AmanGokrani), a toolkit for making smaller models learn from larger ones through efficient logits distillation. Check it out @ . Some recent updates:
0
0
2
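For context, a simplified sketch of the core idea behind cross-tokenizer logit distillation in the spirit of the ULD Loss: since teacher and student may use different tokenizers and vocabulary sizes, their output distributions are compared after sorting, which removes the dependence on token identity. This is my own paraphrase with placeholder details (temperature, padding, how it mixes with the cross-entropy term), so refer to the authors' code or DistillKitPlus for the actual implementations.

```python
# Simplified sketch of a sorted-probability distance across mismatched vocabularies.
import torch
import torch.nn.functional as F

def sorted_prob_l1(student_logits, teacher_logits, temperature=1.0):
    # student_logits: (batch, seq, student_vocab); teacher_logits: (batch, seq, teacher_vocab)
    # Assumes the two logit tensors are already aligned position-wise.
    p_s = F.softmax(student_logits / temperature, dim=-1)
    p_t = F.softmax(teacher_logits / temperature, dim=-1)

    # Zero-pad the smaller vocabulary so both distributions have the same length.
    vocab = max(p_s.size(-1), p_t.size(-1))
    p_s = F.pad(p_s, (0, vocab - p_s.size(-1)))
    p_t = F.pad(p_t, (0, vocab - p_t.size(-1)))

    # Sorting discards token identity, so the comparison no longer depends on the tokenizer.
    p_s = p_s.sort(dim=-1, descending=True).values
    p_t = p_t.sort(dim=-1, descending=True).values
    return (p_s - p_t).abs().sum(dim=-1).mean()
```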
@N1colAIs
Nicolas Boizard
4 months
• Optimized: Fused ops (Liger Kernel, Flash), distributed training (FSDP, DDP).
• Data Processing: Tokenization, packing, subsampling.
• Highly Customisable and more.
For any large-scale continuous pre-training, feel free to contact me or @DuarteMRAlves for extensive support 🤗
0
0
1
@N1colAIs
Nicolas Boizard
4 months
Why EuroBERT training might interest you:
• Resumable: Continue training with different hardware & environments.
• Hardware Agnostic: Runs on CPU, @AMD, or @nvidia GPUs.
• Hugging Face Friendly: Model, tokenizer.
1
0
1
@N1colAIs
Nicolas Boizard
4 months
The EuroBERT training library is live! 🚀 Additionally, as weekends are perfect for experimentation, we’ve released a tutorial on continuous pre-training to add languages to EuroBERT. 🎓Tutorial: 🔨Github:
1
1
7
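For anyone who cannot use the EuroBERT training library or the tutorial linked above, here is a rough idea of what continued masked-language-model pretraining on a new language can look like with plain Hugging Face tooling. The checkpoint id, data file, hyperparameters, and the assumption that the tokenizer exposes a mask token are all placeholders of mine, not details from the tweet.

```python
# Generic Hugging Face sketch of continued MLM pretraining on new-language text.
# (Placeholder checkpoint id, data file, and hyperparameters; the linked tutorial
# and training library are the supported path for EuroBERT itself.)
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "EuroBERT/EuroBERT-210m"                     # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)     # assumes a [MASK]-style token exists
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Plain-text corpus in the language being added (placeholder file name).
raw = load_dataset("text", data_files={"train": "new_language_corpus.txt"})["train"]
tokenized = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="eurobert-cpt",
    per_device_train_batch_size=16,
    learning_rate=1e-4,
    num_train_epochs=1,
    bf16=True,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```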
@N1colAIs
Nicolas Boizard
4 months
RT @antoine_chaffin: PyLate 1.1.7 - The big batches update - is out! 🚀 As contrastive learning remains the go-to for large-scale pre-traini….
0
13
0
@N1colAIs
Nicolas Boizard
4 months
@Engomar_10 *congratulations: @Engomar_10.
0
1
2
@N1colAIs
Nicolas Boizard
4 months
Great to see the community using EuroBERT! As hoped, it’s proving to be an excellent foundation model, especially for information retrieval tasks across multiple languages after just one epoch of finetuning. Check it out: .@Engomar_10
1
1
5
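As a companion to this, one plausible way to reproduce that kind of one-epoch retrieval fine-tuning is with sentence-transformers and an in-batch-negatives loss. The checkpoint id, the toy query–passage pairs, and every setting below are illustrative assumptions, not details of the linked project.

```python
# Hedged sketch: fine-tuning EuroBERT as a retrieval bi-encoder with sentence-transformers.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses, models

word_emb = models.Transformer(
    "EuroBERT/EuroBERT-210m",                          # assumed checkpoint name
    max_seq_length=512,
    model_args={"trust_remote_code": True},            # may not be needed on recent transformers
    tokenizer_args={"trust_remote_code": True},
)
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[word_emb, pooling])

# Toy (query, relevant passage) pairs; real training would use a retrieval dataset.
train_examples = [
    InputExample(texts=["Quelle est la capitale de la France ?",
                        "Paris est la capitale de la France."]),
    InputExample(texts=["What is EuroBERT?",
                        "EuroBERT is a multilingual encoder model."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)      # in-batch negatives

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```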