Nicolas Boizard

@N1colAIs

Followers: 166 · Following: 156 · Media: 16 · Statuses: 86

NLP - AI | 25y PhD student @CentraleSupelec | EuroBERT 🇪🇺 & Universal Logit Distillation Loss ⚗️ & CroissantLLM 🥐

Paris, France
Joined December 2023
@N1colAIs
Nicolas Boizard
4 months
🇪🇺 One month after the AI Action Summit 2025 in Paris, I am thrilled to announce EuroBERT, a family of multilingual encoders exhibiting the strongest multilingual performance on tasks such as retrieval, classification, and regression across 15 languages, mathematics, and code. ⬇️ 1/6
15
48
189
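For readers who want to try it, here is a minimal sketch of pulling sentence embeddings from EuroBERT with Hugging Face transformers. The checkpoint id EuroBERT/EuroBERT-210m, the trust_remote_code flag, and the mean-pooling step are my assumptions rather than details from this tweet, so check the model card for exact usage.

```python
# Minimal sketch (assumed checkpoint id, trust_remote_code flag, and mean pooling).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"  # assumed checkpoint name; see the EuroBERT model cards
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

sentences = [
    "EuroBERT est un encodeur multilingue.",
    "EuroBERT is a multilingual encoder.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, hidden_dim)

# Mean-pool over non-padding tokens to get one embedding per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                              # (2, hidden_dim)
```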
@N1colAIs
Nicolas Boizard
9 days
EuroBERT is going to @COLM_conf 2025! Can’t wait to be in Montreal with @gisship and @DuarteMRAlves to see all the great research everyone’s bringing!
0
4
22
@N1colAIs
Nicolas Boizard
14 days
RT @bclavie: This is excellent science confirming the common wisdom that bidirectional attention is superior for encoder tasks, but that ML….
0
2
0
@N1colAIs
Nicolas Boizard
15 days
RT @ManuelFaysse: 🚨Should We Still Pretrain Encoders with Masked Language Modeling? We have recently seen massively trained causal decoders….
0
37
0
@N1colAIs
Nicolas Boizard
16 days
RT @gisship: 🚨 New paper drop: Should We Still Pretrain Encoders with Masked Language Modeling?.We revisit a foundational question in NLP:….
0
4
0
@N1colAIs
Nicolas Boizard
16 days
More in the paper, go check it out! You can also check out our blog post: Huge thanks to the dream team 💥 @gisship, @ManuelFaysse, @DuarteMRAlves, Emmanuel Malherbe, @andre_t_martins, Céline Hudelot, @PierreColombo6 🙌 7/7 🧵
0
0
0
@N1colAIs
Nicolas Boizard
16 days
Finally, we show that CLM is more data-efficient — it converges faster early in training — which can be extremely useful in low-resource domains and provides greater stability during downstream fine-tuning. 6/7 🧵
1
0
0
@N1colAIs
Nicolas Boizard
16 days
Motivated by strong hybrid training results, we explore a continuous pretraining (CPT) setup. MLM-adapted CLM models consistently outperform. Excitingly, starting from a SOTA decoder can yield even better encoders—unlocking the best of both worlds. 5/7 🧵
1
0
0
@N1colAIs
Nicolas Boizard
16 days
First, we study the effect of different pretraining strategies that combine causal language modeling (CLM) and masked language modeling (MLM) objectives. Setups that start with CLM-only training and transition to MLM-only consistently outperform pure MLM across downstream tasks. 4/7 🧵
1
0
0
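For readers following along, here is an illustrative PyTorch sketch of the two objectives being combined: causal next-token prediction versus BERT-style masked-token prediction. This is not the paper's training code, and the 15% masking rate and 80/10/10 corruption split are common-practice assumptions, not numbers taken from the thread.

```python
# Illustrative sketch of the two pretraining objectives (not the paper's code).
import torch
import torch.nn.functional as F

def clm_loss(logits, input_ids):
    """Causal LM objective: position t predicts token t+1."""
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )

def mlm_corrupt(input_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """BERT-style corruption: pick ~15% of tokens; 80% -> [MASK], 10% -> random, 10% unchanged."""
    labels = input_ids.clone()
    picked = torch.rand_like(input_ids, dtype=torch.float) < mask_prob
    labels[~picked] = -100                      # compute loss only on picked positions
    corrupted = input_ids.clone()
    roll = torch.rand_like(input_ids, dtype=torch.float)
    corrupted[picked & (roll < 0.8)] = mask_token_id
    swap = picked & (roll >= 0.8) & (roll < 0.9)
    corrupted[swap] = torch.randint_like(input_ids, vocab_size)[swap]
    return corrupted, labels                    # remaining picked tokens stay unchanged

def mlm_loss(logits, labels):
    """Masked LM objective: cross-entropy on the corrupted positions only."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), ignore_index=-100
    )

# A two-phase run would optimize clm_loss for the first part of training, then switch the
# data pipeline to mlm_corrupt + mlm_loss (with bidirectional attention) for the rest.
```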
@N1colAIs
Nicolas Boizard
16 days
We ran a controlled study with identical model sizes, data, and diverse tasks across two scenarios: pretraining from scratch & continued pretraining. With 30+ models and 15k finetuning runs, results show MLM alone isn’t optimal and starting with CLM boosts downstream perf. 3/7 🧵
1
0
0
@N1colAIs
Nicolas Boizard
16 days
While encoders usually rely on MLM pretraining, recent work shows decoders trained with causal LM (CLM) can also work well as encoder models. We investigate if CLM’s benefits come from the objective itself or confounding factors like model size and training scale. 2/7 🧵
1
0
0
@N1colAIs
Nicolas Boizard
16 days
🚨 Should you only pretrain encoder models with Masked Language Modeling (MLM)? Spoiler: definitely not! Let’s revisit a foundational NLP question: Is MLM still the best way to pretrain encoder models for text representations? 📄: x @gisship 1/7 🧵
1
2
3
@N1colAIs
Nicolas Boizard
3 months
So nice to see new distillation libraries emerging around the ULD Loss and other cross-tokenizer distillation methods. Big kudos to @vitransformer and the whole DistillKitPlus team for their work.
@vitransformer
Vision Transformers
3 months
It's been an exciting few weeks finding my footing @lossfunk 🥳 and working on DistillKitPlus (with @AmanGokrani), a toolkit for making smaller models learn from larger ones through efficient logits distillation. Check it out @ . Some recent updates:
0
0
2
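For context, a simplified sketch of the core idea behind cross-tokenizer logit distillation in the spirit of the ULD Loss: since teacher and student may use different tokenizers and vocabulary sizes, their output distributions are compared after sorting, which removes the dependence on token identity. This is my own paraphrase with placeholder details (temperature, padding, how it mixes with the cross-entropy term), so refer to the authors' code or DistillKitPlus for the actual implementations.

```python
# Simplified sketch of a sorted-probability distance across mismatched vocabularies.
import torch
import torch.nn.functional as F

def sorted_prob_l1(student_logits, teacher_logits, temperature=1.0):
    # student_logits: (batch, seq, student_vocab); teacher_logits: (batch, seq, teacher_vocab)
    # Assumes the two logit tensors are already aligned position-wise.
    p_s = F.softmax(student_logits / temperature, dim=-1)
    p_t = F.softmax(teacher_logits / temperature, dim=-1)

    # Zero-pad the smaller vocabulary so both distributions have the same length.
    vocab = max(p_s.size(-1), p_t.size(-1))
    p_s = F.pad(p_s, (0, vocab - p_s.size(-1)))
    p_t = F.pad(p_t, (0, vocab - p_t.size(-1)))

    # Sorting discards token identity, so the comparison no longer depends on the tokenizer.
    p_s = p_s.sort(dim=-1, descending=True).values
    p_t = p_t.sort(dim=-1, descending=True).values
    return (p_s - p_t).abs().sum(dim=-1).mean()
```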
@N1colAIs
Nicolas Boizard
4 months
• Optimized: Fused ops (Liger Kernel, Flash), distributed training (FSDP, DDP).
• Data Processing: Tokenization, packing, subsampling.
• Highly Customisable and more.
For any large-scale continuous pre-training, feel free to contact me or @DuarteMRAlves for extensive support 🤗
0
0
1
@N1colAIs
Nicolas Boizard
4 months
Why EuroBERT training might interest you:
• Resumable: Continue training with different hardware & environments.
• Hardware Agnostic: Runs on CPU, @AMD, or @nvidia GPUs.
• Hugging Face Friendly: Model, tokenizer.
1
0
1
@N1colAIs
Nicolas Boizard
4 months
The EuroBERT training library is live! 🚀 Additionally, as weekends are perfect for experimentation, we’ve released a tutorial on continuous pre-training to add languages to EuroBERT. 🎓Tutorial: 🔨Github:
1
1
7
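For anyone who cannot use the EuroBERT training library or the tutorial linked above, here is a rough idea of what continued masked-language-model pretraining on a new language can look like with plain Hugging Face tooling. The checkpoint id, data file, hyperparameters, and the assumption that the tokenizer exposes a mask token are all placeholders of mine, not details from the tweet.

```python
# Generic Hugging Face sketch of continued MLM pretraining on new-language text.
# (Placeholder checkpoint id, data file, and hyperparameters; the linked tutorial
# and training library are the supported path for EuroBERT itself.)
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "EuroBERT/EuroBERT-210m"                     # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)     # assumes a [MASK]-style token exists
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Plain-text corpus in the language being added (placeholder file name).
raw = load_dataset("text", data_files={"train": "new_language_corpus.txt"})["train"]
tokenized = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="eurobert-cpt",
    per_device_train_batch_size=16,
    learning_rate=1e-4,
    num_train_epochs=1,
    bf16=True,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```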
@N1colAIs
Nicolas Boizard
4 months
RT @antoine_chaffin: PyLate 1.1.7 - The big batches update - is out! 🚀 As contrastive learning remains the go-to for large-scale pre-traini….
0
13
0
@N1colAIs
Nicolas Boizard
4 months
@Engomar_10 *congratulations: @Engomar_10.
0
1
2
@N1colAIs
Nicolas Boizard
4 months
Great to see the community using EuroBERT! As hoped, it’s proving to be an excellent foundation model, especially for information retrieval tasks across multiple languages after just one epoch of finetuning. Check it out: .@Engomar_10
1
1
5
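As a companion to this, one plausible way to reproduce that kind of one-epoch retrieval fine-tuning is with sentence-transformers and an in-batch-negatives loss. The checkpoint id, the toy query–passage pairs, and every setting below are illustrative assumptions, not details of the linked project.

```python
# Hedged sketch: fine-tuning EuroBERT as a retrieval bi-encoder with sentence-transformers.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses, models

word_emb = models.Transformer(
    "EuroBERT/EuroBERT-210m",                          # assumed checkpoint name
    max_seq_length=512,
    model_args={"trust_remote_code": True},            # may not be needed on recent transformers
    tokenizer_args={"trust_remote_code": True},
)
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[word_emb, pooling])

# Toy (query, relevant passage) pairs; real training would use a retrieval dataset.
train_examples = [
    InputExample(texts=["Quelle est la capitale de la France ?",
                        "Paris est la capitale de la France."]),
    InputExample(texts=["What is EuroBERT?",
                        "EuroBERT is a multilingual encoder model."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)      # in-batch negatives

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```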