Duarte Alves
@DuarteMRAlves
Followers: 56 · Following: 10 · Media: 2 · Statuses: 34
Joined November 2023
I'm heading to Montreal soon for @COLM_conf! Our lab is presenting the following 5 papers: 🧵
5) EuroBERT: Scaling Multilingual Encoders for European Languages w/ @N1colAIs @gisship @DuarteMRAlves
@AyoubHammal @UndefBehavior @Fannyjrd_ @ManuelFaysse @peyrardMax @psanfernandes
@RicardoRei7 @PierreColombo6
@tomaarsen - Poster session 5, Thu Oct 9, 11:00 AM – 1:00 PM
EuroBERT is going to @COLM_conf 2025! Can't wait to be in Montreal with @gisship and @DuarteMRAlves to see all the great research everyone's bringing!
🚨 Should you only pretrain encoder models with Masked Language Modeling (MLM)? Spoiler: definitely not! Let's revisit a foundational NLP question: Is MLM still the best way to pretrain encoder models for text representations? https://t.co/kaPLch1o3V x @gisship 1/7 🧵
arxiv.org
Learning high-quality text representations is fundamental to a wide range of NLP tasks. While encoder pretraining has traditionally relied on Masked Language Modeling (MLM), recent evidence...
🚨 New paper drop: Should We Still Pretrain Encoders with Masked Language Modeling? We revisit a foundational question in NLP: Is masked language modeling (MLM) still the best way to pretrain encoder models for text representations? https://t.co/W1p5mjTTf2 (1/8)
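To make the question these two announcements revisit concrete, here is a minimal sketch of what the classic MLM objective computes; the mBERT checkpoint is purely illustrative, not the paper's actual setup:

```python
# Minimal sketch of the Masked Language Modeling objective; the mBERT
# checkpoint is illustrative only, not the setup studied in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

text = "Learning high-quality text representations is fundamental to NLP."
inputs = tokenizer(text, return_tensors="pt")
labels = inputs.input_ids.clone()

# Mask ~15% of non-special tokens (the full recipe also swaps in random or
# unchanged tokens; this sketch keeps only the [MASK] replacement).
prob = torch.full(labels.shape, 0.15)
special = tokenizer.get_special_tokens_mask(labels[0].tolist(), already_has_special_tokens=True)
prob[0, torch.tensor(special, dtype=torch.bool)] = 0.0
masked = torch.bernoulli(prob).bool()
labels[~masked] = -100                              # loss only on masked positions
inputs.input_ids[masked] = tokenizer.mask_token_id

loss = model(**inputs, labels=labels).loss
print(f"MLM loss: {loss.item():.3f}")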
Proud moment! Prof. @andre_t_martins represented @UTTERProject & #EuroLLM at #GTCParis + #VivaTech2025, showcasing their role in Europe's sovereign AI future. And the highlight? Both projects were featured in Jensen Huang's keynote! #EU #NVIDIA #LLMs #AIResearch
The EuroBERT training library is live! Additionally, as weekends are perfect for experimentation, we've released a tutorial on continuous pre-training to add languages to EuroBERT. Tutorial: https://t.co/nMleTzF7A7 GitHub:
github.com
Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including CPU, AMD, and NVIDIA GPUs. - Nicolas-BZRD/EuroBERT
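The tutorial itself is at the links above; as a rough, hedged illustration of what continued MLM pretraining to add a language can look like with the standard Hugging Face Trainer (the checkpoint id, corpus file, and hyperparameters here are assumptions, not the official tutorial recipe):

```python
# Rough illustration of continued MLM pretraining to add a language.
# The checkpoint id, dataset file, and hyperparameters are assumptions,
# not the official EuroBERT tutorial recipe.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "EuroBERT/EuroBERT-210m"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Any monolingual corpus in the new target language would go here.
dataset = load_dataset("text", data_files={"train": "new_language_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

# The collator applies the random masking on the fly for each batch.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eurobert-continued",
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```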
An assembly of 18 European companies, labs, and universities has banded together to launch 🇪🇺 EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc. Details in 🧵
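As a hedged sketch of the retrieval-style usage that announcement describes (mean-pooled embeddings plus cosine similarity; the checkpoint id and the need for trust_remote_code are assumptions):

```python
# Hedged sketch: mean-pooled sentence embeddings from an encoder for
# retrieval-style similarity. The checkpoint id is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

sentences = ["Wie ist das Wetter heute?", "What is the weather like today?"]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state       # (batch, seq, dim)

# Mean-pool over real (non-padding) tokens, then compare the two vectors.
mask = batch.attention_mask.unsqueeze(-1)
embeddings = (hidden * mask).sum(1) / mask.sum(1)
similarity = torch.nn.functional.cosine_similarity(
    embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```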
🇪🇺 One month after the AI Action Summit 2025 in Paris, I am thrilled to announce EuroBERT, a family of multilingual encoders exhibiting the strongest multilingual performance on tasks such as retrieval, classification, and regression across 15 languages, mathematics, and code. ⬇️ 1/6
🧵 (7/7) Check out our blog post for more insights: https://t.co/7oe2ZPdQtB Read more in our paper:
arxiv.org
General-purpose multilingual vector representations, used in retrieval, regression and classification, are traditionally obtained from bidirectional encoder models. Despite their wide...
🧵 (6/7) Huge thanks also to all our collaborators: @CentraleSupelec @Diabolocom @artefact @sardine_lab_it @istecnico @itnewspt @Lisbon_ELLIS @Unbabel @AMD @CINESFrance
🧵 (5/7) @N1colAIs @gisship @andre_t_martins @AyoubHammal @UndefBehavior Céline Hudelot, Emmanuel Malherbe, Etienne Malaboeuf @Fannyjrd_ Gabriel Hautreux @joao97_alves Kevin El-Haddad @ManuelFaysse @peyrardMax Nuno M. Guerreiro @psanfernandes @RicardoRei7 @PierreColombo6
🧵 (4/7) This work is the result of an incredible joint effort by a talented team from multiple institutions, props to everyone!
🧵 (3/7) EuroBERT is open-source: models (210M, 610M, and 2.1B params), training snapshots, and the full training framework. Explore here: https://t.co/SZHKDordRg Code coming soon! https://t.co/7o8CpqOfRV
🧵 (2/7) EuroBERT shines across benchmarks: ✔️ Retrieval (MIRACL, MLDR) ✔️ Classification (XNLI, PAWS-X) ✔️ Regression (SeaHorse) ✔️ Strong in code/math understanding (CodeSearchNet)
🧵 (1/7) Why EuroBERT?
✅ Extensive multilingual coverage
✅ Longer context handling (up to 8,192 tokens)
✅ Improved architecture
✅ Specialized for math and coding
Ideal for retrieval, classification, and regression tasks!
Excited to announce EuroBERT: a new multilingual encoder model family for European & global languages! EuroBERT is trained on a massive 5-trillion-token dataset across 15 languages and includes recent architecture advances such as GQA, RoPE & RMSNorm.
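For readers who want to see one of those components concretely, here is a generic RMSNorm sketch (a textbook implementation, not EuroBERT's actual code):

```python
# Minimal generic RMSNorm, one of the architecture components mentioned
# above (not EuroBERT's exact implementation).
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square of the features; unlike
        # LayerNorm, there is no mean subtraction and no bias term.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

x = torch.randn(2, 5, 16)
print(RMSNorm(16)(x).shape)  # torch.Size([2, 5, 16])
```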
Good to see @EU_Commission promoting OS LLMs in Europe. However, (1) "OpenEuroLLM" is appropriating a name (#EuroLLM) which already exists, and (2) it is certainly *not* the "first family of open-source LLMs covering all EU languages" 🧵
AI made in 🇪🇺: OpenEuroLLM, the first family of open-source Large Language Models covering all EU languages, has earned the first STEP Seal for its excellence. It brings together EU startups, research labs, and supercomputing hosts to train AI on European supercomputers…
What an incredible year for the team @ManuelFaysse @nunonmg @gisship @N1colAIs @DuarteMRAlves @andre_t_martins @UndefBehavior! The retrospective from @ManuelFaysse captures some of it. Plus, there's plenty of exciting news from @equallai; so much to celebrate and be proud of!
2024 was a super active year where I had the chance to explore many things: document embeddings, LLM pretraining, VLMs, ML privacy... It's also the year of my first citation, and soon my 100th?! A thread where I quickly go over some of my work from the year (1/N) 🧵