Duarte Alves Profile
Duarte Alves

@DuarteMRAlves

Followers
56
Following
10
Media
2
Statuses
34

Joined November 2023
@andre_t_martins
Andre Martins
1 month
We are #hiring for Post-doc, PhD, and Research Engineer positions in SARDINE Lab, University of Lisbon. Message me if you're interested in joining our team. We are attending Conference on Language Modeling if you would like to meet! - via #Whova event app
3
76
263
@andre_t_martins
Andre Martins
1 month
I'm heading soon to Montreal for @COLM_conf! Our lab is presenting the following 5 papers: 🧵
1
6
34
@andre_t_martins
Andre Martins
1 month
5) EuroBERT: Scaling Multilingual Encoders for European Languages w/ @N1colAIs @gisship @DuarteMRAlves @AyoubHammal @UndefBehavior @Fannyjrd_ @ManuelFaysse @peyrardMax @psanfernandes @RicardoRei7 @PierreColombo6 @tomaarsen - Poster session 5, Thu Oct 9, 11:00 AM – 1:00 PM
1
6
8
@N1colAIs
Nicolas Boizard
4 months
EuroBERT is going to @COLM_conf 2025! Can't wait to be in Montreal with @gisship and @DuarteMRAlves to see all the great research everyone's bringing!
0
4
21
@N1colAIs
Nicolas Boizard
5 months
🚨 Should you only pretrain encoder models with Masked Language Modeling (MLM)? Spoiler: definitely not! Let's revisit a foundational NLP question: Is MLM still the best way to pretrain encoder models for text representations? 📄: https://t.co/kaPLch1o3V x @gisship 1/7 🧵
arxiv.org
Learning high-quality text representations is fundamental to a wide range of NLP tasks. While encoder pretraining has traditionally relied on Masked Language Modeling (MLM), recent evidence...
1
2
3
@gisship
Hippolyte Gisserot-Boukhlef
5 months
🚨 New paper drop: Should We Still Pretrain Encoders with Masked Language Modeling? We revisit a foundational question in NLP: Is masked language modeling (MLM) still the best way to pretrain encoder models for text representations? 📄 https://t.co/W1p5mjTTf2 (1/8)
arxiv.org
1
4
25
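The masked-language-modeling setup these two threads revisit can be illustrated with a minimal, self-contained sketch of BERT-style dynamic masking (select ~15% of positions; of those, 80% become `[MASK]`, 10% a random token, 10% stay unchanged). The `MASK` id and vocabulary size below are placeholder values, not EuroBERT's:

```python
import random

MASK, VOCAB = 103, 30_000  # placeholder [MASK] id and vocab size

def mlm_mask(tokens, p=0.15, rng=None):
    """BERT-style dynamic masking.

    Selects ~p of positions as prediction targets; of those,
    80% -> [MASK], 10% -> random token, 10% -> left unchanged.
    Returns (inputs, labels) where labels is -100 at positions
    the loss should ignore (the usual ignore_index convention).
    """
    rng = rng or random.Random(0)
    inputs, labels = list(tokens), [-100] * len(tokens)
    for i, t in enumerate(tokens):
        if rng.random() < p:
            labels[i] = t           # original token is the target
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK    # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB)  # 10%: random token
            # else 10%: keep the original token
    return inputs, labels
```

Because masking is re-sampled per call, the same sequence yields different training targets across epochs ("dynamic" masking).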
@UTTERProject
UTTER
5 months
🚀 Proud moment! Prof. @andre_t_martins represented @UTTERProject & #EuroLLM at #GTCParis + #VivaTech2025, showcasing their role in Europe's sovereign AI future. And the highlight? Both projects were featured in Jensen Huang's keynote! 🙌 #EU #NVIDIA #LLMs #AIResearch
0
3
3
@N1colAIs
Nicolas Boizard
8 months
The EuroBERT training library is live! 🚀 Additionally, as weekends are perfect for experimentation, we've released a tutorial on continuous pre-training to add languages to EuroBERT. 🎓Tutorial: https://t.co/nMleTzF7A7 🔨Github:
github.com
Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including CPU, AMD, and NVIDIA GPUs. - Nicolas-BZRD/EuroBERT
1
1
7
@tomaarsen
tomaarsen
8 months
An assembly of 18 European companies, labs, and universities have banded together to launch 🇪🇺 EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc. Details in 🧵
4
16
92
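The retrieval fine-tuning mentioned above typically pools an encoder's token embeddings into one vector per text, then ranks documents by cosine similarity. A hedged NumPy sketch of that pooling-and-ranking step (illustrative only; not EuroBERT's actual API or pooling choice):

```python
import numpy as np

def mean_pool(token_embs, attention_mask):
    """Average token embeddings over non-padding positions.

    token_embs:     (batch, seq, dim) encoder outputs
    attention_mask: (batch, seq) with 1 = real token, 0 = padding
    """
    mask = attention_mask[:, :, None].astype(float)     # (batch, seq, 1)
    summed = (token_embs * mask).sum(axis=1)            # sum real tokens
    counts = mask.sum(axis=1).clip(min=1e-9)            # avoid divide-by-zero
    return summed / counts

def cosine_rank(query_emb, doc_embs):
    """Return document indices sorted by cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))                          # best match first
```

Fine-tuning for retrieval then amounts to training the encoder so that pooled query and relevant-document vectors land close together under this similarity.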
@N1colAIs
Nicolas Boizard
8 months
🇪🇺 One month after the AI Action Summit 2025 in Paris, I am thrilled to announce EuroBERT, a family of multilingual encoders exhibiting the strongest multilingual performance for tasks such as retrieval, classification and regression over 15 languages, mathematics and code. ⬇️ 1/6
15
46
186
@DuarteMRAlves
Duarte Alves
8 months
🧵 (6/7) 🙏 Huge thanks also to all our collaborators: @CentraleSupelec @Diabolocom @artefact @sardine_lab_it @istecnico @itnewspt @Lisbon_ELLIS @Unbabel @AMD @CINESFrance
1
0
5
@DuarteMRAlves
Duarte Alves
8 months
🧵 (5/7) @N1colAIs @gisship @andre_t_martins @AyoubHammal @UndefBehavior Céline Hudelot, Emmanuel Malherbe, Etienne Malaboeuf @Fannyjrd_ Gabriel Hautreux @joao97_alves Kevin El-Haddad @ManuelFaysse @peyrardMax Nuno M. Guerreiro @psanfernandes @RicardoRei7 @PierreColombo6
1
1
8
@DuarteMRAlves
Duarte Alves
8 months
🧵 (4/7) 🤝 This work is the result of an incredible joint effort by a talented team from multiple institutions, props to everyone!
1
0
3
@DuarteMRAlves
Duarte Alves
8 months
🧵 (3/7) 🌍 EuroBERT is open-source: 👉 Models (210M, 610M, 2.1B params) 👉 Training snapshots 👉 Full training framework Explore here: https://t.co/SZHKDordRg Code coming soon! https://t.co/7o8CpqOfRV
github.com
Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including CPU, AMD, and NVIDIA GPUs. - Nicolas-BZRD/EuroBERT
1
2
7
@DuarteMRAlves
Duarte Alves
8 months
🧵 (2/7) 📊 EuroBERT shines across benchmarks: ✔️ Retrieval (MIRACL, MLDR) ✔️ Classification (XNLI, PAWS-X) ✔️ Regression (SeaHorse) ✔️ Strong in code/math understanding (CodeSearchNet)
1
0
4
@DuarteMRAlves
Duarte Alves
8 months
🧵 (1/7) 📚 Why EuroBERT? ✅ Extensive multilingual coverage ✅ Longer context handling (up to 8,192 tokens) ✅ Improved architecture ✅ Specialized for math and coding Ideal for retrieval, classification, and regression tasks!
1
2
5
@DuarteMRAlves
Duarte Alves
8 months
🚀 Excited to announce EuroBERT: a new multilingual encoder model family for European & global languages! 🌍 🔹 EuroBERT is trained on a massive 5 trillion-token dataset across 15 languages and includes recent architecture advances such as GQA, RoPE & RMSNorm. 🔹
1
12
59
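Of the architecture advances named in this announcement, RMSNorm is the simplest to sketch: unlike LayerNorm, it skips mean subtraction and only rescales by the root-mean-square of the last dimension. A minimal NumPy sketch (illustrative; not the EuroBERT implementation):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over the last dimension.

    Divides activations by their root-mean-square (plus eps for
    stability), then applies a learned per-channel scale. No mean
    subtraction and no bias, which makes it cheaper than LayerNorm.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight
```

After normalization the per-channel mean square is ~1, so downstream layers see activations at a stable scale regardless of the input magnitude.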
@andre_t_martins
Andre Martins
9 months
Good to see @EU_Commission promoting OS LLMs in Europe. However (1) "OpenEuroLLM" is appropriating a name (#EuroLLM) which already exists, (2) it is certainly *not* the "first family of open-source LLMs covering all EU languages" 🧵
@EU_Commission
European Commission
9 months
AI made in 🇪🇺 OpenEuroLLM, the first family of open source Large Language Models covering all EU languages, has earned the first STEP Seal for its excellence. It brings together EU startups, research labs and supercomputing hosts to train AI on European supercomputers ↓
2
13
47
@PierreColombo6
Pierre Colombo
11 months
What an incredible year for the team @ManuelFaysse @nunonmg @gisship @N1colAIs @DuarteMRAlves @andre_t_martins @UndefBehavior! The retrospective from @ManuelFaysse captures some. Plus, there's plenty of exciting news from @equallai – so much to celebrate and be proud of! 🎉
@ManuelFaysse
Manuel Faysse
11 months
2024 was a super active year where I had the chance to explore many things: document embeddings, LLM pretraining, VLMs, ML Privacy... It's also the year of my first citation - and soon my 100th ?! A thread where I quickly go over some of my work from the year (1/N) 🧵
0
4
13