Belen Alastruey

@b_alastruey

Followers 768 · Following 243 · Media 25 · Statuses 95

PhD student @AIatMeta & @PSL_univ. Previously: @amazon Alexa, @apple MT, @mtupc1

Barcelona, España
Joined November 2021
@b_alastruey
Belen Alastruey
11 days
🚀New paper alert!🚀 In our work at @AIatMeta we dive into the struggles of mixing languages in highly multilingual Transformer encoders and use the analysis as a tool to better design multilingual models and obtain optimal performance. 📄: 🧵(1/n)
Tweet media one
1
17
73
@b_alastruey
Belen Alastruey
11 days
TL;DR: The Interference Matrix offers a data-driven approach to optimizing multilingual model training, and shows that language interference is asymmetric and depends on script, not on language family or embedding similarity. Thanks to all coauthors! 📄: 🧵(n/n)
Tweet media one
0
0
5
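As a rough illustration of the "data-driven" use of such a matrix, here is a minimal greedy grouping sketch in Python. The matrix values, language list, and grouping heuristic are all hypothetical placeholders, not the procedure from the paper.

```python
import numpy as np

# Hypothetical interference matrix: interference[i, j] = performance drop of
# language i when trained jointly with language j (higher = worse).
langs = ["eng", "spa", "por", "rus", "ukr", "zho"]
rng = np.random.default_rng(0)
interference = rng.uniform(0.0, 1.0, size=(len(langs), len(langs)))
np.fill_diagonal(interference, 0.0)

def greedy_groups(interference: np.ndarray, n_groups: int) -> list[list[int]]:
    """Assign each language to the group where it adds the least interference."""
    groups: list[list[int]] = [[] for _ in range(n_groups)]
    # Process languages from most to least "fragile" (highest average suffered interference).
    order = np.argsort(-interference.mean(axis=1))
    for lang in order:
        # Cost of adding `lang` to a group: interference it suffers from current
        # members plus interference it causes to them.
        costs = [
            sum(interference[lang, m] + interference[m, lang] for m in g)
            for g in groups
        ]
        groups[int(np.argmin(costs))].append(int(lang))
    return groups

assignment = greedy_groups(interference, n_groups=2)
print([[langs[i] for i in g] for g in assignment])
```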
@b_alastruey
Belen Alastruey
11 days
Furthermore, adding an “unfriendly” language to a group can degrade all other languages’ performance. 🧵(8/n)
Tweet media one
1
0
3
@b_alastruey
Belen Alastruey
11 days
How does this impact downstream performance?🤔 The Interference Matrix isn't just analytical: it predicts real downstream performance drops. Models trained on high-interference pairs/groups showed lower accuracy on classification benchmarks than low-interference ones. 🧵(7/n)
Tweet media one
Tweet media two
1
0
3
@b_alastruey
Belen Alastruey
11 days
What about languages that are close in aligned embedding spaces? We compared interference scores with similarity in the MEXMA and SONAR embedding spaces and found no correlation: greater similarity does not mean less interference. 🧵(6/n)
Tweet media one
1
0
3
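A sketch of the kind of correlation check described above, assuming per-pair interference scores and embedding similarities are already computed. The numbers and variable names are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative data: one entry per language pair (a, b).
interference_scores = np.array([0.12, 0.31, 0.05, 0.44, 0.27])   # drop of a when paired with b
embedding_similarity = np.array([0.81, 0.64, 0.90, 0.58, 0.70])  # cosine sim of a/b in a shared space

rho, p_value = spearmanr(interference_scores, embedding_similarity)
print(f"Spearman rho={rho:.2f}, p={p_value:.3f}")
# The thread reports that such correlations come out near zero for MEXMA/SONAR spaces.
```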
@b_alastruey
Belen Alastruey
11 days
We analyze the behavior of low- and high-resource languages and find that low-resource languages not only suffer more from interference (low robustness) but also have a larger impact on the performance of other languages trained alongside them (low friendliness). 📉 🧵(5/n)
Tweet media one
1
0
3
@b_alastruey
Belen Alastruey
11 days
Another common assumption is that interference between two languages applies equally in both directions. 🔎However, we find that cross-lingual interference is asymmetric! 🇪🇸➡️🇵🇹 can cause a big performance drop while 🇵🇹➡️🇪🇸 may not!🔎 🧵(4/n)
1
0
3
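A tiny sketch of what asymmetric interference means in matrix terms; the values below are made up to mirror the Spanish/Portuguese example, not real results.

```python
import numpy as np

# interference[i, j]: performance drop of language i when trained together with language j.
interference = np.array([
    [0.00, 0.35],   # e.g. language A suffers a lot when paired with language B ...
    [0.05, 0.00],   # ... while language B is barely affected by language A.
])

asymmetry = np.abs(interference - interference.T)
print("max asymmetry:", asymmetry.max())  # > 0 means interference is not symmetric
```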
@b_alastruey
Belen Alastruey
11 days
Researchers often assume that languages within the same family interfere less with each other than languages from different families. 🔎However, we find that interference is not strongly affected by language family; script appears to be a more influential factor!🔎 🧵(3/n)
Tweet media one
1
0
4
@b_alastruey
Belen Alastruey
11 days
Wondering what happens inside a multilingual Transformer?🤔 We created an Interference Matrix to measure how languages impact each other's performance in a bilingual encoder and use it to guide the design of stronger multilingual models! 🚀 🧵(2/n)
Tweet media one
1
0
3
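One plausible way to populate such a matrix, sketched under assumed definitions: score each language trained alone versus trained in a bilingual encoder with a partner, and take the relative drop. The `train_and_eval` callable is a hypothetical stand-in for the actual training/evaluation pipeline.

```python
from itertools import permutations
from typing import Callable

def build_interference_matrix(
    langs: list[str],
    train_and_eval: Callable[[list[str], str], float],
) -> dict[tuple[str, str], float]:
    """train_and_eval(train_langs, eval_lang) -> score of eval_lang for an encoder
    trained on train_langs (a hypothetical stand-in for the real pipeline)."""
    matrix = {}
    for a, b in permutations(langs, 2):
        mono = train_and_eval([a], a)        # monolingual baseline for language a
        bi = train_and_eval([a, b], a)       # bilingual model, still scored on a
        matrix[(a, b)] = (mono - bi) / mono  # relative drop = interference of b on a
    return matrix

# Toy usage with a fake pipeline, just to show the shape of the output.
fake = lambda train_langs, eval_lang: 0.9 - 0.1 * (len(train_langs) - 1)
print(build_interference_matrix(["spa", "por", "rus"], fake))
```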
@b_alastruey
Belen Alastruey
21 days
RT @JoaoMJaneiro: If you are attending ACL2025 join our oral presentation! Happening at 15:00 in room 1.86 🙂.
0
3
0
@b_alastruey
Belen Alastruey
6 months
RT @eduardosg_ai: Happy to see that Linguini, our benchmark for language-agnostic linguistic reasoning, has been included in DeepMind’s BIG….
0
3
0
@b_alastruey
Belen Alastruey
8 months
Happy to share our team's work on Large Concept Models (LCMs), a new approach to language modeling that goes beyond standard token-based LLMs by operating in a multilingual and multimodal embedding space. Check out the full paper! 📄:
Tweet media one
12
85
448
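A highly simplified, purely conceptual sketch of what "modeling in an embedding space" can look like: predict the embedding of the next sentence from the embeddings of previous sentences. This toy module and its MSE objective are assumptions for illustration, not the LCM architecture or training objective described in the paper.

```python
import torch
import torch.nn as nn

class ConceptLM(nn.Module):
    """Toy next-embedding predictor over sentence embeddings (not the actual LCM)."""
    def __init__(self, dim: int = 1024, layers: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, dim)

    def forward(self, sent_embs: torch.Tensor) -> torch.Tensor:
        # sent_embs: (batch, n_sentences, dim), e.g. from a multilingual sentence encoder.
        causal = nn.Transformer.generate_square_subsequent_mask(sent_embs.size(1))
        hidden = self.backbone(sent_embs, mask=causal)
        return self.head(hidden)  # predicted embedding of the following sentence

model = ConceptLM()
x = torch.randn(2, 5, 1024)                             # 2 documents, 5 sentence embeddings each
pred = model(x)                                         # (2, 5, 1024)
loss = nn.functional.mse_loss(pred[:, :-1], x[:, 1:])   # predict sentence t+1 from the prefix
```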
@b_alastruey
Belen Alastruey
11 months
TL;DR: We conduct the first-ever analysis of the training dynamics of ST systems. Based on its results, we adjust the Transformer architecture to enhance the performance while bypassing the pretraining stage. We're thrilled to share these findings at #EMNLP! See you in Miami! 🏖️
Tweet media one
0
0
1
@b_alastruey
Belen Alastruey
11 months
Results?📈 Our modified model, trained from scratch, not only performs comparably to its pretrained counterpart, but also reduces training time significantly by skipping the pretraining. 🧵(5/n)
Tweet media one
1
0
1
@b_alastruey
Belen Alastruey
11 months
To assess our findings, we propose a small tweak to the decoder's cross-attention mechanism. This change forces the model to integrate source information earlier in training, leading to faster training of the encoder. 🧵(4/n)
Tweet media one
1
0
1
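The thread does not spell out the exact modification, so the sketch below shows just one hypothetical way a decoder layer could be forced to use source information early: removing the residual bypass around cross-attention so the decoder cannot ignore the encoder output. This is an illustration of the general idea, not the tweak proposed in the paper.

```python
import torch
import torch.nn as nn

class EagerCrossAttnDecoderLayer(nn.Module):
    """Hypothetical decoder layer: no residual bypass around cross-attention,
    so every layer output is built from attended encoder states."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, tgt: torch.Tensor, memory: torch.Tensor, tgt_mask=None) -> torch.Tensor:
        x = self.norm1(tgt + self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0])
        # No `x + ...` here: the cross-attention output is not added to a shortcut path,
        # so the decoder cannot fall back on pure language modeling.
        x = self.norm2(self.cross_attn(x, memory, memory)[0])
        return self.norm3(x + self.ffn(x))
```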
@b_alastruey
Belen Alastruey
11 months
Why does this happen?🤔 Models focus on language modeling and bypass the encoder's information until the encoder is adequately trained. This phase is quick in MT, non-existent when using a pre-trained encoder in ST, and lengthy when training an ST system from scratch. 🧵(3/n)
1
0
1
@b_alastruey
Belen Alastruey
11 months
We study how Transformers use the speech source data during training in two settings: using a pretrained encoder vs. training from scratch. Our findings show that models trained from scratch struggle to effectively use speech inputs for predictions early in training. 🧵(2/n)
Tweet media one
1
0
1
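A sketch of one kind of probe that could quantify how much the decoder actually relies on the speech source at a given point in training: compare predictions with the real encoder output against predictions with it zeroed out. The `model.decode(...)` interface and the KL-based measure are assumptions for illustration, not necessarily the interpretability method used in the paper.

```python
import torch

@torch.no_grad()
def source_reliance(model, tgt_tokens: torch.Tensor, encoder_out: torch.Tensor) -> float:
    """Hypothetical probe: how much do decoder predictions change when the source is removed?
    `model.decode(tgt_tokens, encoder_out)` is an assumed interface returning logits."""
    with_src = model.decode(tgt_tokens, encoder_out)
    without_src = model.decode(tgt_tokens, torch.zeros_like(encoder_out))
    # Larger divergence => the model is actually using the speech source for its predictions.
    return torch.nn.functional.kl_div(
        without_src.log_softmax(-1), with_src.softmax(-1), reduction="batchmean"
    ).item()
```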
@b_alastruey
Belen Alastruey
11 months
🚨New #EMNLP Main paper🚨 What is the impact of ASR pretraining on Direct Speech Translation models?🤔 In our work we use interpretability to find out, and we use the findings to skip the pretraining!🔎📈 w/ @geiongallego @costajussamarta 📄: 🧵(1/n)
Tweet media one
1
4
58
@b_alastruey
Belen Alastruey
11 months
RT @JoaoMJaneiro: Last week we released the first paper of my PhD, "MEXMA: Token-level objectives improve sentence representations". We….
0
8
0
@b_alastruey
Belen Alastruey
11 months
Happy to share Linguini🍝, a benchmark to evaluate linguistic reasoning in LLMs without relying on prior language-specific knowledge. We show the task is still hard for SOTA models, which achieve below 25% accuracy. 📄:
Tweet media one
@eduardosg_ai
Eduardo Sánchez
11 months
🚨NEW BENCHMARK🚨 Are LLMs good at linguistic reasoning if we minimize the chance of prior language memorization? We introduce Linguini🍝, a benchmark for linguistic reasoning in which SOTA models perform below 25%. w/ @b_alastruey, @artetxem, @costajussamarta et al. 🧵(1/n)
Tweet media one
0
3
52