Belen Alastruey

@b_alastruey

Followers 768 · Following 243 · Media 25 · Statuses 95

PhD student @AIatMeta & @PSL_univ. Previously: @amazon Alexa, @apple MT, @mtupc1

Barcelona, España
Joined November 2021
@b_alastruey
Belen Alastruey
11 days
🚀New paper alert!🚀 In our work at @AIatMeta we dive into the struggles of mixing languages in highly multilingual Transformer encoders and use the analysis as a tool to better design multilingual models and obtain optimal performance. 📄: 🧵(1/n)
Tweet media one
1
17
73
@b_alastruey
Belen Alastruey
11 days
TL;DR: The Interference Matrix offers a data-driven approach to optimizing multilingual model training, and shows that language interference is asymmetric and depends on script, not on language family or embedding similarity. Thanks to all coauthors! 📄: 🧵(n/n)
Tweet media one
0
0
5
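As a rough illustration of the "data-driven" use of such a matrix, here is a minimal greedy grouping sketch in Python. The matrix values, language list, and grouping heuristic are all hypothetical placeholders, not the procedure from the paper.

```python
import numpy as np

# Hypothetical interference matrix: interference[i, j] = performance drop of
# language i when trained jointly with language j (higher = worse).
langs = ["eng", "spa", "por", "rus", "ukr", "zho"]
rng = np.random.default_rng(0)
interference = rng.uniform(0.0, 1.0, size=(len(langs), len(langs)))
np.fill_diagonal(interference, 0.0)

def greedy_groups(interference: np.ndarray, n_groups: int) -> list[list[int]]:
    """Assign each language to the group where it adds the least interference."""
    groups: list[list[int]] = [[] for _ in range(n_groups)]
    # Process languages from most to least "fragile" (highest average suffered interference).
    order = np.argsort(-interference.mean(axis=1))
    for lang in order:
        # Cost of adding `lang` to a group: interference it suffers from current
        # members plus interference it causes to them.
        costs = [
            sum(interference[lang, m] + interference[m, lang] for m in g)
            for g in groups
        ]
        groups[int(np.argmin(costs))].append(int(lang))
    return groups

assignment = greedy_groups(interference, n_groups=2)
print([[langs[i] for i in g] for g in assignment])
```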
@b_alastruey
Belen Alastruey
11 days
Furthermore, adding an “unfriendly” language to a group can degrade all other languages’ performance. 🧵(8/n)
Tweet media one
1
0
3
@b_alastruey
Belen Alastruey
11 days
How does this impact downstream performance?🤔 The Interference Matrix isn't just analytical: it predicts real downstream performance drops. Models trained on high-interference pairs/groups showed lower accuracy on classification benchmarks than low-interference ones. 🧵(7/n)
Tweet media one
Tweet media two
1
0
3
@b_alastruey
Belen Alastruey
11 days
What about languages that are close in aligned embedding spaces? We compared interference scores with similarity in the MEXMA and SONAR embedding spaces and found no correlation: greater similarity does not mean less interference. 🧵(6/n)
Tweet media one
1
0
3
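A sketch of the kind of correlation check described above, assuming per-pair interference scores and embedding similarities are already computed. The numbers and variable names are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative data: one entry per language pair (a, b).
interference_scores = np.array([0.12, 0.31, 0.05, 0.44, 0.27])   # drop of a when paired with b
embedding_similarity = np.array([0.81, 0.64, 0.90, 0.58, 0.70])  # cosine sim of a/b in a shared space

rho, p_value = spearmanr(interference_scores, embedding_similarity)
print(f"Spearman rho={rho:.2f}, p={p_value:.3f}")
# The thread reports that such correlations come out near zero for MEXMA/SONAR spaces.
```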
@b_alastruey
Belen Alastruey
11 days
We analyze the behavior of low- and high-resource languages and find that low-resource languages not only suffer more from interference (low robustness) but also have a larger impact on the performance of other languages trained alongside them (low friendliness). 📉 🧵(5/n)
Tweet media one
1
0
3
@b_alastruey
Belen Alastruey
11 days
Another common assumption is that interference between two languages applies equally in both directions. 🔎However, we find that cross-lingual interference is asymmetric! 🇪🇸➡️🇵🇹 can cause a big performance drop while 🇵🇹➡️🇪🇸 may not!🔎 🧵(4/n)
1
0
3
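A tiny sketch of what asymmetric interference means in matrix terms; the values below are made up to mirror the Spanish/Portuguese example, not real results.

```python
import numpy as np

# interference[i, j]: performance drop of language i when trained together with language j.
interference = np.array([
    [0.00, 0.35],   # e.g. language A suffers a lot when paired with language B ...
    [0.05, 0.00],   # ... while language B is barely affected by language A.
])

asymmetry = np.abs(interference - interference.T)
print("max asymmetry:", asymmetry.max())  # > 0 means interference is not symmetric
```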
@b_alastruey
Belen Alastruey
11 days
Researchers often assume that languages within the same family interfere less with each other than languages from different families. 🔎However, we find that interference is not strongly affected by language family; script appears to be a more influential factor!🔎 🧵(3/n)
Tweet media one
1
0
4
@b_alastruey
Belen Alastruey
11 days
Wondering what happens inside a multilingual Transformer?🤔 We created an Interference Matrix to measure how languages impact each other's performance in a bilingual encoder and use it to guide the design of stronger multilingual models! 🚀 🧵(2/n)
Tweet media one
1
0
3
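One plausible way to populate such a matrix, sketched under assumed definitions: score each language trained alone versus trained in a bilingual encoder with a partner, and take the relative drop. The `train_and_eval` callable is a hypothetical stand-in for the actual training/evaluation pipeline.

```python
from itertools import permutations
from typing import Callable

def build_interference_matrix(
    langs: list[str],
    train_and_eval: Callable[[list[str], str], float],
) -> dict[tuple[str, str], float]:
    """train_and_eval(train_langs, eval_lang) -> score of eval_lang for an encoder
    trained on train_langs (a hypothetical stand-in for the real pipeline)."""
    matrix = {}
    for a, b in permutations(langs, 2):
        mono = train_and_eval([a], a)        # monolingual baseline for language a
        bi = train_and_eval([a, b], a)       # bilingual model, still scored on a
        matrix[(a, b)] = (mono - bi) / mono  # relative drop = interference of b on a
    return matrix

# Toy usage with a fake pipeline, just to show the shape of the output.
fake = lambda train_langs, eval_lang: 0.9 - 0.1 * (len(train_langs) - 1)
print(build_interference_matrix(["spa", "por", "rus"], fake))
```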
@b_alastruey
Belen Alastruey
21 days
RT @JoaoMJaneiro: If you are attending ACL2025 join our oral presentation! Happening at 15:00 in room 1.86 🙂.
0
3
0
@b_alastruey
Belen Alastruey
6 months
RT @eduardosg_ai: Happy to see that Linguini, our benchmark for language-agnostic linguistic reasoning, has been included in DeepMind’s BIG….
0
3
0
@b_alastruey
Belen Alastruey
8 months
Happy to share our team's work on Large Concept Models (LCMs), a new approach to language modeling that goes beyond standard token-based LLMs by operating in a multilingual and multimodal embedding space. Check out the full paper! 📄:
Tweet media one
12
85
448
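A highly simplified, purely conceptual sketch of what "modeling in an embedding space" can look like: predict the embedding of the next sentence from the embeddings of previous sentences. This toy module and its MSE objective are assumptions for illustration, not the LCM architecture or training objective described in the paper.

```python
import torch
import torch.nn as nn

class ConceptLM(nn.Module):
    """Toy next-embedding predictor over sentence embeddings (not the actual LCM)."""
    def __init__(self, dim: int = 1024, layers: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, dim)

    def forward(self, sent_embs: torch.Tensor) -> torch.Tensor:
        # sent_embs: (batch, n_sentences, dim), e.g. from a multilingual sentence encoder.
        causal = nn.Transformer.generate_square_subsequent_mask(sent_embs.size(1))
        hidden = self.backbone(sent_embs, mask=causal)
        return self.head(hidden)  # predicted embedding of the following sentence

model = ConceptLM()
x = torch.randn(2, 5, 1024)                             # 2 documents, 5 sentence embeddings each
pred = model(x)                                         # (2, 5, 1024)
loss = nn.functional.mse_loss(pred[:, :-1], x[:, 1:])   # predict sentence t+1 from the prefix
```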
@b_alastruey
Belen Alastruey
11 months
TL;DR: We conduct the first-ever analysis of the training dynamics of ST systems. Based on its results, we adjust the Transformer architecture to enhance the performance while bypassing the pretraining stage. We're thrilled to share these findings at #EMNLP! See you in Miami! 🏖️
Tweet media one
0
0
1
@b_alastruey
Belen Alastruey
11 months
Results?📈 Our modified model, trained from scratch, not only performs comparably to its pretrained counterpart, but also reduces training time significantly by skipping the pretraining. 🧵(5/n)
Tweet media one
1
0
1
@b_alastruey
Belen Alastruey
11 months
To assess our findings, we propose a small tweak to the decoder's cross-attention mechanism. This change forces the model to integrate source information earlier in training, leading to faster training of the encoder. 🧵(4/n)
Tweet media one
1
0
1
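The thread does not spell out the exact modification, so the sketch below shows just one hypothetical way a decoder layer could be forced to use source information early: removing the residual bypass around cross-attention so the decoder cannot ignore the encoder output. This is an illustration of the general idea, not the tweak proposed in the paper.

```python
import torch
import torch.nn as nn

class EagerCrossAttnDecoderLayer(nn.Module):
    """Hypothetical decoder layer: no residual bypass around cross-attention,
    so every layer output is built from attended encoder states."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, tgt: torch.Tensor, memory: torch.Tensor, tgt_mask=None) -> torch.Tensor:
        x = self.norm1(tgt + self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0])
        # No `x + ...` here: the cross-attention output is not added to a shortcut path,
        # so the decoder cannot fall back on pure language modeling.
        x = self.norm2(self.cross_attn(x, memory, memory)[0])
        return self.norm3(x + self.ffn(x))
```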
@b_alastruey
Belen Alastruey
11 months
Why does this happen?🤔 Models focus on language modeling and bypass the encoder's information until the encoder is adequately trained. This phase is quick in MT, non-existent when using a pre-trained encoder in ST, and lengthy when training an ST system from scratch. 🧵(3/n)
1
0
1
@b_alastruey
Belen Alastruey
11 months
We study how Transformers use the speech source data during training in two settings: using a pretrained encoder vs. training from scratch. Our findings show that models trained from scratch struggle to effectively use speech inputs for predictions early in training. 🧵(2/n)
Tweet media one
1
0
1
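A sketch of one kind of probe that could quantify how much the decoder actually relies on the speech source at a given point in training: compare predictions with the real encoder output against predictions with it zeroed out. The `model.decode(...)` interface and the KL-based measure are assumptions for illustration, not necessarily the interpretability method used in the paper.

```python
import torch

@torch.no_grad()
def source_reliance(model, tgt_tokens: torch.Tensor, encoder_out: torch.Tensor) -> float:
    """Hypothetical probe: how much do decoder predictions change when the source is removed?
    `model.decode(tgt_tokens, encoder_out)` is an assumed interface returning logits."""
    with_src = model.decode(tgt_tokens, encoder_out)
    without_src = model.decode(tgt_tokens, torch.zeros_like(encoder_out))
    # Larger divergence => the model is actually using the speech source for its predictions.
    return torch.nn.functional.kl_div(
        without_src.log_softmax(-1), with_src.softmax(-1), reduction="batchmean"
    ).item()
```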
@b_alastruey
Belen Alastruey
11 months
🚨New #EMNLP Main paper🚨 What is the impact of ASR pretraining on Direct Speech Translation models?🤔 In our work we use interpretability to find out, and we use the findings to skip the pretraining!🔎📈 w/ @geiongallego @costajussamarta 📄: 🧵(1/n)
Tweet media one
1
4
58
@b_alastruey
Belen Alastruey
11 months
RT @JoaoMJaneiro: Last week we released the first paper of my PhD, "MEXMA: Token-level objectives improve sentence representations". We….
0
8
0
@b_alastruey
Belen Alastruey
11 months
Happy to share Linguini🍝, a benchmark to evaluate linguistic reasoning in LLMs without relying on prior language-specific knowledge. We show the task is still hard for SOTA models, which achieve below 25% accuracy. 📄:
Tweet media one
@eduardosg_ai
Eduardo Sánchez
11 months
🚨NEW BENCHMARK🚨 Are LLMs good at linguistic reasoning if we minimize the chance of prior language memorization? We introduce Linguini🍝, a benchmark for linguistic reasoning in which SOTA models perform below 25%. w/ @b_alastruey, @artetxem, @costajussamarta et al. 🧵(1/n)
Tweet media one
0
3
52