Diana Abagyan (@dianaabagyan)
Research Scholar @Cohere_Labs
Joined June 2025 · 58 Followers · 90 Following · 8 Media · 28 Statuses
Diana Abagyan @dianaabagyan · 17 days
🚨 New pretraining paper on multilingual tokenizers 🚨 Super excited to share my work with @Cohere_Labs: "One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers"
[image]
3 · 33 · 102
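For readers unfamiliar with the setup, the sketch below is a toy illustration (not the paper's training pipeline) of the basic contrast between a "cluster" tokenizer trained only on a few primary languages and a "universal" tokenizer whose training data also covers expanded languages. It uses the Hugging Face `tokenizers` library; the corpora and vocabulary size are hypothetical placeholders.

```python
# A minimal sketch, assuming the `tokenizers` library; corpora are made-up
# one-liners, not the paper's training data.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def train_bpe(corpus_lines, vocab_size):
    """Train a small BPE tokenizer on an iterable of text lines."""
    tok = Tokenizer(models.BPE(unk_token="[UNK]"))
    tok.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]", "[PAD]"])
    tok.train_from_iterator(corpus_lines, trainer=trainer)
    return tok

# Hypothetical corpora keyed by language code.
primary = {
    "en": ["the cat sat on the mat"],
    "fr": ["le chat est sur le tapis"],
}
expanded = {
    **primary,
    "tr": ["kedi paspasın üzerinde oturuyor"],
    "sw": ["paka ameketi kwenye mkeka"],
}

# "Cluster" tokenizer: sees only the primary languages.
cluster_tok = train_bpe((line for lines in primary.values() for line in lines), vocab_size=500)
# "Universal" tokenizer: same budget, but its training data also covers expanded languages.
universal_tok = train_bpe((line for lines in expanded.values() for line in lines), vocab_size=500)

# Compare how many tokens each needs for a sentence in an expanded language.
sample = "kedi paspasın üzerinde oturuyor"
print("cluster  :", len(cluster_tok.encode(sample).tokens))
print("universal:", len(universal_tok.encode(sample).tokens))
```

In practice these tokenizers are trained on large multilingual corpora with far larger vocabularies; the point of the sketch is only that the universal tokenizer segments text in the expanded languages into fewer, more meaningful pieces.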
Diana Abagyan @dianaabagyan · 7 days
RT @puneeshdeora: 🚨 New paper drop! 🚨 🤔 When a transformer sees a sequence that could be explained by many rules, which rule does it pick? …
0 · 48 · 0
Diana Abagyan @dianaabagyan · 7 days
RT @nsaphra: 🚨 New preprint! 🚨 Phase transitions! We love to see them during LM training. Syntactic attention structure, induction heads, g…
0 · 43 · 0
Diana Abagyan @dianaabagyan · 7 days
RT @Cohere_Labs: Can we improve the performance of LLMs during inference without the need for extensive sampling OR special reward models? …
0 · 9 · 0
Diana Abagyan @dianaabagyan · 7 days
RT @ammar__khairi: 💪🏼 Huge thanks to my incredible mentors: Julia Kreutzer @mrdanieldsouza, @YeS855811, @sarahookr for guiding me and suppor…
0 · 5 · 0
Diana Abagyan @dianaabagyan · 7 days
RT @ammar__khairi: 🚀 Want better LLM performance without extra training or special reward models? Happy to share my work with @Cohere_labs…
0 · 20 · 0
Diana Abagyan @dianaabagyan · 13 days
RT @Cohere_Labs: How can AI capture the nuances of different languages? 💬🗨️ By using a team of specialized teacher models via Multilingual…
0 · 8 · 0
Diana Abagyan @dianaabagyan · 13 days
RT @Cohere_Labs: 🤹 How do we move away from complicated and brittle prompt engineering at inference for under-represented tasks? 🤔 🧠 Our la…
0 · 11 · 0
Diana Abagyan @dianaabagyan · 15 days
RT @ahmetustun89: Can we train models for better inference-time control instead of over-complex prompt engineering❓ Turns out the key is i…
0 · 8 · 0
Diana Abagyan @dianaabagyan · 15 days
RT @mrdanieldsouza: 🚨 Wait, adding simple markers 📌 during training unlocks outsized gains at inference time?! 🤔 🚨 Thrilled to share our la…
0 · 17 · 0
Diana Abagyan @dianaabagyan · 16 days
RT @Cohere_Labs: Global MMLU is revolutionizing multilingual AI. 🌍 Recognized by Stanford HAI and adopted by top labs, it's the benchmark…
0 · 14 · 0
Diana Abagyan @dianaabagyan · 16 days
RT @srishti_gureja: Our paper M-RewardBench got accepted to ACL main: We construct the first-of-its-kind multiling…
0 · 12 · 0
Diana Abagyan @dianaabagyan · 16 days
RT @irombie: amazing work!!!
0 · 2 · 0
Diana Abagyan @dianaabagyan · 16 days
RT @sarahookr: Huge congrats to @dianaabagyan on her first first-author paper. Was a pleasure collaborating on this work — we ask what chea…
0 · 10 · 0
Diana Abagyan @dianaabagyan · 17 days
RT @ahmetustun89: An excellent work by @dianaabagyan 💎 We show that a "universal" tokenizer, covering more than just primary languages, gre…
0 · 6 · 0
Diana Abagyan @dianaabagyan · 17 days
A huge thank you to all of my mentors and collaborators, especially @ahmetustun89, @sarahookr, @alexrs95, and @mziizm, for their guidance and support ✨ 📜 Check out our paper!
1 · 6 · 16
Diana Abagyan @dianaabagyan · 17 days
TL;DR: We find that the universal tokenizer is twice as effective! It is vastly more sample efficient, achieving comparable performance with ⅛ of the data and compute required by the cluster tokenizer.
[image]
1 · 2 · 10
Diana Abagyan @dianaabagyan · 17 days
Although the universal tokenizer does not hurt primary-language performance, a gap does emerge as the vocabulary size decreases. Using a sufficiently large vocabulary is therefore key.
[image]
1 · 1 · 9
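As a rough illustration of why vocabulary size and coverage matter, the sketch below compares tokens-per-word ("fertility") for two publicly available tokenizers with very different vocabularies: GPT-2's English-centric ~50k vocabulary versus XLM-R's multilingual ~250k vocabulary. This assumes the `transformers` library and access to the Hugging Face Hub, and it is not the paper's evaluation; it only shows the kind of measurement involved.

```python
# A minimal sketch, assuming `transformers` and Hub access; the sentences are
# made-up examples, not the paper's evaluation data.
from transformers import AutoTokenizer

sentences = {
    "en": "The committee approved the proposal yesterday.",
    "tr": "Komite teklifi dün onayladı.",
    "sw": "Kamati iliidhinisha pendekezo jana.",
}

for name in ("gpt2", "xlm-roberta-base"):
    tok = AutoTokenizer.from_pretrained(name)
    for lang, text in sentences.items():
        n_tokens = len(tok.tokenize(text))
        n_words = len(text.split())
        # Lower tokens-per-word generally means longer, more meaningful subwords.
        print(f"{name:18s} {lang}: {n_tokens / n_words:.2f} tokens/word")
```

A lower tokens-per-word ratio for a language is roughly what a larger, better-allocated vocabulary buys; squeezing the vocabulary shrinks that headroom for the expanded languages first.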
Diana Abagyan @dianaabagyan · 17 days
We also find that pretraining data for the new languages is optional when using the universal tokenizer: even when pretraining includes no data from the expanded languages, the universal tokenizer still shows significantly higher adaptation gains.
[image]
1 · 2 · 9
Diana Abagyan @dianaabagyan · 17 days
Why not just switch to a universal tokenizer after training? While this improves over an unadapted tokenizer, it is not as effective for language adaptation as using a universal tokenizer from the start of pretraining.
[image]
1 · 2 · 9
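For context, "switching" tokenizers after pretraining typically involves some form of vocabulary extension plus embedding resizing, along the lines of the generic recipe sketched below. It uses the `transformers` API with an illustrative model and made-up subword pieces, and is not necessarily the paper's exact procedure.

```python
# A minimal sketch, assuming `transformers`; "gpt2" and the added pieces are
# illustrative, not the models or vocabulary from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical subword pieces covering the expanded languages.
new_pieces = ["ığı", "ção", "ngeli"]
num_added = tokenizer.add_tokens(new_pieces)

# Grow the embedding matrix to match the extended vocabulary. The new rows are
# freshly initialized and still have to be learned during adaptation.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; new vocab size = {len(tokenizer)}")
```

The freshly initialized embedding rows are one intuition for why post-hoc switching lags behind a tokenizer that covered those languages from the start of pretraining.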
Diana Abagyan @dianaabagyan · 17 days
We also test the most difficult setting: low data and entirely unseen languages (unseen by both the tokenizer and pretraining). The universal tokenizer achieves win rates up to 5% higher than a specialized tokenizer.
[image]
1 · 2 · 11
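For reference, a "win rate" here is a pairwise preference metric: the fraction of prompts on which one model's output is preferred over the other's. A minimal sketch of the arithmetic is below, with hypothetical judgments rather than the paper's evaluation pipeline.

```python
# A minimal sketch with made-up pairwise judgments; ties are split evenly,
# which is a common convention but an assumption here.
def win_rate(judgments):
    """judgments: list of 'A', 'B', or 'tie' per prompt; returns A's win rate."""
    wins = sum(1.0 for j in judgments if j == "A")
    ties = sum(0.5 for j in judgments if j == "tie")
    return (wins + ties) / len(judgments)

# e.g. model A (universal tokenizer) vs model B (specialized tokenizer)
print(win_rate(["A", "A", "B", "tie", "A", "B", "A", "tie"]))  # 0.625
```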