Diana Abagyan (@dianaabagyan)
Research Scholar @Cohere_Labs
Joined June 2025 · 58 Followers · 90 Following · 8 Media · 28 Statuses
Diana Abagyan @dianaabagyan · 17 days
🚨 New pretraining paper on multilingual tokenizers 🚨 Super excited to share my work with @Cohere_Labs: "One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers"
[image]
3 · 33 · 102
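For readers unfamiliar with the setup, the sketch below is a toy illustration (not the paper's training pipeline) of the basic contrast between a "cluster" tokenizer trained only on a few primary languages and a "universal" tokenizer whose training data also covers expanded languages. It uses the Hugging Face `tokenizers` library; the corpora and vocabulary size are hypothetical placeholders.

```python
# A minimal sketch, assuming the `tokenizers` library; corpora are made-up
# one-liners, not the paper's training data.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def train_bpe(corpus_lines, vocab_size):
    """Train a small BPE tokenizer on an iterable of text lines."""
    tok = Tokenizer(models.BPE(unk_token="[UNK]"))
    tok.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]", "[PAD]"])
    tok.train_from_iterator(corpus_lines, trainer=trainer)
    return tok

# Hypothetical corpora keyed by language code.
primary = {
    "en": ["the cat sat on the mat"],
    "fr": ["le chat est sur le tapis"],
}
expanded = {
    **primary,
    "tr": ["kedi paspasın üzerinde oturuyor"],
    "sw": ["paka ameketi kwenye mkeka"],
}

# "Cluster" tokenizer: sees only the primary languages.
cluster_tok = train_bpe((line for lines in primary.values() for line in lines), vocab_size=500)
# "Universal" tokenizer: same budget, but its training data also covers expanded languages.
universal_tok = train_bpe((line for lines in expanded.values() for line in lines), vocab_size=500)

# Compare how many tokens each needs for a sentence in an expanded language.
sample = "kedi paspasın üzerinde oturuyor"
print("cluster  :", len(cluster_tok.encode(sample).tokens))
print("universal:", len(universal_tok.encode(sample).tokens))
```

In practice these tokenizers are trained on large multilingual corpora with far larger vocabularies; the point of the sketch is only that the universal tokenizer segments text in the expanded languages into fewer, more meaningful pieces.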
Diana Abagyan @dianaabagyan · 7 days
RT @puneeshdeora: 🚨 New paper drop! 🚨 🤔 When a transformer sees a sequence that could be explained by many rules, which rule does it pick? …
0 · 48 · 0
Diana Abagyan @dianaabagyan · 7 days
RT @nsaphra: 🚨 New preprint! 🚨 Phase transitions! We love to see them during LM training. Syntactic attention structure, induction heads, g…
0 · 43 · 0
Diana Abagyan @dianaabagyan · 7 days
RT @Cohere_Labs: Can we improve the performance of LLMs during inference without the need for extensive sampling OR special reward models? …
0 · 9 · 0
Diana Abagyan @dianaabagyan · 7 days
RT @ammar__khairi: 💪🏼 Huge thanks to my incredible mentors: Julia Kreutzer @mrdanieldsouza, @YeS855811, @sarahookr for guiding me and suppor…
0 · 5 · 0
Diana Abagyan @dianaabagyan · 7 days
RT @ammar__khairi: 🚀 Want better LLM performance without extra training or special reward models? Happy to share my work with @Cohere_labs…
0 · 20 · 0
Diana Abagyan @dianaabagyan · 13 days
RT @Cohere_Labs: How can AI capture the nuances of different languages? 💬🗨️ By using a team of specialized teacher models via Multilingual…
0 · 8 · 0
Diana Abagyan @dianaabagyan · 13 days
RT @Cohere_Labs: 🤹 How do we move away from complicated and brittle prompt engineering at inference for under-represented tasks? 🤔 🧠 Our la…
0 · 11 · 0
Diana Abagyan @dianaabagyan · 15 days
RT @ahmetustun89: Can we train models for better inference-time control instead of over-complex prompt engineering❓ Turns out the key is i…
0 · 8 · 0
Diana Abagyan @dianaabagyan · 15 days
RT @mrdanieldsouza: 🚨 Wait, adding simple markers 📌 during training unlocks outsized gains at inference time?! 🤔 🚨 Thrilled to share our la…
0 · 17 · 0
Diana Abagyan @dianaabagyan · 16 days
RT @Cohere_Labs: Global MMLU is revolutionizing multilingual AI. 🌍 Recognized by Stanford HAI and adopted by top labs, it's the benchmark…
0 · 14 · 0
Diana Abagyan @dianaabagyan · 16 days
RT @srishti_gureja: Our paper M-RewardBench got accepted to ACL main: We construct the first-of-its-kind multiling…
0 · 12 · 0
Diana Abagyan @dianaabagyan · 16 days
RT @irombie: amazing work!!!
0 · 2 · 0
Diana Abagyan @dianaabagyan · 16 days
RT @sarahookr: Huge congrats to @dianaabagyan on her first first-author paper. Was a pleasure collaborating on this work — we ask what chea…
0 · 10 · 0
Diana Abagyan @dianaabagyan · 17 days
RT @ahmetustun89: An excellent work by @dianaabagyan 💎 We show that a "universal" tokenizer, covering more than just primary languages, gre…
0 · 6 · 0
Diana Abagyan @dianaabagyan · 17 days
A huge thank you to all of my mentors and collaborators, especially @ahmetustun89, @sarahookr, @alexrs95, and @mziizm, for their guidance and support ✨ 📜 Check out our paper!
1 · 6 · 16
Diana Abagyan @dianaabagyan · 17 days
TL;DR: We find that the universal tokenizer is twice as effective! It is vastly more sample efficient, achieving comparable performance with ⅛ of the data and compute required by the cluster tokenizer.
[image]
1 · 2 · 10
Diana Abagyan @dianaabagyan · 17 days
Although the universal tokenizer does not hurt primary-language performance, a gap does emerge as the vocabulary size decreases. Using a sufficiently large vocabulary is therefore key.
[image]
1 · 1 · 9
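As a rough illustration of why vocabulary size and coverage matter, the sketch below compares tokens-per-word ("fertility") for two publicly available tokenizers with very different vocabularies: GPT-2's English-centric ~50k vocabulary versus XLM-R's multilingual ~250k vocabulary. This assumes the `transformers` library and access to the Hugging Face Hub, and it is not the paper's evaluation; it only shows the kind of measurement involved.

```python
# A minimal sketch, assuming `transformers` and Hub access; the sentences are
# made-up examples, not the paper's evaluation data.
from transformers import AutoTokenizer

sentences = {
    "en": "The committee approved the proposal yesterday.",
    "tr": "Komite teklifi dün onayladı.",
    "sw": "Kamati iliidhinisha pendekezo jana.",
}

for name in ("gpt2", "xlm-roberta-base"):
    tok = AutoTokenizer.from_pretrained(name)
    for lang, text in sentences.items():
        n_tokens = len(tok.tokenize(text))
        n_words = len(text.split())
        # Lower tokens-per-word generally means longer, more meaningful subwords.
        print(f"{name:18s} {lang}: {n_tokens / n_words:.2f} tokens/word")
```

A lower tokens-per-word ratio for a language is roughly what a larger, better-allocated vocabulary buys; squeezing the vocabulary shrinks that headroom for the expanded languages first.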
Diana Abagyan @dianaabagyan · 17 days
We also find that pretraining data for the new languages is optional when using the universal tokenizer: even when pretraining includes no data from the expanded languages, the universal tokenizer still shows significantly higher adaptation gains.
[image]
1 · 2 · 9
Diana Abagyan @dianaabagyan · 17 days
Why not just switch to a universal tokenizer after training? While this improves over an unadapted tokenizer, it is not as effective for language adaptation as using a universal tokenizer from the start of pretraining.
[image]
1 · 2 · 9
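For context, "switching" tokenizers after pretraining typically involves some form of vocabulary extension plus embedding resizing, along the lines of the generic recipe sketched below. It uses the `transformers` API with an illustrative model and made-up subword pieces, and is not necessarily the paper's exact procedure.

```python
# A minimal sketch, assuming `transformers`; "gpt2" and the added pieces are
# illustrative, not the models or vocabulary from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical subword pieces covering the expanded languages.
new_pieces = ["ığı", "ção", "ngeli"]
num_added = tokenizer.add_tokens(new_pieces)

# Grow the embedding matrix to match the extended vocabulary. The new rows are
# freshly initialized and still have to be learned during adaptation.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; new vocab size = {len(tokenizer)}")
```

The freshly initialized embedding rows are one intuition for why post-hoc switching lags behind a tokenizer that covered those languages from the start of pretraining.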
Diana Abagyan @dianaabagyan · 17 days
We also test the most difficult setting: low data and entirely unseen languages (unseen by both the tokenizer and pretraining). The universal tokenizer achieves win rates up to 5% higher than a specialized tokenizer.
[image]
1 · 2 · 11
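For reference, a "win rate" here is a pairwise preference metric: the fraction of prompts on which one model's output is preferred over the other's. A minimal sketch of the arithmetic is below, with hypothetical judgments rather than the paper's evaluation pipeline.

```python
# A minimal sketch with made-up pairwise judgments; ties are split evenly,
# which is a common convention but an assumption here.
def win_rate(judgments):
    """judgments: list of 'A', 'B', or 'tie' per prompt; returns A's win rate."""
    wins = sum(1.0 for j in judgments if j == "A")
    ties = sum(0.5 for j in judgments if j == "tie")
    return (wins + ties) / len(judgments)

# e.g. model A (universal tokenizer) vs model B (specialized tokenizer)
print(win_rate(["A", "A", "B", "tie", "A", "B", "A", "tie"]))  # 0.625
```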