
Sebastian Ruder @ ACL
@seb_ruder
92K Followers · 10K Following · 380 Media · 4K Statuses
Research Scientist @AIatMeta • Ex @Cohere @GoogleDeepMind
Berlin, Germany
Joined September 2014
RT @robinson_n8: This was my star-struck 🤩 moment at @aclmeeting, getting to have lunch with @guzmanhe, @costajussamarta, and of course the…
0 replies · 2 retweets · 0 likes
RT @s_scardapane: *The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs* by @p_nawrot @PontiEdoardo @cheeesio @seb_ruder. T…
0 replies · 28 retweets · 0 likes
RT @cheeesio: The Multilingual Team at @cohere is hiring! If this sounds like you, please apply:
- strong coding skills and a keen eye for…
jobs.ashbyhq.com
Play a crucial role in developing and enhancing our language models to support a wide range of languages - Your primary focus will be on data engineering tasks, including data collection, cleaning,...
0 replies · 29 retweets · 0 likes
RT @hugobowne: I had lunch with @seb_ruder in Berlin a few days ago. Had delicious food and a wonderful, generative conversation about how…
0 replies · 1 retweet · 0 likes
RT @p_nawrot: We built sparse-frontier — a clean abstraction that lets you focus on your custom sparse attention implementation while autom…
0 replies · 51 retweets · 0 likes
@p_nawrot @cheeesio @PontiEdoardo There is a huge amount of variety in this research area, spanning:
- when sparse attention is used (prefilling vs decoding)
- which units are sparsified (blocks or vertical slashes)
- what type of patterns are used (fixed or content-aware)
- how the computational budget is allocated
2 replies · 0 retweets · 5 likes
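The reply above sketches a design space; to make one point in it concrete, here is a toy Python/NumPy sketch (not the paper's code; all names and parameters are illustrative assumptions) of a *fixed* pattern over *block* units, the kind of mask a prefilling-time method might apply: a sliding window of local blocks plus a few global "sink" blocks.

```python
import numpy as np

def block_sparse_mask(seq_len: int, block: int = 64,
                      local_blocks: int = 2, global_blocks: int = 1) -> np.ndarray:
    """Boolean [seq_len, seq_len] mask: True = this query-key pair is computed.

    Fixed pattern: each query block attends to a sliding window of
    `local_blocks` preceding blocks plus the first `global_blocks`
    blocks ("attention sinks"), under a causal constraint.
    """
    n_blocks = (seq_len + block - 1) // block
    block_mask = np.zeros((n_blocks, n_blocks), dtype=np.uint8)
    for q in range(n_blocks):
        block_mask[q, max(0, q - local_blocks + 1): q + 1] = 1  # local window
        block_mask[q, :global_blocks] = 1                       # global sinks
    # Expand the block-level mask to token level, then re-apply causality.
    token_mask = np.kron(block_mask, np.ones((block, block), dtype=np.uint8))
    token_mask = token_mask[:seq_len, :seq_len].astype(bool)
    return token_mask & np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = block_sparse_mask(512)
print(f"fraction of query-key pairs computed: {mask.mean():.2%}")
```

A content-aware variant would instead derive the mask from queries/keys at runtime, and a decoding-time method would select which cached blocks each new token reads.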
Check out the paper for more insights and details. This was a fun collaboration with @p_nawrot, @cheeesio, @PontiEdoardo, Robert Li, and Renjie Huang!
arxiv.org
Sparse attention offers a promising strategy to extend long-context capabilities in Transformer LLMs, yet its viability, its efficiency-accuracy trade-offs, and systematic scaling studies remain...
3 replies · 0 retweets · 17 likes
RT @p_nawrot: Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in L…
0 replies · 112 retweets · 0 likes
I'm super excited about these new models and what's still to come, in English and many more languages! 🌍
Introducing our first set of Llama 4 models! We've been hard at work doing a complete re-design of the Llama series. I'm so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
1 reply · 0 retweets · 38 likes
RT @robertarail: Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬. We introduce MLGym and MLGym-Bench,…
0 replies · 120 retweets · 0 likes
@AIatMeta I've had a great time at @cohere, where we made a lot of progress building multilingual LLMs. I wish my previous colleagues, in particular the Multilingual team, all the best. cc @cheeesio @weiyinko_ml @KocmiTom @SCahyawijaya Alex Bérard, Théo Dehaze, Nithya Govindarajan
2 replies · 0 retweets · 54 likes
A new year, a new challenge. I recently joined @AIatMeta to improve evaluation and benchmarking of LLMs. I'm excited to push on making LLMs more useful and accessible, via open-sourcing data/models and real-world applications. I'll continue to be based in Berlin.
37 replies · 22 retweets · 685 likes
RT @yanaiela: On that note, someone organizing a workshop at @aclmeeting (ACL 2025) wants to switch with our NAACL 2025 slot? (I guess it's…
0 replies · 1 retweet · 0 likes
Reward models are crucial for aligning models to human preferences, but so far their evaluation has been limited to English. I was fortunate to be involved with this @CohereForAI project, which introduces a new multilingual RM benchmark and many insightful analyses.
✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨ Introducing M-RewardBench: a massively multilingual RM evaluation benchmark covering 23 typologically different languages across 5 tasks. Paper, code, dataset: . Our contributions: 1/9
6 replies · 22 retweets · 106 likes
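Since the core protocol behind RM benchmarks of this kind is pairwise preference accuracy, here is a minimal Python sketch of that metric (`score` is a hypothetical stand-in for any reward model's scoring function, not the benchmark's actual API):

```python
from typing import Callable, List, Tuple

def rm_accuracy(triples: List[Tuple[str, str, str]],
                score: Callable[[str, str], float]) -> float:
    """Fraction of (prompt, chosen, rejected) triples where the reward
    model assigns the chosen response a higher score than the rejected one."""
    correct = sum(
        score(prompt, chosen) > score(prompt, rejected)
        for prompt, chosen, rejected in triples
    )
    return correct / len(triples)

# Computing this per language, e.g.
#   {lang: rm_accuracy(data[lang], score) for lang in data}
# is what surfaces gaps between English and the other languages covered.
```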
RT @sGx_tweets: ✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨ Introducing M-RewardBench: A massively multilingual R…
0 replies · 24 retweets · 0 likes