Sebastian Ruder @ ACL

@seb_ruder

Followers 92K · Following 10K · Media 380 · Statuses 4K

Research Scientist @AIatMeta • Ex @Cohere @GoogleDeepMind

Berlin, Germany
Joined September 2014
Sebastian Ruder @ ACL (@seb_ruder) · 2 days
RT @robinson_n8: This was my star-struck 🤩 moment at @aclmeeting, getting to have lunch with @guzmanhe, @costajussamarta, and of course the…
Replies 0 · Retweets 2 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 6 days
RT @davlanade: @seb_ruder giving the first keynote talk on Llama 4
[image]
Replies 0 · Retweets 4 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 10 days
RT @s_scardapane: *The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs* by @p_nawrot @PontiEdoardo @cheeesio @seb_ruder. T…
Replies 0 · Retweets 28 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 13 days
I'll be at ACL 2025 in Vienna next week. Say hi if you want to chat about multilinguality, LLM evaluation, or doing research in industry. You can also check out our ACL papers:
- M-RewardBench (Main):
- Global MMLU (Main):
- Arabic.
Replies 5 · Retweets 8 · Likes 119
Sebastian Ruder @ ACL (@seb_ruder) · 1 month
RT @cheeesio: The Multilingual Team at @cohere is hiring! If this sounds like you, please apply:
- strong coding skills and a keen eye for…
jobs.ashbyhq.com
Play a crucial role in developing and enhancing our language models to support a wide range of languages - Your primary focus will be on data engineering tasks, including data collection, cleaning,...
Replies 0 · Retweets 29 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 1 month
RT @hugobowne: I had lunch with @seb_ruder in Berlin a few days ago. Had delicious food and a wonderful, generative conversation about how…
Replies 0 · Retweets 1 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 2 months
RT @p_nawrot: We built sparse-frontier — a clean abstraction that lets you focus on your custom sparse attention implementation while autom…
Replies 0 · Retweets 51 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 3 months
@p_nawrot @cheeesio @PontiEdoardo There is a huge amount of variety in this research area, spanning:
- when sparse attention is used (prefilling vs. decoding)
- which units are sparsified (blocks or vertical slashes)
- what type of patterns are used (fixed or content-aware)
- how the computational budget is
[image 1] [image 2]
Replies 2 · Retweets 0 · Likes 5
Sebastian Ruder @ ACL (@seb_ruder) · 3 months
Our findings:
1) For short sequences, increasing density or size provides gains. For long sequences, high sparsity performs best.
2) Higher sparsity is possible for decoding and larger models. However, most configs deteriorate performance significantly for at least one task.
3) There is no
[image]
Replies 1 · Retweets 0 · Likes 5
Sebastian Ruder @ ACL (@seb_ruder) · 3 months
The Sparse Frontier. Efficient sparse attention methods are key to scaling LLMs to long contexts. We conduct the largest-scale empirical analysis that answers:
1. 🤏🔍 Are small dense models or large sparse models better?
2. ♾️ What is the maximum permissible sparsity per task?
3.
[image]
Replies 11 · Retweets 30 · Likes 187
Sebastian Ruder @ ACL (@seb_ruder) · 3 months
RT @_akhaliq: The Sparse Frontier. Sparse Attention Trade-offs in Transformer LLMs
[image]
Replies 0 · Retweets 33 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 3 months
RT @p_nawrot: Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in L…
Replies 0 · Retweets 112 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 4 months
I'm super excited about these new models and what's still to come, in English and many more languages! 🌍
Quoted: Ahmad Al-Dahle (@Ahmad_Al_Dahle) · 4 months
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
[image]
Replies 1 · Retweets 0 · Likes 38
Sebastian Ruder @ ACL (@seb_ruder) · 6 months
RT @robertarail: Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬. We introduce MLGym and MLGym-Bench…
Replies 0 · Retweets 120 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 7 months
@AIatMeta I've had a great time at @cohere, where we made a lot of progress building multilingual LLMs. I wish my previous colleagues, in particular the Multilingual team, all the best. cc @cheeesio @weiyinko_ml @KocmiTom @SCahyawijaya Alex Bérard, Théo Dehaze, Nithya Govindarajan.
Replies 2 · Retweets 0 · Likes 54
Sebastian Ruder @ ACL (@seb_ruder) · 7 months
A new year, a new challenge. I recently joined @AIatMeta to improve evaluation and benchmarking of LLMs. I'm excited to push on making LLMs more useful and accessible, via open-sourcing data/models and real-world applications. I'll continue to be based in Berlin.
Replies 37 · Retweets 22 · Likes 685
Sebastian Ruder @ ACL (@seb_ruder) · 9 months
RT @yanaiela: On that note, someone organizing a workshop at @aclmeeting (ACL 2025) wants to switch with our NAACL 2025 slot? (I guess it's…
Replies 0 · Retweets 1 · Likes 0
Sebastian Ruder @ ACL (@seb_ruder) · 9 months
Reward models are crucial for aligning models to human preferences but so far their evaluation has been limited to English. I was fortunate to be involved with this @CohereForAI project, which introduces a new multilingual RM benchmark and many insightful analyses.
Quoted: Srishti Gureja (@srishti_gureja) · 9 months
✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨ Introducing M-RewardBench: a massively multilingual RM evaluation benchmark covering 23 typologically different languages across 5 tasks. Paper, code, dataset:
Our contributions: 1/9
[image]
Replies 6 · Retweets 22 · Likes 106
Sebastian Ruder @ ACL (@seb_ruder) · 9 months
RT @sGx_tweets: ✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨ Introducing M-RewardBench: A massively multilingual R…
Replies 0 · Retweets 24 · Likes 0