
Sebastian Ruder @ ACL
@seb_ruder
92K Followers · 10K Following · 380 Media · 4K Statuses
Research Scientist @AIatMeta • Ex @Cohere @GoogleDeepMind
Berlin, Germany
Joined September 2014
RT @robinson_n8: This was my star-struck 🤩 moment at @aclmeeting, getting to have lunch with @guzmanhe, @costajussamarta, and of course the…
0 replies · 2 retweets · 0 likes
RT @s_scardapane: *The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs* by @p_nawrot @PontiEdoardo @cheeesio @seb_ruder. T…
0 replies · 28 retweets · 0 likes
RT @cheeesio: The Multilingual Team at @cohere is hiring! If this sounds like you, please apply:
- strong coding skills and a keen eye for…
jobs.ashbyhq.com
Play a crucial role in developing and enhancing our language models to support a wide range of languages - Your primary focus will be on data engineering tasks, including data collection, cleaning,...
0 replies · 29 retweets · 0 likes
RT @hugobowne: I had lunch with @seb_ruder in Berlin a few days ago. Had delicious food and a wonderful, generative conversation about how…
0 replies · 1 retweet · 0 likes
RT @p_nawrot: We built sparse-frontier — a clean abstraction that lets you focus on your custom sparse attention implementation while autom…
0 replies · 51 retweets · 0 likes
@p_nawrot @cheeesio @PontiEdoardo There is a huge amount of variety in this research area, spanning:
- when sparse attention is used (prefilling vs decoding)
- which units are sparsified (blocks or vertical slashes)
- what type of patterns are used (fixed or content-aware)
- how the computational budget is allocated
2 replies · 0 retweets · 5 likes
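The reply above sketches a design space; to make one point in it concrete, here is a toy Python/NumPy sketch (not the paper's code; all names and parameters are illustrative assumptions) of a *fixed* pattern over *block* units, the kind of mask a prefilling-time method might apply: a sliding window of local blocks plus a few global "sink" blocks.

```python
import numpy as np

def block_sparse_mask(seq_len: int, block: int = 64,
                      local_blocks: int = 2, global_blocks: int = 1) -> np.ndarray:
    """Boolean [seq_len, seq_len] mask: True = this query-key pair is computed.

    Fixed pattern: each query block attends to a sliding window of
    `local_blocks` preceding blocks plus the first `global_blocks`
    blocks ("attention sinks"), under a causal constraint.
    """
    n_blocks = (seq_len + block - 1) // block
    block_mask = np.zeros((n_blocks, n_blocks), dtype=np.uint8)
    for q in range(n_blocks):
        block_mask[q, max(0, q - local_blocks + 1): q + 1] = 1  # local window
        block_mask[q, :global_blocks] = 1                       # global sinks
    # Expand the block-level mask to token level, then re-apply causality.
    token_mask = np.kron(block_mask, np.ones((block, block), dtype=np.uint8))
    token_mask = token_mask[:seq_len, :seq_len].astype(bool)
    return token_mask & np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = block_sparse_mask(512)
print(f"fraction of query-key pairs computed: {mask.mean():.2%}")
```

A content-aware variant would instead derive the mask from queries/keys at runtime, and a decoding-time method would select which cached blocks each new token reads.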
Check out the paper for more insights and details. This was a fun collaboration with @p_nawrot, @cheeesio, @PontiEdoardo, Robert Li, and Renjie Huang!
arxiv.org
Sparse attention offers a promising strategy to extend long-context capabilities in Transformer LLMs, yet its viability, its efficiency-accuracy trade-offs, and systematic scaling studies remain...
3 replies · 0 retweets · 17 likes
RT @p_nawrot: Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in L…
0 replies · 112 retweets · 0 likes
I'm super excited about these new models and what's still to come, in English and many more languages! 🌍
Introducing our first set of Llama 4 models! We've been hard at work doing a complete re-design of the Llama series. I'm so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
1 reply · 0 retweets · 38 likes
RT @robertarail: Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬. We introduce MLGym and MLGym-Bench,…
0 replies · 120 retweets · 0 likes
@AIatMeta I've had a great time at @cohere, where we made a lot of progress building multilingual LLMs. I wish my previous colleagues, in particular the Multilingual team, all the best. cc @cheeesio @weiyinko_ml @KocmiTom @SCahyawijaya Alex Bérard, Théo Dehaze, Nithya Govindarajan
2 replies · 0 retweets · 54 likes
A new year, a new challenge. I recently joined @AIatMeta to improve evaluation and benchmarking of LLMs. I'm excited to push on making LLMs more useful and accessible, via open-sourcing data/models and real-world applications. I'll continue to be based in Berlin.
37 replies · 22 retweets · 685 likes
RT @yanaiela: On that note, someone organizing a workshop at @aclmeeting (ACL 2025) wants to switch with our NAACL 2025 slot? (I guess it's…
0 replies · 1 retweet · 0 likes
Reward models are crucial for aligning models to human preferences, but so far their evaluation has been limited to English. I was fortunate to be involved with this @CohereForAI project, which introduces a new multilingual RM benchmark and many insightful analyses.
✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨ Introducing M-RewardBench: a massively multilingual RM evaluation benchmark covering 23 typologically different languages across 5 tasks. Paper, code, dataset: . Our contributions: 1/9
6 replies · 22 retweets · 106 likes
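Since the core protocol behind RM benchmarks of this kind is pairwise preference accuracy, here is a minimal Python sketch of that metric (`score` is a hypothetical stand-in for any reward model's scoring function, not the benchmark's actual API):

```python
from typing import Callable, List, Tuple

def rm_accuracy(triples: List[Tuple[str, str, str]],
                score: Callable[[str, str], float]) -> float:
    """Fraction of (prompt, chosen, rejected) triples where the reward
    model assigns the chosen response a higher score than the rejected one."""
    correct = sum(
        score(prompt, chosen) > score(prompt, rejected)
        for prompt, chosen, rejected in triples
    )
    return correct / len(triples)

# Computing this per language, e.g.
#   {lang: rm_accuracy(data[lang], score) for lang in data}
# is what surfaces gaps between English and the other languages covered.
```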
RT @sGx_tweets: ✨ New Evaluation Benchmark for Reward Models - We Go Multilingual! ✨ Introducing M-RewardBench: A massively multilingual R…
0 replies · 24 retweets · 0 likes