Tomasz Limisiewicz
@TomLimi
Followers
551
Following
1K
Media
47
Statuses
231
Postdoctoral researcher at @meta FAIR and @uwnlp. Interested in the inner workings of neural networks, multilingualism, and fairer NLP (he/him)
Seattle
Joined September 2021
Excited to continue my research adventure as a postdoc at @uwnlp and @Meta! I’ve joined @LukeZettlemoyer's fantastic lab. Together, we plan to rethink how LLMs perceive data to extend their capabilities to uncharted languages and, further, beyond text! [🦋posting]
5
2
120
Carmen Sandiego is heading to #NeurIPS2025 - finally, a good use for this costume! I'm on the industry job market and organizing the agents + reasoning & planning workshop. Excited to chat about research (LLM robustness, reasoning, theory of mind), and job opportunities. DM me!
4
15
132
Considering a PhD/MSc in NLP? I’m hiring students this cycle! If you are passionate about making language models reliable and safe, eager to understand and control language models, and would like to add some multilingual flavor to your research, apply to my group! 👇
16
102
737
🚀 Introducing the Latent Speech-Text Transformer (LST) — a speech-text model that organizes speech tokens into latent patches for better text→speech transfer, enabling steeper scaling laws and more efficient multimodal training ⚡️ Paper 📄 https://t.co/4nUsbC1YKF
7
16
34
New paper! 🌈 In English, pie = 🥧. In Spanish, pie = 🦶. Multilingual tokenizers often share such overlapping tokens between languages. Do these “False Friends” hurt or help multilingual LMs? We find that overlap consistently improves transfer—even when it seems misleading. 🧵
1
21
100
🎥 Videos of our invited talks and the panel discussion are now also available on YouTube: https://t.co/fFbH7kYkpZ ▶️
youtube.com
Tokenization Workshop (TokShop) https://tokenization-workshop.github.io 1st edition co-located with ICML 2025: https://icml.cc/virtual/2025/workshop/39998
🎥 Videos from our Tokenization Workshop are now live! Watch invited talks, panel discussions, and the best paper presentation at https://t.co/Sc3KWHOS5r
#ICML2025 #Tokenization #NLProc #LLMs
0
3
6
TokShop videos are finally out! 🎥🤩 Check out the great talks from @yuvalpi (Join them? Beat them? Fix them?), @delliott (Pixel LM), and @AdrianLancucki (dynamic segmentation), plus a panel with hot takes 🔥 from @alisawuffles @_albertgu @yuvalpi @magikarp_tokens @kroscoo
1
2
12
BPE tokenization has been a safe bet for language models for almost 10 years now. 😮 So cool to see the status quo being challenged by yet another lab in recent weeks! 🔥
Introducing two new tokenizer-free LLM checkpoints from our research lab: TFree-HAT 7B. Built on our Hierarchical Autoregressive Transformer (HAT) architecture, these models achieve top-tier German and English performance while processing text on a UTF-8 byte level.
0
0
10
🏆 Announcing our Best Paper Awards! 🥇 Winner: "BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization" https://t.co/m84BWBuY46 🥈 Runner-up: "One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression" https://t.co/mrha4PrK4c Congrats! 🎉
0
4
17
Most controversial statement so far, from @alisawuffles: "tokenization research is not as cool" **very vocal disagreement from the crowd of tokenization nerds**
3
4
58
Panel on Future of Tokenization is happening now in Meeting 111-112. With: @alisawuffles @_albertgu @yuvalpi @magikarp_tokens @kroscoo Moderated by: @esalesky
0
2
29
Check out the Byte Latent Transformer poster at @tokshop2025. It’s just a foretaste of the main presentation coming soon at @aclmeeting from @ArtidoroPagnoni!
1
7
78
Happening now in Meeting 112-113 at @icmlconf!
Three invited speakers will share their insights at TokShop! Hear from Yuval Pinter @yuvalpi, Desmond Elliott @delliott, and Adrian Łańcucki @AdrianLancucki on cutting-edge tokenization research. Don't miss these keynote presentations! #ICML2025
https://t.co/yAwjLwyvaV
1
0
2
Looking forward to our panel at 3:30. We’ll talk about the future of tokenization: BLT, SuperBPE (@alisawuffles), H-nets (@_albertgu), and further breakthroughs in tokenization with @yuvalpi @magikarp_tokens @kroscoo
https://t.co/0aruzPfekj
🎤 Meet our expert panelists! Join Albert Gu, Alisa Liu, Kris Cao, Sander Land, and Yuval Pinter as they discuss the Future of Tokenization on July 18 at 3:30 PM at TokShop at #ICML2025.
0
0
4
It’d be great to meet at the Tokenization Workshop @tokshop2025 at @icmlconf tomorrow, July 18, starting at 8:45 in Meeting 112-113!
The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of tokenization. https://t.co/UCdWdobEgh
#Tokenization #LLM #NLProc
1
1
9
🎤 Meet our expert panelists! Join Albert Gu, Alisa Liu, Kris Cao, Sander Land, and Yuval Pinter as they discuss the Future of Tokenization on July 18 at 3:30 PM at TokShop at #ICML2025.
0
10
38
Got a good tokenization paper under review at COLM, but the scores were a letdown? 😬 Why bother with rebuttal when the perfect venue is right around the corner! Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30! 🚀
0
6
7
📝 Submit papers (up to 9 pages, shorter submission) via OpenReview: https://t.co/eX4ACk7oxf
🗓️ Important dates:
Deadline: May 30, 2025
Notifications: June 9, 2025
Workshop: July 18, 2025
Both archival and non-archival options available! #ICML2025 #TokShop #ML #NLProc
openreview.net
Welcome to the OpenReview homepage for ICML 2025 Workshop TokShop
0
3
3