Alisa Liu @ COLM 🦙

@alisawuffles

Followers: 3K · Following: 3K · Media: 26 · Statuses: 383

final-year PhD student at @uwcse @uwnlp | on the job market!

Joined November 2019
@alisawuffles
Alisa Liu @ COLM 🦙
7 months
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
94
328
3K
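A minimal sketch of how the efficiency claim could be checked: encode the same text with an ordinary BPE tokenizer and with a SuperBPE tokenizer and compare sequence lengths. The SuperBPE repo id below is a placeholder, not a confirmed release name; only the GPT-2 baseline is a real Hugging Face id.

from transformers import AutoTokenizer

# Ordinary subword BPE baseline vs. a SuperBPE-style tokenizer.
bpe = AutoTokenizer.from_pretrained("gpt2")
superbpe = AutoTokenizer.from_pretrained("path/to/superbpe-tokenizer")  # placeholder id

text = "By the way, the quick brown fox jumps over the lazy dog."
print(len(bpe.encode(text)), len(superbpe.encode(text)))
# Fewer tokens per string is where the reported ~27% inference saving comes from:
# superword tokens such as " by the way" cover spans that BPE must split at spaces.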
@iamgrigorev
George Grigorev
9 days
Update! I can confirm that a 10x larger model trained with SuperBPE also achieves the same train loss, while val loss is now even slightly lower. So I don't see any reason not to use SuperBPE by default now (apart from some small nuances during MQA evaluation)
@iamgrigorev
George Grigorev
10 days
Today I'm publishing my first blog post: Tokenization from first principles. I built a byte-level BPE tokenizer with Rust pre-tokenization and achieved encoding speed on par with huggingface tokenizers. I show the ideas and algorithms, including nuances of the implementation, such as…
2
3
24
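A toy version of the core algorithm such a post builds up, with no pre-tokenization or Rust, just the merge loop: repeatedly count adjacent symbol pairs over raw bytes and merge the most frequent one. This is a generic byte-level BPE sketch, not the blog post's actual implementation.

from collections import Counter

def train_bpe(text: str, num_merges: int):
    seq = [bytes([b]) for b in text.encode("utf-8")]   # start from raw bytes
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))             # adjacent-pair frequencies
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]       # most frequent pair
        merges.append((a, b))
        merged, i = [], 0
        while i < len(seq):                            # apply the merge left to right
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return merges, seq

merges, seq = train_bpe("low lower lowest low low", num_merges=10)
print(merges[:3])  # [(b'l', b'o'), (b'lo', b'w'), (b' ', b'low')] on this toy corpus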
@XiaochuangHan
Xiaochuang Han
8 days
Our team at Meta FAIR is hiring a PhD research intern for 2026. The topics broadly involve multimodal generative AI (e.g., video/image generation in addition to text), with flexible approaches across architecture/data/algorithms. Please apply via the link below, and feel free to…
3
43
255
@soldni
Luca Soldaini 🎀
8 days
olmo 2 poster at 11am. 100% merch sale, everything must go (don’t make me travel back to seattle with swag)
3
4
61
@soldni
Luca Soldaini 🎀
9 days
yo has anyone heard of this Olmo model, loss looks good
7
12
193
@iamgrigorev
George Grigorev
10 days
A run with the SuperBPE tokenizer achieves the same val loss as BPE.
Code: https://t.co/VAOoudSkvW
Blog post: https://t.co/wxmAjq8Nxa
I had a lot of fun covering all the details and running experiments. Stay tuned for more!
1
3
24
@alisawuffles
Alisa Liu @ COLM 🦙
9 days
Super happy to be at COLM!!🦙 It's been so fun to see familiar faces & make new friends. @JonathanHayase and I will be presenting SuperBPE TODAY (Wednesday) in the 🕓11AM poster session, come say hi! 📎 https://t.co/qU2gZUvZal
1
4
94
@kylelostat
Kyle Lo
11 days
flyin to #COLM2025 along w a bunch of the @allen_ai team. come chat w me about pretraining horror stories, data & evals, what we're cookin for the next olmo, etc. made a 🔥 poster for the thursday sess, come say hi
0
7
66
@alisawuffles
Alisa Liu @ COLM 🦙
15 days
@s_zhengbr went all the way, quantifying the effect of using diff tokenizations on benchmarks, identifying tasks (such as char counting) where modifying the input tokenization *improves* performance, and shedding light on the source. Read more in 📄!
arxiv.org
Modern tokenizers employ deterministic algorithms to map text into a single "canonical" token sequence, yet the same string can be encoded as many non-canonical tokenizations using the tokenizer...
0
1
2
@alisawuffles
Alisa Liu @ COLM 🦙
15 days
It began from a 🤯🤯 observation: when giving an LM text tokenized at the *character* level, its generation seemed virtually unaffected — even tho these token seqs are provably never seen in training! Suggests: functional char-level understanding, and tokenization as test-time control.
@s_zhengbr
Brian Zheng
18 days
Can a LM that has only ever seen the word “cat” tokenized as ␣cat, understand the token sequence [␣, c, a, t]? In our NeurIPS spotlight ⭐, we show that the answer is surprisingly YES, and in fact, you can even modify the tokenization at inference-time for performance gains!🧵
6
8
74
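A small sketch of the non-canonical tokenization idea from this thread, using GPT-2's vocabulary as a stand-in for whatever model is being probed: the canonical encoding of " cat" and a character-level encoding are different token sequences that decode to the same surface string.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

canonical = tok(" cat")["input_ids"]                          # single token for " cat"
char_level = tok.convert_tokens_to_ids(["Ġ", "c", "a", "t"])  # Ġ marks the leading space in GPT-2's vocab

print(canonical, char_level)
print(tok.decode(canonical) == tok.decode(char_level))        # should print True: same string
# The question in the papers above: does a model trained only on canonical
# sequences still handle the character-level one, and when does feeding it in help?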
@StellaLisy
Stella Li @COLM2025
15 days
🚨What if solving a problem correctly isn't enough, cuz the WAY you reason about it, based on your audience, matters just as much⁉️ We introduce ✨personalized reasoning✨: proactively asking about user preferences and adapting HOW models think. Frontier models are not doing well at this!🧵
2
44
206
@vikhyatk
vik
17 days
@alisawuffles @moondream_ai thank you for the great work on superbpe! we upsampled our finetuning corpus, so the savings are often higher on the downstream tasks customers care about. it also reduces the training cost to reach the same model performance, which is important for a smaller company like us :)
0
2
4
@alisawuffles
Alisa Liu @ COLM 🦙
18 days
Super excited to see @moondream_ai's newest model use SuperBPE!! We did a little bit of analysis — using SuperBPE reduced their seqlen by 21% on average and made the token frequency distribution more uniform, meaning fewer hyper-frequent & hyper-rare tokens!
@vikhyatk
vik
28 days
Excited to release a preview of Moondream 3. A 9B param, 2B active MoE vision language model that makes no compromises, offering state-of-the-art visual reasoning while still retaining an efficient and deployment-friendly form factor.
8
18
188
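A rough sketch of how the two measurements mentioned above could be reproduced on your own corpus: average sequence length per document, and the entropy of the token frequency distribution (higher entropy means more uniform token usage). The tokenizer ids are stand-ins, not Moondream's actual setup.

import math
from collections import Counter
from transformers import AutoTokenizer

def avg_len_and_entropy(tokenizer, texts):
    counts, total_len = Counter(), 0
    for t in texts:
        ids = tokenizer.encode(t)
        total_len += len(ids)
        counts.update(ids)                     # token frequency tally
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return total_len / len(texts), entropy

texts = ["replace with a sample of your own corpus", "..."]
old = AutoTokenizer.from_pretrained("gpt2")               # stand-in for the previous tokenizer
new = AutoTokenizer.from_pretrained("path/to/superbpe-tokenizer")  # placeholder id
len_old, H_old = avg_len_and_entropy(old, texts)
len_new, H_new = avg_len_and_entropy(new, texts)
print(f"seqlen reduction: {1 - len_new / len_old:.1%}")   # ~21% in the analysis above
print(f"entropy: old={H_old:.2f} bits, new={H_new:.2f} bits")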
@s_zhengbr
Brian Zheng
18 days
Can a LM that has only ever seen the word “cat” tokenized as ␣cat, understand the token sequence [␣, c, a, t]? In our NeurIPS spotlight ⭐, we show that the answer is surprisingly YES, and in fact, you can even modify the tokenization at inference-time for performance gains!🧵
5
15
82
@alisawuffles
Alisa Liu @ COLM 🦙
21 days
Catherine really eloquently demystifies the tensions between tokenizer-based and "tokenizer-free" language modeling, and how public disdain for tokenization is stunting progress we could make together. Highly recommend this read!!
0
1
9
@alisawuffles
Alisa Liu @ COLM 🦙
21 days
Every LM needs a way of encoding data, and any choice of encoding is a design choice. When using bytes, you borrow choices from the makers of UTF8, and there’s generally no reason to believe that the most common encoding on the internet is also the best one for language modeling.
@linguist_cat
Catherine Arnett @ 🍁 COLM🍁
22 days
I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it’s not tokenizer-free at all. I also talk about why people hate tokenizers so much!
2
8
91
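A concrete illustration of the inherited-choices point: under UTF-8, the per-character cost in bytes depends on the script, a design decision that a byte-level "tokenizer-free" model simply absorbs.

for word in ["cat", "кот", "猫", "🐈"]:
    raw = word.encode("utf-8")
    print(f"{word!r}: {len(word)} chars -> {len(raw)} bytes {list(raw)}")
# ASCII letters cost 1 byte each, Cyrillic letters 2, this CJK character 3, and
# the emoji 4 -- a per-script "tokenization" rate fixed by the encoding, not the model.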
@giffmana
Lucas Beyer (bl16)
21 days
Great blog post walking through tokenization vs "tokenizer-free" approaches, arguing that there isn't really such a thing as "tokenizer-free": even using UTF-8 bytes inherits choices made by other people (the Unicode consortium), and it's not clear these are sensible for LLMs.
@linguist_cat
Catherine Arnett @ 🍁 COLM🍁
22 days
I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it’s not tokenizer-free at all. I also talk about why people hate tokenizers so much!
18
61
675
@lateinteraction
Omar Khattab
21 days
Yes. I think overly heuristic or restrictive tokenization is a problem, but "tokenization" as such is not your enemy. It's pretty much the central design element of the Transformer in the first place.
@linguist_cat
Catherine Arnett @ 🍁 COLM🍁
22 days
I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it’s not tokenizer-free at all. I also talk about why people hate tokenizers so much!
1
1
12
@sewoong79
Sewoong Oh
25 days
SuperBPE (https://t.co/rqLXu0bVG6) adopted by Moondream 3, the latest open-source VLM from @moondreamai! Exciting to see the amazing collaboration by @alisawuffles and @jonathanhayase continue to make an impact.
@vikhyatk
vik
28 days
These are some highlights, but there's lots more to talk about. We extended the context length from 2K to 32K tokens. We're using SuperBPE tokens, so our tokens are better than your tokens. We've done some things to make the weights more adaptable when you finetune. Etc. etc.
10
7
90
@vikhyatk
vik
28 days
These are some highlights, but there's lots more to talk about. We extended the context length from 2K to 32K tokens. We're using SuperBPE tokens, so our tokens are better than your tokens. We've done some things to make the weights more adaptable when you finetune. Etc. etc.
3
3
125