
Albert Gu
@_albertgu
Followers: 17K · Following: 2K · Media: 44 · Statuses: 438
assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.
Joined December 2018
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
60 replies · 184 reposts · 1K likes
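To make the dynamic-chunking idea above concrete, here is a minimal sketch, assuming a learned per-byte boundary score with pooling between predicted boundaries. The `BoundaryPredictor` name, the 0.5 threshold, and mean-pooling are illustrative choices, not the H-Net's actual mechanism.

```python
# Minimal, illustrative sketch of dynamic chunking (not the H-Net implementation):
# a small module scores each byte position, positions whose score crosses a
# threshold are treated as chunk boundaries, and bytes between boundaries are
# pooled into one higher-level vector.
import torch
import torch.nn as nn

class BoundaryPredictor(nn.Module):          # hypothetical name, for illustration
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) byte-level embeddings
        # returns boundary probabilities in [0, 1] per position
        return torch.sigmoid(self.score(x)).squeeze(-1)

def chunk_by_boundaries(x: torch.Tensor, p_boundary: torch.Tensor, thresh: float = 0.5):
    """Mean-pool contiguous byte embeddings between predicted boundaries.

    x:          (seq_len, d_model) embeddings for one sequence
    p_boundary: (seq_len,) boundary probabilities
    Returns a list of chunk vectors, each of shape (d_model,).
    """
    chunks, start = [], 0
    for t in range(x.shape[0]):
        if p_boundary[t] > thresh or t == x.shape[0] - 1:
            chunks.append(x[start : t + 1].mean(dim=0))
            start = t + 1
    return chunks

# Tiny usage example with random embeddings standing in for byte inputs.
d = 16
x = torch.randn(32, d)                       # 32 "bytes"
predictor = BoundaryPredictor(d)
p = predictor(x.unsqueeze(0)).squeeze(0)
print(len(chunk_by_boundaries(x, p)), "chunks")
```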
RT @aran_nayebi: 🚀 New Open-Source Release! PyTorchTNN 🚀. A PyTorch package for building biologically-plausible temporal neural networks (T….
0 replies · 38 reposts · 0 likes
I'll be giving the first H-Net talk this afternoon at 4:30-5 PT at the ES-FoMo workshop! come support the fight against Big Token 🙏
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
5 replies · 11 reposts · 137 likes
RT @chenwanch1: One of my favorite moments at #ICML2025 was being able to witness @_albertgu and the @cartesia_ai team’s reaction to Mamba….
0 replies · 8 reposts · 0 likes
RT @tenderizzation: BPE transformer watching an H-Net output an entire wikipedia article as one chunk
0 replies · 23 reposts · 0 likes
RT @pfau: Just saw the phrase "Big Token" to describe OAI/Anthropic/GDM/xAI/Meta and now I can't stop thinking about it.
0 replies · 47 reposts · 0 likes
RT @_goel_arnav: I just saw @_albertgu call the major AI labs as "Big Token" and it has to be the most hilarious shit ever lol.
0 replies · 31 reposts · 0 likes
I'm at ICML for the week!! come find the @cartesia_ai booth to chat about architectures, tokenizers, voice AI, etc. @sukjun_hwang and @fluorane will also be around to talk about H-Nets 🙌
3 replies · 3 reposts · 113 likes
We’ll be talking about fine-grained differences between Transformers and SSMs, and how to distill them better. Lots of surprising findings in this paper!
@_albertgu and I are presenting today at 11 a.m. in East Exhibition Hall A-B (E-2712). If you’re interested in the capability gap between Transformers and SSMs, and how to close it, come by and chat!
0 replies · 1 repost · 33 likes
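As a rough illustration of what "distilling" a Transformer into an SSM involves, here is a generic temperature-scaled KL distillation loss on next-token logits. This is a standard knowledge-distillation sketch, not the specific procedure from the paper; `teacher_logits` and `student_logits` are placeholders for the outputs of any two language models sharing a vocabulary.

```python
# Generic logit-distillation sketch (standard KL-based KD), shown only to make
# the idea of distilling a Transformer teacher into an SSM student concrete.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) on temperature-softened next-token distributions."""
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitude stays roughly constant across temperatures.
    return F.kl_div(s_logprobs, t_probs, reduction="batchmean") * temperature ** 2

# Usage with random logits standing in for teacher (Transformer) / student (SSM) outputs.
vocab, batch, seqlen = 1000, 2, 8
teacher_logits = torch.randn(batch, seqlen, vocab)
student_logits = torch.randn(batch, seqlen, vocab, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```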
Cool demo and really nice blog post on H-Net inference:
> On stage2_XL, this completely flipped. Instead of getting chunks every char, I was getting chunks after huge spans of repeats had been generated. This is a great demonstration of the power of…
main-horse.github.io
4 replies · 29 reposts · 317 likes
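A toy way to see the behavior described in that quote, long runs of repeats collapsing into a single chunk, is to treat a boundary as "this byte differs from the previous one". The function below is only a hand-written stand-in for a learned boundary decision, not how stage2_XL actually chunks.

```python
# Toy illustration (not H-Net's learned mechanism) of why data-dependent chunking
# can collapse long repeated spans into a single chunk: a "boundary" here is simply
# a position where the byte differs from its predecessor, so a run of repeats
# produces one chunk while varied text produces many small ones.
def naive_dynamic_chunks(data: bytes) -> list[bytes]:
    chunks, start = [], 0
    for i in range(1, len(data)):
        if data[i] != data[i - 1]:      # stand-in for a learned boundary decision
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

# Varied text -> tiny chunks; the 50-byte repeat -> one chunk.
print(naive_dynamic_chunks(b"abc" + b"z" * 50 + b"def"))
```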
impressive results on super long-form speech generation (> 10 minutes)! glad to see that the intuitions here closely track what I wrote about in my blog post about SSMs vs Transformers.
1. SSMs make more sense for long context where coherence matters more.
Excited to share Long-Form Speech Generation with Spoken LMs at #ICML2025 (Wed. oral)! We’ll present:
- LibriSpeech-Long: new benchmark and evals for long-form generation quality
- SpeechSSM: 1st *textless* spoken LMs for expressive *unbounded* speech
Listen and learn more: 🧵
2 replies · 14 reposts · 124 likes
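The first point is easiest to see from the shape of the computation: an SSM carries a fixed-size recurrent state across the whole sequence, while a Transformer's KV cache grows with every token. Below is a minimal diagonal linear recurrence as a sketch of that constant-memory property; it is a generic toy update, not Mamba's exact parameterization.

```python
# Minimal sketch of why an SSM's memory cost stays flat with context length:
# generation carries a fixed-size recurrent state forward, whereas a Transformer
# would accumulate one key/value pair per token in its cache.
import torch

d_state, d_model = 16, 8
A = torch.rand(d_state) * 0.99          # per-dimension decay of the state
B = torch.randn(d_state, d_model)       # input projection into the state
C = torch.randn(d_model, d_state)       # readout from the state

h = torch.zeros(d_state)                # fixed-size state, independent of length
for t in range(10_000):                 # arbitrarily long input stream
    x_t = torch.randn(d_model)
    h = A * h + B @ x_t                 # state update: O(d_state) memory
    y_t = C @ h                         # output at step t

# Memory held across steps: just `h` (d_state floats), versus a KV cache that
# would hold ~10,000 keys and values for a Transformer at this point.
```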
@sukjun_hwang @fluorane I also realized we forgot to link the code:
github.com/goombalab/hnet
H-Net: Hierarchical Network with Dynamic Chunking
1 reply · 5 reposts · 61 likes
This was an incredibly important project to me - I’ve wanted to solve it for years, but had no idea how. This was all @sukjun_hwang and @fluorane's amazing work! I wrote about the story of its development, and what might be coming next. The H-Net:
7 replies · 20 reposts · 263 likes