Albert Gu Profile
Albert Gu

@_albertgu

Followers: 17K
Following: 2K
Media: 44
Statuses: 438

assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.

Joined December 2018
@_albertgu
Albert Gu
17 days
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
@sukjun_hwang
Sukjun (June) Hwang
17 days
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
60
184
1K
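
For intuition about what "dynamic chunking" might look like mechanically, here is a minimal sketch of a learned boundary scorer that compares adjacent byte-level states; the BoundaryPredictor name, the cosine-similarity score, and the 0.5 threshold are illustrative assumptions, not the released H-Net code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryPredictor(nn.Module):
    """Scores how likely each position starts a new chunk by comparing
    adjacent hidden states. Illustrative sketch, not the H-Net code."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) byte-level hidden states
        q = self.q(x[:, 1:])    # positions 1..T-1
        k = self.k(x[:, :-1])   # positions 0..T-2
        cos = F.cosine_similarity(q, k, dim=-1)
        p = 0.5 * (1.0 - cos)   # dissimilar neighbors -> likely boundary
        first = torch.ones(x.size(0), 1, device=x.device)  # position 0 always opens a chunk
        return torch.cat([first, p], dim=1)  # (batch, seq_len), values in [0, 1]

# usage: threshold the scores to pick chunk starts
x = torch.randn(2, 16, 64)
starts = BoundaryPredictor(64)(x) > 0.5

Because the scorer is differentiable, chunk boundaries can be learned end-to-end from the training loss rather than fixed in advance the way a BPE vocabulary is.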
@_albertgu
Albert Gu
4 days
RT @aran_nayebi: 🚀 New Open-Source Release! PyTorchTNN 🚀. A PyTorch package for building biologically-plausible temporal neural networks (T….
0
38
0
@_albertgu
Albert Gu
4 days
will be in taipei for a few weeks if anyone happens to be around and wants to meet up!
1
0
34
@_albertgu
Albert Gu
9 days
I'll be giving the first H-Net talk this afternoon at 4:30-5 PT at the ES-FoMo workshop! come support the fight against Big Token 🙏
@ESFoMo
ES-FoMo@ICML2025
10 days
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
5
11
137
@_albertgu
Albert Gu
10 days
And yell at @fluorane if you want it faster.
0
0
10
@_albertgu
Albert Gu
10 days
Haven’t vetted this closely yet, but it’s awesome to see the community helping out here! We will also try to release some simple training code in a few weeks, stay tuned.
@main_horse
main
10 days
wdym bro. just write the code. how hard could it be
6
4
122
@_albertgu
Albert Gu
10 days
RT @chenwanch1: One of my favorite moments at #ICML2025 was being able to witness @_albertgu and the @cartesia_ai team’s reaction to Mamba….
0
8
0
@_albertgu
Albert Gu
11 days
RT @tenderizzation: BPE transformer watching an H-Net output an entire wikipedia article as one chunk
0
23
0
@_albertgu
Albert Gu
12 days
RT @pfau: Just saw the phrase "Big Token" to describe OAI/Anthropic/GDM/xAI/Meta and now I can't stop thinking about it.
0
47
0
@_albertgu
Albert Gu
13 days
RT @_goel_arnav: I just saw @_albertgu call the major AI labs "Big Token" and it has to be the most hilarious shit ever lol.
0
31
0
@_albertgu
Albert Gu
13 days
people are asking why i’m a polar bear, can fat albert get some appreciation 🙏
@_albertgu
Albert Gu
7 months
2025 is the year of fat albert.
0
0
52
@_albertgu
Albert Gu
13 days
I'm at ICML for the week!! come find the @cartesia_ai booth to chat about architectures, tokenizers, voice AI, etc. @sukjun_hwang and @fluorane will also be around to talk about H-Nets 🙌
3
3
113
@_albertgu
Albert Gu
13 days
We’ll be talking about fine-grained differences between Transformers and SSMs, and how to distill them better. Lots of surprising findings in this paper!
@avivbick
Aviv Bick
13 days
@_albertgu and I are presenting today at 11 a.m. in East Exhibition Hall A-B (E-2712). If you’re interested in the capability gap between Transformers and SSMs—and how to close it—come by and chat!
0
1
33
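
The thread doesn't spell out the paper's distillation procedure; as background, a generic logit-distillation loss of the kind commonly used to transfer a Transformer teacher into an SSM student looks like the sketch below (the function name and temperature are illustrative, not the paper's recipe).

import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    # Standard KL-based logit matching: the student is trained to
    # reproduce the teacher's softened next-token distribution.
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # scale by t^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t

# usage: teacher = pretrained Transformer, student = SSM being distilled
teacher_logits = torch.randn(4, 128, 32_000)                      # (batch, seq, vocab)
student_logits = torch.randn(4, 128, 32_000, requires_grad=True)
distill_loss(student_logits, teacher_logits).backward()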
@_albertgu
Albert Gu
14 days
Btw this is essentially the mechanism I was hoping for when writing about this hypothetical “filler token” task:
0
0
17
@_albertgu
Albert Gu
14 days
And love to see the community dig in so fast! Thanks @main_horse!!
1
0
14
@_albertgu
Albert Gu
14 days
Cool demo and really nice blog post on H-Net inference:

> On stage2_XL, this completely flipped. Instead of getting chunks every char, I was getting chunks after huge spans of repeats had been generated. This is a great demonstration of the power of…

main-horse.github.io
@main_horse
main
14 days
H-Nets are the future.
4
29
317
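
The repeat behavior quoted above has a simple intuition: identical neighboring inputs produce (near-)identical hidden states, so a similarity-based boundary scorer stays silent across the whole run. A toy check with raw cosine similarity (no learned projections, purely illustrative):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 64).repeat(1, 32, 1)   # 32 identical "bytes"
cos = F.cosine_similarity(x[:, 1:], x[:, :-1], dim=-1)
print((0.5 * (1 - cos)).max())               # ~0: no boundary fires, one big chunk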
@_albertgu
Albert Gu
14 days
impressive results on super long-form speech generation (> 10 minutes)! glad to see that the intuitions here closely track what I wrote about in my blog post about SSMs vs Transformers:

1. SSMs make more sense for long context, where coherence matters more.
@JulianSlzr
Julian Salazar
14 days
Excited to share Long-Form Speech Generation with Spoken LMs at #ICML2025 (Wed. oral)! We’ll present:
- LibriSpeech-Long: new benchmark and evals for long-form generation quality
- SpeechSSM: 1st *textless* spoken LMs for expressive *unbounded* speech
Listen and learn more: 🧵
2
14
124
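
The first point above comes down to memory: an SSM carries a fixed-size recurrent state no matter how long the stream gets, while a Transformer's KV cache grows with every token. A toy contrast (dimensions and dynamics are made up for illustration, not Mamba's actual parameterization):

import torch

d_state, d_model = 16, 64
A = torch.rand(d_state) * 0.9            # per-dimension decay
B = torch.randn(d_state, d_model) * 0.1
C = torch.randn(d_model, d_state) * 0.1

h = torch.zeros(d_state)                 # SSM state: O(1), never grows
kv_cache = []                            # attention cache, for contrast

for t in range(10_000):                  # a 10k-token stream
    x_t = torch.randn(d_model)
    h = A * h + B @ x_t                  # constant-size recurrent update
    y_t = C @ h
    kv_cache.append(x_t)                 # grows linearly with context

print(h.shape, len(kv_cache))            # torch.Size([16]) vs 10000 entries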
@_albertgu
Albert Gu
14 days
Big Token is quaking in their boots. don't worry, we’re here to free you all.
@gallabytes
theseriousadult
15 days
wtf anthropic?
2
5
111
@_albertgu
Albert Gu
17 days
This was an incredibly important project to me - I’ve wanted to solve it for years, but had no idea how. This was all @sukjun_hwang and @fluorane's amazing work! I wrote about the story of its development, and what might be coming next. The H-Net:
7
20
263