Albert Gu
@_albertgu
Followers: 18K · Following: 2K · Media: 45 · Statuses: 472
assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.
Joined December 2018
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
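To make the "dynamic chunking" idea concrete, here is a minimal sketch of the core operation: a learned scorer marks chunk boundaries, and each chunk is pooled into one higher-level vector. The module name, the linear boundary scorer, and the mean-pooling are illustrative assumptions, not the actual H-Net implementation.

    import torch
    import torch.nn as nn

    class ToyDynamicChunker(nn.Module):
        """Illustrative-only chunking layer: predict a boundary probability at
        every byte position, then mean-pool each chunk into one vector that a
        higher-level sequence model can operate on."""

        def __init__(self, d_model: int, threshold: float = 0.5):
            super().__init__()
            # assumed boundary scorer; the real H-Net routing module differs
            self.boundary_scorer = nn.Linear(d_model, 1)
            self.threshold = threshold

        def forward(self, x: torch.Tensor):
            # x: (seq_len, d_model) byte-level embeddings for one sequence
            p = torch.sigmoid(self.boundary_scorer(x)).squeeze(-1)
            is_boundary = p > self.threshold
            is_boundary[0] = True                          # a chunk always starts at position 0
            chunk_id = torch.cumsum(is_boundary.long(), dim=0) - 1
            num_chunks = int(chunk_id.max()) + 1
            # mean-pool every position into its chunk's slot
            pooled = torch.zeros(num_chunks, x.size(-1), device=x.device)
            pooled.index_add_(0, chunk_id, x)
            counts = torch.bincount(chunk_id, minlength=num_chunks).clamp(min=1)
            return pooled / counts.unsqueeze(-1), chunk_id

    x = torch.randn(16, 64)                   # 16 "bytes" embedded in 64 dims
    chunks, assignment = ToyDynamicChunker(64)(x)
    print(chunks.shape, assignment.tolist())  # fewer, higher-level positions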
Many people are confused by Minimax’s recent return to full attention - especially since Minimax was the first to pivot toward hybrid linear attention at large scale - and by Kimi’s later adoption of hybrid linear variants (as well as earlier attempts like Qwen3-Next or Qwen3.5). I actually
so excited to see more SOTA linear models, including some novel technical refinements for improving recurrent models 🚀
Kimi Linear Tech Report has dropped! 🚀 https://t.co/LwNB2sQnzM Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance—ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi
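For readers unfamiliar with these linear-attention layers: the common thread is a fixed-size matrix state updated by a (gated) delta rule instead of a growing KV cache. Below is a rough per-token sketch of that recurrence under simplifying assumptions (a scalar forgetting gate, no chunking); it is the naive math only, not the open-sourced KDA kernels themselves.

    import torch

    def gated_delta_rule(q, k, v, beta, gate):
        """Naive reference recurrence for a gated delta-rule linear layer.
        q, k, v: (T, d); beta, gate: (T,) with values in (0, 1).
        The state S is a fixed d x d matrix, independent of sequence length."""
        T, d = q.shape
        S = torch.zeros(d, d)
        outputs = []
        for t in range(T):
            S = gate[t] * S                                        # forget a little of the old state
            S = S + beta[t] * torch.outer(v[t] - S @ k[t], k[t])   # delta-rule write toward v_t
            outputs.append(S @ q[t])                               # read with the query
        return torch.stack(outputs)

    T, d = 8, 4
    y = gated_delta_rule(torch.randn(T, d), torch.randn(T, d), torch.randn(T, d),
                         torch.rand(T), torch.rand(T))
    print(y.shape)  # torch.Size([8, 4])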
immensely proud of the team for our best model yet. grateful to be able to work with such a strong team of researchers who are always curious and willing to explore the untrodden path https://t.co/Grkgy4vaqf
at the tokenizer workshop panel at ICML, i made an offhand joke about eventually going to raw pixels being the way, i didn't press it too hard bc the pitchforks were already out over h-net and i wanted to make it home, but yes still in favor 🙋
@thawani_avijit Haha. I am afraid people interpreted my “delete tokenizer” as “use bytes directly without BPE”; the issue is that you *still* inherit the arbitrariness of the byte encoding even for that! Pixels is the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
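A quick way to see the "arbitrariness of the byte encoding" point (an illustration, not from the thread): the same rendered text can map to different UTF-8 byte sequences, and how many bytes a character takes is a decision of the encoding, not a property of the language.

    import unicodedata

    s = "café"
    nfc = unicodedata.normalize("NFC", s)   # 'é' as one codepoint
    nfd = unicodedata.normalize("NFD", s)   # 'e' plus a combining accent

    print(nfc.encode("utf-8"))  # b'caf\xc3\xa9'   -> 5 bytes
    print(nfd.encode("utf-8"))  # b'cafe\xcc\x81'  -> 6 bytes, same rendered text

    # bytes per character is an encoding artifact, not a property of the text
    for ch in ["a", "é", "語", "🙋"]:
        print(ch, len(ch.encode("utf-8")), "bytes")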
I really like this research direction! For a long time, I've been talking about the "brain vs. database" analogy of SSMs vs Transformers. An extension of this that I've mentioned offhand a few times is that I think the tradeoffs change when we start thinking about building
SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: https://t.co/bCzxawF452 🧵
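To put rough numbers on the "brain vs. database" analogy (illustrative sizes, not from the paper): a Transformer's KV cache stores every past token like a database and grows linearly with context, while an SSM compresses history into a fixed-size state.

    # Back-of-the-envelope memory per sequence, fp16 (2 bytes per element).
    # Layer counts and dimensions below are assumed, generic values.
    def transformer_kv_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, elem=2):
        # keys and values cached for every past token at every layer
        return seq_len * n_layers * n_kv_heads * head_dim * 2 * elem

    def ssm_state_bytes(n_layers=32, d_inner=4096, state_dim=128, elem=2):
        # fixed-size recurrent state, independent of how many tokens were seen
        return n_layers * d_inner * state_dim * elem

    for L in [1_000, 100_000, 1_000_000]:
        kv = transformer_kv_bytes(L) / 1e9
        ssm = ssm_state_bytes() / 1e9
        print(f"{L:>9} tokens: KV cache ~ {kv:7.2f} GB | SSM state ~ {ssm:5.3f} GB")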
the blog post has a lot of good stuff, and also some claims that went counter to our experience. I don't have time to respond to individual points but we've swapped notes with @main_horse and he can pass on whatever he feels like 😁
really glad to see so much effort being put into furthering understanding of H-Nets! the main caveat with these results is the scale: the blog's experiments go up to 1e18 FLOPs, which is around 1000x smaller than the paper's experiments. for some grounding, this roughly
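That tweet is cut off, but for a rough sense of scale (arithmetic using the common C ≈ 6·N·D estimate of training FLOPs, not a continuation of the original numbers): 1e18 FLOPs is roughly a ~100M-parameter model trained on a billion-or-so tokens, while 1000x more compute is the regime where billion-parameter models see hundreds of billions of tokens.

    # Rough grounding with the standard approximation: training FLOPs ~ 6 * params * tokens.
    def tokens_for(flops, params):
        return flops / (6 * params)

    for flops in (1e18, 1e21):
        for params in (1e8, 1e9):
            print(f"{flops:.0e} FLOPs @ {params:.0e} params -> ~{tokens_for(flops, params):.1e} tokens")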
hybrid is the future :)
Qwen3-Next is a hybrid: gated attention (to fix outliers) plus a GatedDeltaNet linear RNN to save on KV cache. all new models will be either sink + SWA hybrids like gpt-oss, or gated attention + linear RNN hybrids (Mamba, GatedDeltaNet, etc.) like Qwen3-Next. the age of pure attention for the time-mixing layer is over,
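The "hybrid" recipe the quoted tweet describes boils down to a layer schedule: mostly linear-RNN (or SWA) time-mixing layers with cheap, fixed-size state, plus occasional full or gated attention layers for global retrieval. The ratio and names below are illustrative assumptions, not any particular model's configuration.

    def hybrid_layer_schedule(n_layers=24, attn_every=4):
        """Toy schedule: a full (gated) attention layer every `attn_every` blocks,
        linear-RNN time mixing (Mamba / GatedDeltaNet style) everywhere else."""
        return [
            "gated_attention" if (i + 1) % attn_every == 0 else "linear_rnn"
            for i in range(n_layers)
        ]

    print(hybrid_layer_schedule(12))
    # ['linear_rnn', 'linear_rnn', 'linear_rnn', 'gated_attention', ...]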
We connect the autoregressive pipeline of LLMs with streaming video perception. Introducing AUSM: Autoregressive Universal Video Segmentation Model. A step toward unified, scalable video perception — inspired by how LLMs unified NLP. 📝
arxiv.org
Recent video foundation models such as SAM2 excel at prompted video segmentation by treating masks as a general-purpose primitive. However, many real-world settings require unprompted segmentation...
Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the
Introducing ❄️ @snowglobe_so, the simulation engine for AI chatbots. Magically simulate the behavior of your users to test and improve your chatbots. Find failures before your users do.
discussed in some recent blogs https://t.co/GlINs5ggs0
https://t.co/H3PvecTaWP
a common belief is that Transformers scale well because they have less inductive bias, when in fact they do have specific inductive biases. we developed H-Nets not to fix tokenization, but because I think that dynamic chunking represents a fundamental primitive that captures a bias
A common takeaway from "the bitter lesson" is we don't need to put effort into encoding inductive biases, we just need compute. Nothing could be further from the truth! Better inductive biases mean better scaling exponents, which means exponential improvements with computation.
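To see why "better scaling exponents" compound (toy numbers; the exponents are made up for illustration): if loss follows a power law L(C) = a·C^(-α), even a small improvement in α buys a compute-equivalent advantage that keeps growing with scale.

    # If L(C) = a * C**(-alpha), matching a model with exponent alpha_good at compute C
    # requires the alpha_base model to spend C**(alpha_good/alpha_base) compute.
    def extra_compute_factor(C, alpha_good=0.055, alpha_base=0.050):
        return C ** (alpha_good / alpha_base) / C

    for C in (1e18, 1e21, 1e24):
        print(f"C = {C:.0e}: baseline needs ~{extra_compute_factor(C):.0f}x more compute to match")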
I'm starting to believe that there is some subtle difference between compression (common to all intelligence) and abstraction (unique to human and artificial intelligence). They are definitely related, but different in a fundamental way. This should be our next major quest for AI.
Attention was never enough. The hybrid LLM era is here—and it’s moving fast. From Mamba to Jamba to Bamba, we mapped every major model that’s challenged the Transformer default in the past 18 months. 🧵 A timeline of what’s changed and why it matters ↓ 🔗
Souvla is one of the go-to places for San Francisco AI researchers to get a quick bite. Most of the food is very good there and is even offered on board Delta Airlines’ first/business class. But unfortunately, their frozen yogurt is not good. Many AI researchers instead go to