Mathurin Videau Profile
Mathurin Videau

@mathuvu_

Followers: 113
Following: 24
Media: 12
Statuses: 30

Joined October 2024
@mathuvu_
Mathurin Videau
11 days
RT @_Vassim: 🚨New AI Security paper alert: Winter Soldier 🥶🚨 In our last paper, we show: how to backdoor a LM _without_ training it on the…
0
22
0
@mathuvu_
Mathurin Videau
11 days
RT @ni_jovanovic: There's a lot of work now on LLM watermarking. But can we extend this to transformers trained for autoregressive image ge…
0
54
0
@mathuvu_
Mathurin Videau
16 days
RT @iScienceLuvr: From Bytes to Ideas: Language Modeling with Autoregressive U-Nets. "Byte Pair Encoding (BPE) and similar schemes split te…
0
83
0
@mathuvu_
Mathurin Videau
16 days
RT @omarsar0: From Bytes to Ideas. Avoids using predefined vocabs and memory-heavy embedding tables. Instead, it uses Autoregressive U-Net…
0
40
0
@mathuvu_
Mathurin Videau
17 days
RT @arankomatsuzaki: From Bytes to Ideas: Language Modeling with Autoregressive U-Nets. Presents an autoregressive U-Net that processes raw…
0
54
0
@mathuvu_
Mathurin Videau
17 days
Links to paper and code. Please enjoy! 📄 🛠 8/8
0
1
5
@mathuvu_
Mathurin Videau
17 days
In future work, we plan to make AU-Net hierarchies deeper so models can think at even more abstract levels. We want only a portion of the model to spend time on syntax and spelling, so most of the compute can be dedicated to thinking about the next idea instead of the next token. 7/8
1
3
3
@mathuvu_
Mathurin Videau
17 days
Byte-level training helps low-resource languages. On FLORES-200, AU-Net-2 gains ≈ +4 BLEU on average when translating from many low-resource languages into English, out of the box, with no fine-tuning! 6/8
[attached image]
1
1
4
@mathuvu_
Mathurin Videau
17 days
In our experiments, all models were tuned via hyper‑parameter scaling laws. AU‑Net keeps pace with the best we could squeeze from BPE before adding its own hierarchical advantages. 5/8
[attached image]
1
1
1
@mathuvu_
Mathurin Videau
17 days
Our AU‑Net matches or outperforms strong BPE baselines on most evals. 📊 (see big scary table) 4/8
[attached image]
1
1
2
@mathuvu_
Mathurin Videau
17 days
The hierarchy acts as implicit multi‑token prediction: no extra losses or heads. Different levels of the hierarchy process different granularities, enabling future prediction while keeping autoregressive coherence. 3/8.
1
1
3
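To make the "no extra losses or heads" point above concrete, here is a minimal sketch of what such a training objective looks like, assuming a plain next-byte cross-entropy (my illustration, not the released code):

```python
import torch
import torch.nn.functional as F

def au_net_training_loss(byte_logits: torch.Tensor, next_bytes: torch.Tensor) -> torch.Tensor:
    # byte_logits: (batch, seq_len, 256), produced by the finest level after upsampling.
    # next_bytes:  (batch, seq_len), target byte ids (the input shifted by one position).
    # A single standard next-token objective; no per-level losses or extra heads assumed.
    # The multi-token behaviour comes only from coarse vectors being reused across several
    # byte positions, so they must encode more than just the very next byte.
    return F.cross_entropy(byte_logits.reshape(-1, 256), next_bytes.reshape(-1))
```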
@mathuvu_
Mathurin Videau
17 days
Pooling keeps one vector at each split (word, every 2 words, …). Upsampling duplicates those coarse vectors with position‑specific linear layers and merges them through residual connections, yielding an Autoregressive U‑Net (AU-Net). 2/8
[attached image]
1
1
4
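A rough sketch of the pooling and upsampling steps described above; the split indices, the cap on within-segment offsets, and the module names are illustrative assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class PoolAtSplits(nn.Module):
    """Keep only the hidden vector sitting at each split position (one per word)."""
    def forward(self, h: torch.Tensor, split_idx: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, dim); split_idx: (batch, n_splits) positions of the splits.
        batch = torch.arange(h.size(0), device=h.device).unsqueeze(-1)
        return h[batch, split_idx]                      # (batch, n_splits, dim)

class UpsampleWithResidual(nn.Module):
    """Duplicate coarse vectors back to fine positions, apply a position-specific
    linear map (indexed by the offset inside the segment), and merge residually."""
    def __init__(self, dim: int, max_offset: int = 16):
        super().__init__()
        # One linear map per within-segment offset (assumption: offsets are capped).
        self.projs = nn.ModuleList([nn.Linear(dim, dim) for _ in range(max_offset)])
        self.max_offset = max_offset

    def forward(self, h_fine, h_coarse, segment_of_pos, offset_in_segment):
        # h_fine:            (batch, seq_len, dim) fine-level residual stream
        # h_coarse:          (batch, n_segments, dim) coarse-level vectors
        # segment_of_pos:    (batch, seq_len) which coarse vector each position reads from
        # offset_in_segment: (batch, seq_len) position of the token inside its segment
        batch = torch.arange(h_fine.size(0), device=h_fine.device).unsqueeze(-1)
        coarse = h_coarse[batch, segment_of_pos]        # duplication step
        off = offset_in_segment.clamp(max=self.max_offset - 1)
        merged = torch.zeros_like(h_fine)
        for o, lin in enumerate(self.projs):            # offset-specific linear maps
            # Applies every map then masks: clear for a sketch, not efficient.
            merged = torch.where((off == o).unsqueeze(-1), lin(coarse), merged)
        return h_fine + merged                          # residual merge
```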
@mathuvu_
Mathurin Videau
17 days
We present an Autoregressive U-Net that incorporates tokenization inside the model, pooling raw bytes into words then word-groups. AU-Net focuses most of its compute on building latent vectors that correspond to larger units of meaning. Joint work with @byoubii 1/8
[attached image]
14
47
195
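For the overall shape of the pipeline, here is a toy end-to-end sketch based on this thread: bytes are contextualized, pooled at word boundaries, pooled again every two words, then upsampled back to byte positions to predict the next byte. The causal transformer layers, index bookkeeping, and dimensions are assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

def causal_layer(layer: nn.TransformerEncoderLayer, x: torch.Tensor) -> torch.Tensor:
    # Causal self-attention at every level so each position only sees its past.
    mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
    return layer(x, src_mask=mask)

class TinyAUNet(nn.Module):
    def __init__(self, dim: int = 64, nhead: int = 4):
        super().__init__()
        self.embed = nn.Embedding(256, dim)                         # raw bytes -> vectors
        make = lambda: nn.TransformerEncoderLayer(dim, nhead, batch_first=True)
        self.byte_stage, self.word_stage, self.group_stage = make(), make(), make()
        self.up_word = nn.Linear(dim, dim)                          # coarse -> fine merges
        self.up_group = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, 256)                             # next-byte logits

    def forward(self, byte_ids, word_end, group_end, prev_word, prev_group):
        # byte_ids:   (B, S)  raw bytes
        # word_end:   (B, W)  byte position where each word ends (pooling points)
        # group_end:  (B, G)  word position where each 2-word group ends
        # prev_word:  (B, S)  index of the most recently *completed* word for each byte
        # prev_group: (B, W)  index of the most recently *completed* group for each word
        # (positions before the first boundary can point at a dummy index 0)
        b = torch.arange(byte_ids.size(0), device=byte_ids.device).unsqueeze(-1)
        h_bytes = causal_layer(self.byte_stage, self.embed(byte_ids))
        h_words = causal_layer(self.word_stage, h_bytes[b, word_end])     # pool: 1 vec / word
        h_groups = causal_layer(self.group_stage, h_words[b, group_end])  # pool: 1 vec / 2 words
        h_words = h_words + self.up_group(h_groups[b, prev_group])        # upsample to words
        h_bytes = h_bytes + self.up_word(h_words[b, prev_word])           # upsample to bytes
        return self.head(h_bytes)                                         # (B, S, 256)
```

The only non-obvious choice here is that the upsampling indices point at the most recently completed word or group, which is what keeps the information flow causal in this sketch.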
@mathuvu_
Mathurin Videau
3 months
RT @KrunoLehman: 1/ Happy to share my first accepted paper as a PhD student at @Meta and @ENS_ULM, which I will present at @iclr_conf. 📚 O…
0
13
0
@mathuvu_
Mathurin Videau
5 months
RT @TimDarcet: Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Mode…
0
108
0
@mathuvu_
Mathurin Videau
8 months
RT @cloneofsimo: Goddamn, this repo is true beauty. Simple (not bloated), effective, scalable, elegant, just the right amount of abstraction…
0
23
0
@mathuvu_
Mathurin Videau
8 months
RT @andrew_n_carr: A great example of FlexAttention used in a reasonably modern codebase is Lingua, which is designed to reproduce Llama 2…
0
11
0
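For context on the FlexAttention API mentioned in this retweet, here is a minimal causal-attention example (assuming PyTorch 2.5+; this is the generic score_mod pattern, not code taken from Lingua):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Keep the score when the key is not in the query's future, otherwise mask it out.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q = torch.randn(2, 8, 256, 64)   # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 8, 256, 64)
v = torch.randn(2, 8, 256, 64)
# Runs eagerly for illustration; in practice flex_attention is usually wrapped in torch.compile.
out = flex_attention(q, k, v, score_mod=causal)   # causal attention output, same shape as q
```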
@mathuvu_
Mathurin Videau
8 months
GitHub:
3
0
6
@mathuvu_
Mathurin Videau
8 months
Additional shoutouts to Daniel Haziza for contributing the model probe, Luca Wehrstedt for float8, and Jade Copet for countless hours of debugging with us. And, of course, a huge thank you to David Lopez-Paz for his guidance, support, and for making open-sourcing this possible!
1
0
6
@mathuvu_
Mathurin Videau
8 months
Many thanks to all of the wonderful people at FAIR who helped us with this project, especially the xformers team, who shaped our understanding of distributed training, how GPUs work, and how to keep them well fed.
1
0
5