Mathurin Videau Profile
Mathurin Videau

@mathuvu_

Followers: 113
Following: 24
Media: 12
Statuses: 30

Joined October 2024
@mathuvu_
Mathurin Videau
11 days
RT @_Vassim: 🚨New AI Security paper alert: Winter Soldier 🥶🚨 In our last paper, we show: how to backdoor a LM _without_ training it on the…
0
22
0
@mathuvu_
Mathurin Videau
11 days
RT @ni_jovanovic: There's a lot of work now on LLM watermarking. But can we extend this to transformers trained for autoregressive image ge…
0
54
0
@mathuvu_
Mathurin Videau
16 days
RT @iScienceLuvr: From Bytes to Ideas: Language Modeling with Autoregressive U-Nets. "Byte Pair Encoding (BPE) and similar schemes split te…
0
83
0
@mathuvu_
Mathurin Videau
16 days
RT @omarsar0: From Bytes to Ideas. Avoids using predefined vocabs and memory-heavy embedding tables. Instead, it uses Autoregressive U-Net…
0
40
0
@mathuvu_
Mathurin Videau
17 days
RT @arankomatsuzaki: From Bytes to Ideas: Language Modeling with Autoregressive U-Nets. Presents an autoregressive U-Net that processes raw…
0
54
0
@mathuvu_
Mathurin Videau
17 days
Links to paper and code. Please enjoy! 📄 🛠 8/8
0
1
5
@mathuvu_
Mathurin Videau
17 days
In future work, we plan to make AU-Net hierarchies deeper so models can think at even more abstract levels. We want only a portion of the model to spend time on syntax and spelling, so most of the compute can be dedicated to thinking about the next idea instead of the next token. 7/8
1
3
3
@mathuvu_
Mathurin Videau
17 days
Byte-level training helps low-resource languages. On FLORES-200, AU-Net-2 gains ≈ +4 BLEU on average when translating from many low-resource languages into English, out of the box, with no fine-tuning! 6/8
[attached image]
1
1
4
@mathuvu_
Mathurin Videau
17 days
In our experiments, all models were tuned via hyper‑parameter scaling laws. AU‑Net keeps pace with the best we could squeeze from BPE before adding its own hierarchical advantages. 5/8
[attached image]
1
1
1
@mathuvu_
Mathurin Videau
17 days
Our AU‑Net matches or outperforms strong BPE baselines on most evals. 📊 (see big scary table) 4/8
[attached image]
1
1
2
@mathuvu_
Mathurin Videau
17 days
The hierarchy acts as implicit multi‑token prediction: no extra losses or heads. Different levels of the hierarchy process different granularities, enabling future prediction while keeping autoregressive coherence. 3/8.
1
1
3
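To make the "no extra losses or heads" point above concrete, here is a minimal sketch of what such a training objective looks like, assuming a plain next-byte cross-entropy (my illustration, not the released code):

```python
import torch
import torch.nn.functional as F

def au_net_training_loss(byte_logits: torch.Tensor, next_bytes: torch.Tensor) -> torch.Tensor:
    # byte_logits: (batch, seq_len, 256), produced by the finest level after upsampling.
    # next_bytes:  (batch, seq_len), target byte ids (the input shifted by one position).
    # A single standard next-token objective; no per-level losses or extra heads assumed.
    # The multi-token behaviour comes only from coarse vectors being reused across several
    # byte positions, so they must encode more than just the very next byte.
    return F.cross_entropy(byte_logits.reshape(-1, 256), next_bytes.reshape(-1))
```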
@mathuvu_
Mathurin Videau
17 days
Pooling keeps one vector at each split (word, every 2 words, …). Upsampling duplicates those coarse vectors with position‑specific linear layers and merges them through residual connections, yielding an Autoregressive U‑Net (AU-Net). 2/8
[attached image]
1
1
4
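A rough sketch of the pooling and upsampling steps described above; the split indices, the cap on within-segment offsets, and the module names are illustrative assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class PoolAtSplits(nn.Module):
    """Keep only the hidden vector sitting at each split position (one per word)."""
    def forward(self, h: torch.Tensor, split_idx: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, dim); split_idx: (batch, n_splits) positions of the splits.
        batch = torch.arange(h.size(0), device=h.device).unsqueeze(-1)
        return h[batch, split_idx]                      # (batch, n_splits, dim)

class UpsampleWithResidual(nn.Module):
    """Duplicate coarse vectors back to fine positions, apply a position-specific
    linear map (indexed by the offset inside the segment), and merge residually."""
    def __init__(self, dim: int, max_offset: int = 16):
        super().__init__()
        # One linear map per within-segment offset (assumption: offsets are capped).
        self.projs = nn.ModuleList([nn.Linear(dim, dim) for _ in range(max_offset)])
        self.max_offset = max_offset

    def forward(self, h_fine, h_coarse, segment_of_pos, offset_in_segment):
        # h_fine:            (batch, seq_len, dim) fine-level residual stream
        # h_coarse:          (batch, n_segments, dim) coarse-level vectors
        # segment_of_pos:    (batch, seq_len) which coarse vector each position reads from
        # offset_in_segment: (batch, seq_len) position of the token inside its segment
        batch = torch.arange(h_fine.size(0), device=h_fine.device).unsqueeze(-1)
        coarse = h_coarse[batch, segment_of_pos]        # duplication step
        off = offset_in_segment.clamp(max=self.max_offset - 1)
        merged = torch.zeros_like(h_fine)
        for o, lin in enumerate(self.projs):            # offset-specific linear maps
            # Applies every map then masks: clear for a sketch, not efficient.
            merged = torch.where((off == o).unsqueeze(-1), lin(coarse), merged)
        return h_fine + merged                          # residual merge
```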
@mathuvu_
Mathurin Videau
17 days
We present an Autoregressive U-Net that incorporates tokenization inside the model, pooling raw bytes into words then word-groups. AU-Net focuses most of its compute on building latent vectors that correspond to larger units of meaning. Joint work with @byoubii 1/8
[attached image]
14
47
195
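For the overall shape of the pipeline, here is a toy end-to-end sketch based on this thread: bytes are contextualized, pooled at word boundaries, pooled again every two words, then upsampled back to byte positions to predict the next byte. The causal transformer layers, index bookkeeping, and dimensions are assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

def causal_layer(layer: nn.TransformerEncoderLayer, x: torch.Tensor) -> torch.Tensor:
    # Causal self-attention at every level so each position only sees its past.
    mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
    return layer(x, src_mask=mask)

class TinyAUNet(nn.Module):
    def __init__(self, dim: int = 64, nhead: int = 4):
        super().__init__()
        self.embed = nn.Embedding(256, dim)                         # raw bytes -> vectors
        make = lambda: nn.TransformerEncoderLayer(dim, nhead, batch_first=True)
        self.byte_stage, self.word_stage, self.group_stage = make(), make(), make()
        self.up_word = nn.Linear(dim, dim)                          # coarse -> fine merges
        self.up_group = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, 256)                             # next-byte logits

    def forward(self, byte_ids, word_end, group_end, prev_word, prev_group):
        # byte_ids:   (B, S)  raw bytes
        # word_end:   (B, W)  byte position where each word ends (pooling points)
        # group_end:  (B, G)  word position where each 2-word group ends
        # prev_word:  (B, S)  index of the most recently *completed* word for each byte
        # prev_group: (B, W)  index of the most recently *completed* group for each word
        # (positions before the first boundary can point at a dummy index 0)
        b = torch.arange(byte_ids.size(0), device=byte_ids.device).unsqueeze(-1)
        h_bytes = causal_layer(self.byte_stage, self.embed(byte_ids))
        h_words = causal_layer(self.word_stage, h_bytes[b, word_end])     # pool: 1 vec / word
        h_groups = causal_layer(self.group_stage, h_words[b, group_end])  # pool: 1 vec / 2 words
        h_words = h_words + self.up_group(h_groups[b, prev_group])        # upsample to words
        h_bytes = h_bytes + self.up_word(h_words[b, prev_word])           # upsample to bytes
        return self.head(h_bytes)                                         # (B, S, 256)
```

The only non-obvious choice here is that the upsampling indices point at the most recently completed word or group, which is what keeps the information flow causal in this sketch.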
@mathuvu_
Mathurin Videau
3 months
RT @KrunoLehman: 1/ Happy to share my first accepted paper as a PhD student at @Meta and @ENS_ULM, which I will present at @iclr_conf. 📚 O…
0
13
0
@mathuvu_
Mathurin Videau
5 months
RT @TimDarcet: Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Mode…
0
108
0
@mathuvu_
Mathurin Videau
8 months
RT @cloneofsimo: Goddamn, this repo is true beauty. Simple (not bloated), effective, scalable, elegant, just the right amount of abstraction…
0
23
0
@mathuvu_
Mathurin Videau
8 months
RT @andrew_n_carr: A great example of FlexAttention used in a reasonably modern codebase is Lingua, which is designed to reproduce Llama 2…
0
11
0
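For context on the FlexAttention API mentioned in this retweet, here is a minimal causal-attention example (assuming PyTorch 2.5+; this is the generic score_mod pattern, not code taken from Lingua):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Keep the score when the key is not in the query's future, otherwise mask it out.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q = torch.randn(2, 8, 256, 64)   # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 8, 256, 64)
v = torch.randn(2, 8, 256, 64)
# Runs eagerly for illustration; in practice flex_attention is usually wrapped in torch.compile.
out = flex_attention(q, k, v, score_mod=causal)   # causal attention output, same shape as q
```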
@mathuvu_
Mathurin Videau
8 months
GitHub:
3
0
6
@mathuvu_
Mathurin Videau
8 months
Additional shoutouts to Daniel Haziza for contributing the model probe, Luca Wehrstedt for float8, and Jade Copet for countless hours of debugging with us. And, of course, a huge thank you to David Lopez-Paz for his guidance, support, and for making open-sourcing this possible!
1
0
6
@mathuvu_
Mathurin Videau
8 months
Many thanks to all of the wonderful people at FAIR who helped us with this project, especially the xformers team, who shaped our understanding of distributed training, how GPUs work, and how to keep them well fed.
1
0
5