Ambroise Odonnat

@AmbroiseOdonnat

Followers
381
Following
2K
Media
41
Statuses
139

Ph.D. Student @Huawei Noah’s Ark Lab & @inria | Working on transformers and distribution shifts | Research blog: https://t.co/f7zqOM9JYz

Paris, France
Joined December 2012
@AmbroiseOdonnat
Ambroise Odonnat
1 year
🥳My friend @oussamazekri_ and I are happy to launch our research blog. Today, we are releasing the first blog post, on the Convolutional Kernel Network (CKN) introduced by @julienmairal. Many thanks to him for proofreading it! We hope you will enjoy it.
Tweet media one
9
101
649
@AmbroiseOdonnat
Ambroise Odonnat
9 days
RT @_Vassim: 🚨New AI Security paper alert: Winter Soldier 🥶🚨 In our last paper, we show how to backdoor a LM _without_ training it on the…
0
22
0
@AmbroiseOdonnat
Ambroise Odonnat
9 days
Here is the recording with the slides for those interested! 🎤📊📑 @Cohere_Labs
@Cohere_Labs
Cohere Labs
21 days
Our ML Theory group is looking forward to welcoming @AmbroiseOdonnat next week on Thursday, June 19th for a session on "Large Language Models as Markov Chains"
Tweet media one
0
5
18
@AmbroiseOdonnat
Ambroise Odonnat
21 days
🚀To learn more about LLMs as Markov Chains, join in on June 19th at 6 pm CET (Paris time)!! 😀 Huge thanks to @itsmaddox_j and @cohere @Cohere_Labs for the invitation 🤗 Paper: | Meeting:
@Cohere_Labs
Cohere Labs
21 days
Our ML Theory group is looking forward to welcoming @AmbroiseOdonnat next week on Thursday, June 19th for a session on "Large Language Models as Markov Chains"
Tweet media one
0
3
4
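The core idea behind the talk title can be sketched in a few lines. This is my own toy illustration, not the paper's code: an autoregressive LM with a finite context window K over a vocabulary V defines a finite Markov chain whose states are token sequences of length at most K, and whose transitions append a sampled next token while sliding the window. The `lm_next_token_probs` function below is a hypothetical stand-in for a trained model.

```python
# Toy sketch (assumption: any map state -> next-token distribution stands in
# for the trained LM; here it is uniform over the vocabulary).
import itertools
import random

V = ["a", "b"]   # toy vocabulary
K = 2            # toy context window

def lm_next_token_probs(state):
    # Hypothetical stand-in for a trained LM's next-token distribution.
    return {tok: 1.0 / len(V) for tok in V}

def step(state):
    # One Markov-chain transition: sample the next token, slide the window.
    probs = lm_next_token_probs(state)
    tok = random.choices(list(probs), weights=list(probs.values()))[0]
    return (state + (tok,))[-K:]

# The chain is finite: its state space is all sequences of 1..K tokens,
# i.e. sum_{k=1..K} |V|^k states.
states = [tuple(s) for k in range(1, K + 1) for s in itertools.product(V, repeat=k)]

random.seed(0)
state = ("a",)
for _ in range(5):
    state = step(state)
```

With |V| = 2 and K = 2 the chain has 2 + 4 = 6 states, which is why finite-chain tools (mixing times, stationary distributions) become available for reasoning about the LM's generation process.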
@AmbroiseOdonnat
Ambroise Odonnat
21 days
RT @itsmaddox_j: Super excited to host @AmbroiseOdonnat next week, particularly to hear about the connection between LLMs and their n-gram…
0
2
0
@AmbroiseOdonnat
Ambroise Odonnat
27 days
RT @romanplaud: Very interesting work! These findings closely align with ours on the existence of high-entropy 𝘤𝘳𝘪𝘵𝘪𝘤𝘢𝘭 𝘵𝘰𝘬𝘦𝘯𝘴, tokens that…
0
3
0
@AmbroiseOdonnat
Ambroise Odonnat
27 days
RT @MoritzLaurer: Hidden gem: the @Cohere_Labs speaker series. Every week you can just drop into a call where some of the best ML/AI resear…
0
16
0
@AmbroiseOdonnat
Ambroise Odonnat
2 months
💎It also works for the newest, strongest Gemma3 models (👏🏽@ramealexandre @mblondel_ml)!
Tweet media one
@attentionmech
attentionmech
2 months
WOW, this is so underrated. Based on this formulation, they came up with an approximation that follows the trend of MMLU performance very closely.
Tweet media one
0
1
5
@AmbroiseOdonnat
Ambroise Odonnat
2 months
RT @attentionmech: paper reading thread-
Tweet media one
0
61
0
@AmbroiseOdonnat
Ambroise Odonnat
4 months
📑Paper: 📈Slides: (better with Adobe Reader for nice GIFs) 🌐Website:
0
0
1
@AmbroiseOdonnat
Ambroise Odonnat
4 months
🤗Thanks a lot, @haeggee and Prof. Martin Jaggi, for having me in the MLO group @EPFL this week to present "Large Language Models as Markov Chains". The slides are available on my website (link in thread). 🎉 New experiments with Llama and Gemma models in the updated paper!
Tweet media one
1
2
8
@AmbroiseOdonnat
Ambroise Odonnat
4 months
RT @IevgenRedko: Our team open-sourced MANTIS: a foundation model for time series classification. It is lightweight, more efficient than c…
0
5
0
@AmbroiseOdonnat
Ambroise Odonnat
4 months
RT @garridoq_: The last paper of my PhD is finally out! Introducing "Intuitive physics understanding emerges from self-supervised pretrain…
0
165
0
@AmbroiseOdonnat
Ambroise Odonnat
5 months
RT @geoffnegiar: We just released our new website! Our goal for now is to provide the easiest, fastest benchmarking tools for forecasting…
0
4
0
@AmbroiseOdonnat
Ambroise Odonnat
5 months
RT @oussamazekri_: 🚀 Policy gradient methods like DeepSeek’s GRPO are great for finetuning LLMs via RLHF. But what happens when we swap au…
0
9
0
@AmbroiseOdonnat
Ambroise Odonnat
5 months
Finally, I can't thank you enough @_Vassim and @CabannesVivien for this collab: you are a rare combination of super-smart and fun to work with! Hopefully, more to come soon 🤠 "As for me, if I had to sum up my life today with you, I would say that it is first and foremost about encounters."
Tweet media one
0
0
2
@AmbroiseOdonnat
Ambroise Odonnat
5 months
We want to thank @dohmatobelvis, @EshaanNichani, @_GPaolo, Faniriana Rakoto Endor, and @IevgenRedko for fruitful discussions during the elaboration of this work 😇 7/🧵
1
0
2
@AmbroiseOdonnat
Ambroise Odonnat
5 months
From the theoretical side, we show that clustering heads can be learned via gradient descent and provide theoretical insights into the two-stage learning observed in practice. 6/🧵
Tweet media one
1
0
0
@AmbroiseOdonnat
Ambroise Odonnat
5 months
We investigate loss spikes, suggesting potential strategies for mitigation, which could lead to more stable training processes. We also peek into the transferability of circuits to showcase the usefulness of curriculum learning and data curation. 5/🧵
Tweet media one
1
0
0
@AmbroiseOdonnat
Ambroise Odonnat
5 months
In the second, we unveil "𝑪𝒍𝒖𝒔𝒕𝒆𝒓𝒊𝒏𝒈 𝑯𝒆𝒂𝒅𝒔", circuits that learn the invariance of the task. Their training dynamics unfold in two phases: 1) clustering of the attention embeddings according to the invariance and 2) classifier fitting. 4/🧵
Tweet media one
1
0
0
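The two-phase picture above can be illustrated with a small synthetic sketch. This is my own illustration, not the authors' code: phase 1 is represented by embeddings that have already collapsed into per-invariance clusters, and phase 2 by fitting a linear classifier on top of them (least squares stands in for the classifier-fitting stage).

```python
# Toy sketch of the two-phase dynamic (assumption: synthetic 2-D embeddings
# stand in for attention embeddings after the clustering phase).
import numpy as np

rng = np.random.default_rng(0)

# Phase 1 (clustering): embeddings of inputs sharing an invariance have
# collapsed into tight clusters, one center per class.
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = rng.integers(0, 2, size=200)
emb = centers[labels] + 0.1 * rng.standard_normal((200, 2))

# Phase 2 (classifier fitting): fit a linear classifier on the clustered
# embeddings by least squares.
X = np.hstack([emb, np.ones((200, 1))])   # append a bias column
y = 2.0 * labels - 1.0                    # map labels {0, 1} to {-1, +1}
w, *_ = np.linalg.lstsq(X, y, rcond=None)
acc = float(np.mean(np.sign(X @ w) == y))
```

Because the clusters are well separated by the time the classifier is fit, the linear readout is nearly trivial, which matches the intuition that most of the work happens in the clustering phase.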
@AmbroiseOdonnat
Ambroise Odonnat
5 months
In the first paper, we show how GD (gradient descent) reinforces useful circuits in transformers while pruning others to create sub-circuits that help solve complex tasks by breaking them down into intermediate reasoning steps. 3/🧵
Tweet media one
1
0
0