Mahan Fathi
@MahanFathi
Followers
951
Following
208
Media
18
Statuses
71
llm research @nvidia👁️; ex @googledeepmind, @google🧠 & @mila_quebec.
Toronto, Ontario
Joined June 2011
We're looking for Summer Interns to join the Post-Training Team at @NVIDIA! DM me with your updated resume and three concise bullets detailing your most relevant experience — e.g. publications, repos, blogs, etc. RT please to help us find top talent.
13
35
460
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in a blog below🧵👇
33
125
874
NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post-train the next generations of our models. Give it a try!
github.com
Scalable toolkit for efficient model reinforcement - NVIDIA-NeMo/RL
5
65
396
The talk I gave @ Mila on learning linearized representations of dynamical systems (Koopman representations) is on YouTube. The work was mainly carried out by @MahanFathi in collaboration with @pierrelux 's lab, and was presented at ICLR 2024. https://t.co/EPlTCIQj5O
0
3
21
In-context learning (ICL) is one of the most exciting parts of the LLM boom. Sequence models (not just LLMs) implement on-the-fly models conditioned on inputs w/o weight updates! Q: are in-context models better than «in-weights» ones? A: sometimes ICL is better than standard opt.
Introducing our new paper explaining in-context learning through the lens of Occam’s razor, giving a normative account of next-token prediction objectives. This was with @Tom__Marty @tejaskasetty @le0gagn0n @sarthmit @MahanFathi @dhanya_sridhar @g_lajoie_
0
6
21
Introducing our new paper explaining in-context learning through the lens of Occam’s razor, giving a normative account of next-token prediction objectives. This was with @Tom__Marty @tejaskasetty @le0gagn0n @sarthmit @MahanFathi @dhanya_sridhar @g_lajoie_
arxiv.org
A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in...
3
24
103
life update: thrilled to announce that i’ll be joining @nvidia as a research scientist on the alignment team. grateful for the support from mentors and peers. this is a dream come true for both the researcher and the gamer in me!
33
4
410
Last week, I gave a talk at @Mila_Quebec. The talk should be of interest to anyone working on predictive models, particularly in latent space. In collab. with @MahanFathi @ClementGehring @J_Pilault @davidkanaa @pierrelux. See you at @iclr_conf in 🇦🇹! https://t.co/vFBtHDzNju
drive.google.com
0
5
18
Congrats to Mahan, who is finishing his Master's thesis in style with this second paper.
Course Correcting Koopman Representations Accepted at #ICLR2024! We identify problems with unrolling in imagination and propose an unconventional, simple, yet effective solution: periodically "𝒓𝒆𝒆𝒏𝒄𝒐𝒅𝒊𝒏𝒈" the latent. 📄 https://t.co/ULNzqAV3bB
@GoogleDeepMind 1/🧵
0
3
26
This was joint work between @GoogleDeepMind and @Mila_Quebec. Many thanks to my supervisors @RGoroshin and @pierrelux for their constant support and guidance throughout the project. Also props to @ClementGehring, @J_Pilault and @davidkanaa. See you in Vienna! ❤️ 14/14
1
0
3
We have more theory and experiments in the paper, including higher-dim systems like MuJoCo environments (with control inputs!). Periodic reencoding always leads to (big) improvements, only at the cost of introducing one inference-time hyperparam, the reencoding period. 13/
1
0
3
This method produces stable, accurate, long-range future state predictions while being fairly robust to the reencoding period, i.e. the number of steps taken in latent space before reencoding happens.
`reencode @ 0` -> no reencoding
`reencode @ 1` -> every-step reencoding
12/
1
0
2
So far we have found out that 1) reencoding is necessary, and 2) it introduces its own error. We have discovered an effective tool, although imperfect. So let's use it in moderation. Enter "Periodic Reencoding!" Here we reencode the representations every so often (k steps). 11/
1
0
2
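A minimal sketch of what periodic reencoding could look like in plain NumPy. The `encoder`, `decoder`, and Koopman matrix `K` here are hypothetical stand-ins for the learned components, not the paper's actual code:

```python
import numpy as np

def unroll_periodic(x0, encoder, decoder, K, num_steps, k):
    """Unroll a learned Koopman model, reencoding the latent every k steps.

    k = 0 -> never reencode (naive unrolling)
    k = 1 -> reencode at every step
    """
    z = encoder(x0)                 # initial latent (z)
    xs = [x0]
    for t in range(1, num_steps + 1):
        z = K @ z                   # linear step in latent space
        x = decoder(z)              # map back to observation space
        xs.append(x)
        if k > 0 and t % k == 0:
            z = encoder(x)          # periodic reencoding: reset the latent
    return np.stack(xs)
```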
Food for thought: this is a bit weird because we expect the encoder and the decoder to be inverses of one another, but they're not (why?). Unrolling the model this way, by "reencoding at every step," also results in poor performance, but at least w/o crossing behavior. 10/
1
0
2
We can form a loop by going from (z) to (x) at every unrolling step, and then back to (z). We call this "reencoding," achieved by applying the decoder and then the encoder to (z): ((ϕ∘ψ)(z)). 9/
1
0
3
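In code this composition is a one-liner; a sketch with hypothetical `encoder` (ϕ) and `decoder` (ψ) functions:

```python
def reencode(z, encoder, decoder):
    """The reencoding operator (ϕ∘ψ)(z): decode the latent to
    observation space, then immediately encode it again."""
    return encoder(decoder(z))
```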
There are 2 reasons for this. R1. We are modeling a closed system with an open one. The original DS has the form (x' = f(x)), which forms a feedback loop; that "loop" is missing here. R2. The mapping from (z) to (x), i.e. the decoder, is non-injective, since (n > d)! 8/
1
0
2
A simple observation from the above plot is that the trajectory lines *cross*, and this violates the first principles of an autonomous dynamical system. We know that (z) trajectories are faithful and don't cross. Why, all of a sudden, do we get this behavior in (x) space? 7/
1
0
3
Here we train the model on the Duffing Oscillator system and look at the phase plots generated by unrolling the model using the above method. Well, things seem a bit off here. 6/
1
0
2
Cool. Now that we have a trained model, we should be able to take an initial condition (x) as input, encode it to get the first latent (z), keep hitting (z) with (K) to get future (z)'s, and then decode everything back to (x). Let's try that out on a few dynamical systems. 5/
1
0
3
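A minimal sketch of this naive unrolling, again with hypothetical `encoder`/`decoder` functions and Koopman matrix `K` standing in for the learned model:

```python
import numpy as np

def unroll_naive(x0, encoder, decoder, K, num_steps):
    """Encode the initial condition once, apply K repeatedly in latent
    space, and decode every intermediate latent back to observations."""
    z = encoder(x0)                 # first latent (z)
    xs = [x0]
    for _ in range(num_steps):
        z = K @ z                   # latent dynamics: z' = Kz
        xs.append(decoder(z))       # decode back to (x)
    return np.stack(xs)
```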