Stefan Horoi Profile
Stefan Horoi

@stefanhoroi

Followers: 48
Following: 9
Media: 7
Statuses: 13

PhD student at @UMontreal and @Mila_Quebec, currently working on model merging and representation comparison.

Montréal, Québec
Joined December 2016
@stefanhoroi
Stefan Horoi
1 month
🔎 Do better expert models always lead to better model merging & MoErging? And how does expert training (duration) affect model upcycling? We tackle these questions in our recent work: “Less is More: Undertraining Experts Improves Model Upcycling”. 🧵1/N
@stefanhoroi
Stefan Horoi
1 month
@gkdziugaite @ebelilov @mrguywolf We thank @Google, @Mila_Quebec, @CRSNG_NSERC, FRQNT and CIFAR for their generous research funding and support! 🧵9/N
@stefanhoroi
Stefan Horoi
1 month
This is joint work with @gkdziugaite, @ebelilov and @mrguywolf! 📜 Read our arXiv preprint here: Contact us with any questions or comments, or simply drop them below 👇🏻. We’d love to hear your thoughts! 🧵8/N
arxiv.org
Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and...
@stefanhoroi
Stefan Horoi
1 month
Finally, we show that a simple early stopping strategy that favors expert undertraining and adapts the training duration for each task can recover optimal upcycling accuracy for model merging and MoErging. 🧵7/N
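As a rough illustration of how such a rule could be implemented (this is my sketch, not the paper's code; the `checkpoints` structure, the tuple layout and the tolerance are all hypothetical):

```python
# Hypothetical per-task early-stopping rule for upcycling.
# `checkpoints` maps each task name to a list of (step, state_dict, val_acc)
# tuples collected while fine-tuning that task's expert.

def select_undertrained_checkpoints(checkpoints, tolerance=0.01):
    """For each task, keep the earliest checkpoint whose validation accuracy
    is within `tolerance` of that task's best, i.e. deliberately stop early."""
    selected = {}
    for task, history in checkpoints.items():
        best_acc = max(acc for _, _, acc in history)
        for step, state_dict, acc in sorted(history, key=lambda h: h[0]):
            if acc >= best_acc - tolerance:
                selected[task] = (step, state_dict)
                break
    return selected
```

The selected per-task checkpoints would then be handed to whatever merging or MoErging procedure is being evaluated.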
@stefanhoroi
Stefan Horoi
1 month
For model MoErging, overtraining the constituent LoRA experts leads to lower final accuracy after further multi-task training of the MoE model. 🧵6/N
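For context, a toy PyTorch layer that routes over LoRA experts on top of a frozen base linear layer could look like the sketch below; it is a generic illustration of MoE-style upcycling, not the paper's MoErging setup, and all names here are mine.

```python
import torch
import torch.nn as nn

class LoRAMoELinear(nn.Module):
    """Toy MoE-style layer: a frozen base linear layer plus a learned router
    that mixes per-expert low-rank (LoRA) updates. Illustrative only."""

    def __init__(self, base: nn.Linear, num_experts: int, rank: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        in_f, out_f = base.in_features, base.out_features
        # One (A, B) pair per expert; in practice these would be loaded from
        # the fine-tuned LoRA experts rather than randomly initialized.
        self.A = nn.Parameter(torch.randn(num_experts, rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, out_f, rank))
        self.router = nn.Linear(in_f, num_experts)

    def forward(self, x):                                    # x: (batch, in_f)
        gates = torch.softmax(self.router(x), dim=-1)        # (batch, E)
        low = torch.einsum("eri,bi->ber", self.A, x)         # (batch, E, rank)
        delta = torch.einsum("eor,ber->beo", self.B, low)    # (batch, E, out_f)
        return self.base(x) + torch.einsum("be,beo->bo", gates, delta)
```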
@stefanhoroi
Stefan Horoi
1 month
We analyze this phenomenon through the lens of data difficulty, showing that later training steps are primarily guided by the loss of a small fraction of difficult examples, which are predominantly forgotten when merging. 🧵5/N
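One way to probe this kind of claim yourself, assuming classification experts and a fixed-order (non-shuffled) data loader, is sketched below; `expert`, `merged` and `loader` are placeholders, and the "forgetting" criterion is a crude proxy of mine, not the paper's analysis.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_example_losses(model, loader, device="cpu"):
    """Per-example cross-entropy losses over a non-shuffled data loader."""
    model.eval().to(device)
    losses = []
    for x, y in loader:
        logits = model(x.to(device))
        losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    return torch.cat(losses)

def hard_example_forgetting(expert, merged, loader, hardest_frac=0.05):
    """Crude proxy: among the expert's hardest examples (top `hardest_frac`
    by loss), how often is the merged model's loss even higher?"""
    expert_losses = per_example_losses(expert, loader)
    merged_losses = per_example_losses(merged, loader)
    cutoff = torch.quantile(expert_losses, 1.0 - hardest_frac)
    hard = expert_losses >= cutoff
    return (merged_losses[hard] > expert_losses[hard]).float().mean().item()
```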
@stefanhoroi
Stefan Horoi
1 month
We analyze how the LoRA rank affects this phenomenon, and find that higher ranks achieve better merging results for all training durations and smaller accuracy drops with extended training. 🧵4/N
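If you want to run a similar rank sweep, the Hugging Face `peft` library exposes the rank directly; the model name and target modules below are illustrative, not the paper's exact setup.

```python
# Hypothetical LoRA rank sweep with `peft`; adapt model/modules to your setup.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

for rank in (4, 16, 64):
    base = AutoModelForCausalLM.from_pretrained("gpt2")   # fresh copy per rank
    config = LoraConfig(r=rank, lora_alpha=2 * rank,
                        target_modules=["c_attn"], lora_dropout=0.05)
    model = get_peft_model(base, config)
    model.print_trainable_parameters()    # higher rank -> more trainable params
    # ... fine-tune one expert per task at this rank, save the adapters,
    # then merge them at several training durations and compare accuracy ...
```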
@stefanhoroi
Stefan Horoi
1 month
When merging LoRAs, the negative impact of extended training on merging is even stronger, with some merging methods experiencing >10% drops in accuracy. 🧵3/N
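For reference, the simplest LoRA merging baseline just averages the low-rank updates ΔW = B·A contributed by each expert (equal-weight task arithmetic); the stronger methods compared in the paper differ, but the sketch below, with made-up variable names, shows the basic operation.

```python
import torch

def average_lora_deltas(experts, scaling=1.0):
    """Equal-weight merge of LoRA experts adapted from the same base layer.
    `experts` is a list of dicts holding 'A' (rank x in) and 'B' (out x rank)."""
    deltas = [e["B"] @ e["A"] for e in experts]           # each delta: (out, in)
    return scaling * torch.stack(deltas).mean(dim=0)

# merged_weight = base_linear.weight.data + average_lora_deltas(
#     experts, scaling=alpha / rank)
```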
@stefanhoroi
Stefan Horoi
1 month
In both vision and language settings, we find that the optimal merging performance is achieved early in the fine-tuning process, after which it degrades even though the individual expert models keep improving on their respective tasks. 🧵2/N
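A minimal way to reproduce this kind of sweep, assuming plain uniform weight averaging as the merge and placeholder names (`ckpts`, `tasks`, `evaluate`), is to merge the per-task checkpoints saved at each fine-tuning step and track multi-task accuracy as training gets longer:

```python
import copy
import torch

def uniform_merge(state_dicts):
    """Uniform weight averaging of same-architecture checkpoints."""
    merged = copy.deepcopy(state_dicts[0])
    for key in merged:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        merged[key] = stacked.mean(dim=0).to(merged[key].dtype)
    return merged

# for step in saved_steps:                     # increasing training duration
#     model.load_state_dict(uniform_merge([ckpts[task][step] for task in tasks]))
#     print(step, evaluate(model, multi_task_loaders))  # expected to peak early
```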
@stefanhoroi
Stefan Horoi
5 months
RT @benjamintherien: How do MoE transformers, like DeepSeek, behave under distribution shifts? Do their routers collapse? Can they still ma….
@stefanhoroi
Stefan Horoi
1 year
Very excited to present our paper "Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis" at @icmlconf 2024! Come see our poster tomorrow, Wed. July 24th, 1:30-3 pm. Paper: Code: @Mila_Quebec #ICML2024
@stefanhoroi
Stefan Horoi
8 years
My most sincere thanks to the Schulich Foundation, Mr. Seymour Schulich, and the Université de Montréal! #2017SLSquad