Stefan Horoi @stefanhoroi X Profile

Stefan Horoi

@stefanhoroi

Followers: 48

Following: 9

Media: 7

Statuses: 13

PhD student at @UMontreal and @Mila_Quebec, currently working on model merging and representation comparison.

Montréal, Québec

Joined December 2016


Stefan Horoi

@stefanhoroi

1 month

🔎 Do better expert models always lead to better model merging & MoErging? And how does expert training (duration) affect model upcycling? We tackle these questions in our recent work: “Less is More: Undertraining Experts Improves Model Upcycling”. 🧵1/N

1

5

9

Stefan Horoi

@stefanhoroi

1 month

@gkdziugaite @ebelilov @mrguywolf We thank @Google, @Mila_Quebec, @CRSNG_NSERC, FRQNT and CIFAR for their generous research funding and support! 🧵9/N

1

0

3

Stefan Horoi

@stefanhoroi

1 month

This is joint work with @gkdziugaite, @ebelilov and @mrguywolf! 📜 Read our arXiv preprint via the arxiv.org link below. Contact us with any questions or comments, or simply drop them below 👇🏻 We’d love to hear your thoughts! 🧵8/N

arxiv.org

Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and...

1

0

4

Stefan Horoi

@stefanhoroi

1 month

Finally, we show that a simple early stopping strategy that favors expert undertraining and adapts the training duration for each task can recover optimal upcycling accuracy for model merging and MoErging. 🧵7/N

1

0

3
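The per-task early stopping idea in 7/N can be illustrated with a minimal sketch. This is a generic patience-based rule, not the criterion from the paper, which the thread does not spell out. The function name `pick_checkpoint`, the `patience` and `min_delta` parameters, and the validation curves are all hypothetical. A larger `min_delta` demands a bigger gain to keep training, so it leans toward stopping earlier.

```python
from typing import Dict, List


def pick_checkpoint(val_acc: List[float], patience: int = 1,
                    min_delta: float = 0.0) -> int:
    """Return the index of the checkpoint to keep for one task.

    Training is considered stopped once `patience` consecutive
    evaluations fail to beat the best accuracy by more than `min_delta`.
    (Hypothetical rule for illustration only.)
    """
    if not val_acc:
        raise ValueError("val_acc must contain at least one evaluation")
    best_idx, best, waited = 0, val_acc[0], 0
    for i, acc in enumerate(val_acc[1:], start=1):
        if acc > best + min_delta:
            best_idx, best, waited = i, acc, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx


def pick_per_task(curves: Dict[str, List[float]], patience: int = 1,
                  min_delta: float = 0.0) -> Dict[str, int]:
    """Apply the same rule independently to each task's validation curve."""
    return {task: pick_checkpoint(curve, patience, min_delta)
            for task, curve in curves.items()}


# Made-up validation curves for two tasks.
curves = {"task_a": [0.5, 0.7, 0.71, 0.70], "task_b": [0.6, 0.61, 0.9]}
chosen = pick_per_task(curves, patience=1, min_delta=0.05)
```

Because the rule runs per task, each expert ends up with its own training duration, which is the property the tweet emphasizes.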

Stefan Horoi

@stefanhoroi

1 month

For model MoErging, overtraining the constituent LoRA experts leads to lower final accuracy after further multi-task training of the MoE model. 🧵6/N

1

0

3

Stefan Horoi

@stefanhoroi

1 month

We analyze this phenomenon through the lens of data difficulty, showing that later training steps are primarily guided by the loss of a small fraction of difficult examples, which are predominantly forgotten when merging. 🧵5/N

1

0

3

Stefan Horoi

@stefanhoroi

1 month

We analyze how the LoRA rank affects this phenomenon, and find that higher ranks achieve better merging results for all training durations and smaller accuracy drops with extended training. 🧵4/N

1

0

3

Stefan Horoi

@stefanhoroi

1 month

When merging LoRAs, the negative impact of extended training on merging is even stronger, with some merging methods experiencing >10% drops in accuracy. 🧵3/N

1

0

3

Stefan Horoi

@stefanhoroi

1 month

In both vision and language settings we find that the optimal merging performance is achieved early in the fine-tuning process, after which performance degrades despite the individual expert models being better on their respective tasks. 🧵2/N

1

0

3
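For readers new to model merging, here is a minimal sketch of one common baseline: averaging task vectors (fine-tuned weights minus base weights) and adding the result back to the base model. The thread does not say which merging methods were evaluated, so this is an assumed example rather than the authors' setup. The names `task_vector` and `merge_experts`, the `scale` parameter, and the toy weights are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def task_vector(base: Weights, expert: Weights) -> Weights:
    """Difference between an expert's weights and the shared base weights."""
    return {name: [e - b for e, b in zip(expert[name], base[name])]
            for name in base}


def merge_experts(base: Weights, experts: List[Weights],
                  scale: float = 1.0) -> Weights:
    """Add the scaled mean of the experts' task vectors to the base model."""
    if not experts:
        raise ValueError("need at least one expert to merge")
    vectors = [task_vector(base, expert) for expert in experts]
    merged: Weights = {}
    for name, params in base.items():
        merged[name] = [
            b + scale * sum(v[name][i] for v in vectors) / len(vectors)
            for i, b in enumerate(params)
        ]
    return merged


# Toy example: one parameter tensor with two entries, two experts.
base = {"w": [0.0, 1.0]}
experts = [{"w": [2.0, 1.0]}, {"w": [0.0, 3.0]}]
merged = merge_experts(base, experts)
```

In the setting the tweet describes, `experts` would be checkpoints taken at different fine-tuning steps, and the merged model would be evaluated on all tasks for each choice of step.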

Stefan Horoi

@stefanhoroi

5 months

RT @benjamintherien: How do MoE transformers, like DeepSeek, behave under distribution shifts? Do their routers collapse? Can they still ma…

0

20

0

Stefan Horoi

@stefanhoroi

1 year

Very excited to present our paper "Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis" at @icmlconf 2024! Come see our poster tomorrow, Wed., July 24th, 1:30-3 pm. Paper: Code: @Mila_Quebec #ICML2024

0

7

12

Stefan Horoi

@stefanhoroi

8 years

My most sincere thanks to the Schulich Foundation, Mr. Seymour Schulich, and the Université de Montréal! #2017SLSquad

0

1