Nice to see (V-)MoEs working well at the small end of the scale spectrum. We often actually saw the largest gains with small models, so it's a promising direction.
Also, the authors seem to have per-image routing working well, which is nice.
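To make the per-image routing point concrete, here is a minimal sketch (my own illustration, not the paper's implementation) of an MoE layer where a single pooled embedding routes the whole image to k experts, instead of routing every patch token independently. The class name, mean-pooling choice, and expert MLP shape are all assumptions for illustration.

```python
# Sketch of per-image MoE routing (illustrative, not the authors' code):
# one router decision per image, applied to all of that image's tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerImageMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # one logit per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim); pool to one routing vector per image
        pooled = tokens.mean(dim=1)                       # (batch, dim)
        weights = F.softmax(self.router(pooled), dim=-1)  # (batch, num_experts)
        topk_w, topk_idx = weights.topk(self.k, dim=-1)   # k experts per image
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(tokens)
        for b in range(tokens.size(0)):
            for w, idx in zip(topk_w[b], topk_idx[b]):
                # every token of image b goes through the same k experts
                out[b] += w * self.experts[int(idx)](tokens[b])
        return out
```

The contrast with per-token routing is that only k expert MLPs run per image, so the activated compute is fixed and small regardless of how many experts (and hence parameters) the model holds.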
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Abstract (excerpt): "Sparse Mixture-of-Experts models (MoEs) have recently gained popularity due to their ability to decouple model size from inference efficiency by only activating a…"