Hot take inspired by ConvNeXt: grouped convs are overrated. They're popular because of an obsession with inference throughput & raw accuracy, a disregard for training cost, & FLOPs-hacking. Vanilla convs are Pareto-superior unless training is ~free relative to inference.
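
For anyone wondering what "FLOPs-hacking" looks like in numbers, here is a rough sketch of the standard parameter and multiply-accumulate (MAC) count for a bias-free 2D conv, with and without groups. The helper name `conv2d_cost` and the layer shape (96 channels, 7x7 kernel, 56x56 output) are illustrative assumptions, not something taken from the post.

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, groups=1):
    """Return (params, MACs) for a bias-free 2D convolution.

    Hypothetical helper for illustration; it uses the textbook count
    params = (c_in / groups) * c_out * k * k, with one MAC per weight
    per output position.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    params = (c_in // groups) * c_out * k * k
    macs = params * h_out * w_out
    return params, macs


# Illustrative shape (an assumption): 96 -> 96 channels, 7x7 kernel, 56x56 output.
vanilla = conv2d_cost(96, 96, 7, 56, 56)               # groups=1, i.e. a vanilla conv
depthwise = conv2d_cost(96, 96, 7, 56, 56, groups=96)  # one group per channel

# On paper, the MAC count shrinks by exactly the number of groups.
ratio = vanilla[1] / depthwise[1]
print(f"vanilla:   params={vanilla[0]}, MACs={vanilla[1]}")
print(f"depthwise: params={depthwise[0]}, MACs={depthwise[1]}")
print(f"MAC ratio: {ratio:.0f}x")
```

This only counts arithmetic. It says nothing about wall-clock time or training cost, and that gap is what the take is about.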