Okay, as promised yesterday, I will post my (afaik novel) theory on the connection between branch specialization, neuron superposition and the vanishing gradients problem.
@Final_Industry DMed me to explain it earlier, so I'll copy what I said.
This thread might be long...