@xuanalogue
xuan (ɕɥɛn / sh-yen)
3 months
Deep neural networks are just Gaussian Processes with a squared exponential kernel confirmed 😎
[image attached]

Replies

@xuanalogue
xuan (ɕɥɛn / sh-yen)
3 months
In all seriousness though, it wouldn't surprise me if this result could be shown to follow from modeling NNs as approximating a high-dimensional Gaussian Process posterior! Especially given all the work showing that infinite-width NNs are equivalent to GPs.
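One rough way to see the infinite-width correspondence (a minimal NumPy sketch added for illustration, not taken from the thread): draw many random, untrained one-hidden-layer ReLU networks and look at the joint distribution of their outputs at a few fixed inputs. As the width grows, that distribution approaches a zero-mean Gaussian, i.e. a GP prior over functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_net_outputs(xs, width=2048, n_nets=2000):
    """Evaluate n_nets random one-hidden-layer ReLU nets at the inputs xs.

    Weights and biases are i.i.d. N(0, 1) and the readout is scaled by
    1/sqrt(width), the usual parameterization in the NN-as-GP results.
    """
    xs = np.atleast_2d(xs)                   # (n_points, d)
    n_points, d = xs.shape
    outs = np.empty((n_nets, n_points))
    for i in range(n_nets):
        W = rng.normal(size=(d, width))      # input -> hidden weights
        b = rng.normal(size=width)           # hidden biases
        v = rng.normal(size=width)           # hidden -> output weights
        h = np.maximum(xs @ W + b, 0.0)      # ReLU activations
        outs[i] = h @ v / np.sqrt(width)     # scaled readout
    return outs

# Outputs at two fixed inputs, across many random nets: the empirical joint
# distribution is approximately a zero-mean bivariate Gaussian, and its
# covariance matrix is the network's induced (NNGP) kernel at those inputs.
samples = random_relu_net_outputs(np.array([[0.5], [2.0]]))
print("empirical mean:", samples.mean(axis=0))
print("empirical covariance:\n", np.cov(samples.T))
```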
@katie_kang_
Katie Kang
9 months
New paper!! We found a pattern in how NNs extrapolate: as inputs become more OOD, model outputs tend to go towards some “average”-like prediction. What is this “average”-like prediction? Why does this happen? Can we leverage this to better handle OOD inputs? (Spoiler: Yes!) 🧵:
[image attached]
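One quick way to poke at this reversion claim yourself (a toy sketch with scikit-learn and made-up 1-D data, not the paper's setup): train a small regressor on a bounded input range, then probe it on increasingly out-of-range inputs and compare the outputs to the training-label mean, which is the optimal constant prediction under squared error.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy 1-D training data confined to [-2, 2].
x_train = rng.uniform(-2.0, 2.0, size=(500, 1))
y_train = np.sin(2.0 * x_train[:, 0]) + 0.1 * rng.normal(size=500)

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000,
                     random_state=0).fit(x_train, y_train)

# Under squared error, the optimal constant prediction is the label mean.
ocs = y_train.mean()

# Probe increasingly out-of-distribution inputs and compare predictions to
# that constant (how closely they approach it depends on the architecture,
# training, and data; this is only an experiment, not a guarantee).
for x in [1.0, 3.0, 5.0, 10.0, 50.0]:
    pred = model.predict([[x]])[0]
    print(f"x = {x:5.1f}  prediction = {pred:+.3f}  OCS = {ocs:+.3f}")
```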
@xuanalogue
xuan (ɕɥɛn / sh-yen)
3 months
I feel like I should point out that the GP in my screenshot doesn't actually show reversion to the OCS: the GP's mean is 0, which is not the mean of the training data. To get reversion to the OCS, you have to explicitly fit the GP mean to the data (which is a common preprocessing step).
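A concrete version of that point (a sketch using scikit-learn's GaussianProcessRegressor as a stand-in for the GP in the screenshot): with the default zero prior mean, predictions far from the data fall back toward 0; with normalize_y=True, which centers the targets before fitting, they fall back toward the training mean instead.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy data whose labels have a clearly non-zero mean (~3).
x_train = rng.uniform(-2.0, 2.0, size=(40, 1))
y_train = 3.0 + np.sin(2.0 * x_train[:, 0])

x_far = np.array([[100.0]])   # far outside the training range

# Zero prior mean (the default): far from the data the squared-exponential
# kernel decays to 0, so the posterior mean reverts to the prior mean of 0.
gp_zero = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gp_zero.fit(x_train, y_train)
print("zero-mean GP, far away:", gp_zero.predict(x_far))      # close to 0

# normalize_y=True centers the targets, which amounts to fitting a constant
# mean; far from the data the prediction reverts to the training mean (~3).
gp_centered = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                       normalize_y=True)
gp_centered.fit(x_train, y_train)
print("centered GP, far away: ", gp_centered.predict(x_far))  # close to y_train.mean()
```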
@Ethan_smith_20
Ethan
3 months
@xuanalogue Do you have a link to the visual?
@xuanalogue
xuan (ɕɥɛn / sh-yen)
3 months
@edelmann_domi
Dominic Edelmann
3 months
@xuanalogue But I mean there are theoretical results like this, right?
@xuanalogue
xuan (ɕɥɛn / sh-yen)
3 months
@edelmann_domi Yup! Was thinking of these.
@alisabets
Ali Sabet
3 months
@xuanalogue mfw yarin gal already said this in 2015
[image attached]
@marjan_milo
Marjan Milosavljević
3 months
@xuanalogue So it turns out that deep neural networks are essentially Gaussian Processes, cloaked in a squared exponential kernel. Quite the revelation! 😎
@xidulu
Xidulu
3 months
@xuanalogue Somewhat unsurprising from the Bayesian lens? In extrapolation regions, the (posterior) predictive distribution collapses to the "prior predictive distribution", which does not favor any particular class (the OCS?).
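That collapse is just the shape of the GP posterior predictive (standard formulas, written out here for the noisy-regression case):

```latex
\[
\mu_*(x) = m(x) + k(x, X)\,\bigl[K(X, X) + \sigma^2 I\bigr]^{-1}\bigl(y - m(X)\bigr),
\qquad
\sigma_*^2(x) = k(x, x) - k(x, X)\,\bigl[K(X, X) + \sigma^2 I\bigr]^{-1} k(X, x).
\]
% With the squared-exponential kernel k(x, x') = \exp(-\lVert x - x'\rVert^2 / 2\ell^2),
% k(x, X) \to 0 as x moves far from every training point, so
% \mu_*(x) \to m(x) and \sigma_*^2(x) \to k(x, x):
% the posterior predictive collapses back to the prior predictive.
```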
@EliSennesh
Eli Sennesh
3 months
@xuanalogue Of course, wouldn't each neural net you train then count as one sample from the Gaussian process prior that's been evolved towards the marginal likelihood by training? So you can't be very Bayesian with it.
@6___0
catsNstuff
3 months
@xuanalogue i knew it!
@mykshaalo
Myk Shaalo
3 months
@xuanalogue Why the squared exponential kernel, specifically? It seems like it should vary with the choice of activations, loss function, and so on.
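For what it's worth, the infinite-width kernel does depend on the activation: for a one-hidden-layer ReLU network (no biases) it is the order-1 arc-cosine kernel of Cho & Saul, not the squared exponential. A small sketch of that closed form, included only as an illustration:

```python
import numpy as np

def relu_nngp_kernel(x, y):
    """Order-1 arc-cosine kernel: the infinite-width kernel induced by a
    one-hidden-layer ReLU network with standard Gaussian weights, no biases."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_theta = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2.0 * np.pi)

# Unlike the squared-exponential kernel, this is not a function of ||x - y||
# alone: it depends on the angle between x and y and on their norms, and it
# changes again with a different activation.
x1, x2 = np.array([1.0, 0.5]), np.array([0.3, 2.0])
print(relu_nngp_kernel(x1, x2))
```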
@aiooiaooia
FatalFlaws
3 months
@xuanalogue That’s an old result. And what you are describing is the Functional Neural Process?
@d_valdenegro
Daniel Valdenegro
3 months
@sreagm
Sreerag M./ ശ്രീരാഗ് എം.
3 months
@xuanalogue I always knew.