In all seriousness though, it wouldn't surprise me if this result could be shown to follow from modeling NNs as approximating a high-dimensional Gaussian Process posterior! Especially given all the work showing infinite-width NNs are equivalent to GPs.
New paper!! We found a pattern in how NNs extrapolate: as inputs become more OOD, model outputs tend to go towards some “average”-like prediction.
What is this “average”-like prediction? Why does this happen? Can we leverage this to better handle OOD inputs? (Spoiler: Yes!)
🧵:
I feel like I should point out that the GP in my screenshot doesn't actually show reversion to the OCS: the mean of the GP is 0, not the mean of the training data. To get reversion to the OCS, you have to explicitly fit the GP mean (which is a common preprocessing step).
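A minimal sketch of this point, using a from-scratch GP regression in NumPy (squared-exponential kernel; the length scale, data, and noise level are illustrative choices, not from the thread): a zero-mean GP's predictions revert to 0 far from the training data, while centering the targets on the training mean first (i.e., "fitting the mean") makes OOD predictions revert to that mean instead.

```python
import numpy as np

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel: k(a, b) = exp(-|a - b|^2 / (2 * ls^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls**2))

def gp_posterior_mean(X, y, Xstar, noise=1e-4):
    # Exact GP posterior mean: k(x*, X) (K + noise*I)^{-1} y
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(Xstar, X) @ np.linalg.solve(K, y)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = 5.0 + np.sin(3 * X[:, 0])        # training targets with mean near 5

x_far = np.array([[100.0]])          # far outside the training range

# Zero-mean GP: the OOD prediction reverts to the prior mean, 0.
pred_zero = gp_posterior_mean(X, y, x_far)[0]

# "Fit the mean" by centering the targets, then add the mean back:
# now the OOD prediction reverts to the training mean (OCS-like behavior).
y_mean = y.mean()
pred_centered = gp_posterior_mean(X, y - y_mean, x_far)[0] + y_mean
```

So far from the data the kernel weights vanish and the posterior mean falls back to whatever prior mean you gave it, which is why the preprocessing step matters for seeing reversion to the OCS.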
@xuanalogue
So it turns out that deep neural networks are essentially Gaussian Processes, cloaked in a squared exponential kernel. Quite the revelation! 😎
@xuanalogue
Perhaps an unsurprising observation through the Bayesian lens? In extrapolation regions, the posterior predictive distribution collapses to the prior predictive distribution, which does not favor any particular class (the OCS?).
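This collapse is easy to see in a GP, where the posterior predictive is available in closed form (a sketch with an assumed squared-exponential kernel and illustrative data): near the training data the predictive variance shrinks toward the noise level, while far out-of-distribution it returns to the prior variance, i.e., the posterior predictive reverts to the prior predictive.

```python
import numpy as np

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel with unit prior variance
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls**2))

X = np.linspace(-1, 1, 20)[:, None]   # training inputs
noise = 1e-4
K = rbf(X, X) + noise * np.eye(len(X))
K_inv = np.linalg.inv(K)

def posterior_var(xs):
    # Posterior predictive variance: k(x*, x*) - k(x*, X) K^{-1} k(X, x*)
    ks = rbf(xs, X)
    return (rbf(xs, xs) - ks @ K_inv @ ks.T)[0, 0]

var_in = posterior_var(np.array([[0.0]]))    # inside the data: tiny
var_out = posterior_var(np.array([[50.0]]))  # far OOD: back to prior variance 1
```

With no nearby data, `k(x*, X)` is essentially zero, so the data term vanishes and only the prior term `k(x*, x*)` remains: the predictive is just the prior again.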
@xuanalogue
Of course, wouldn't each neural net you train then count as just one sample from a Gaussian process prior that training has evolved toward the marginal likelihood?
So you can't be very Bayesian with it.