@StasBekman
This is a phase transition; we see it in molecular simulations, which are directly analogous. Loss ↔ energy; entropy S = -∫ ρ log ρ. Neural nets minimise the free energy F = loss - T·S, so you can see states with similar free energy but different losses, and jump between them.
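A minimal sketch of the analogy (all numbers are illustrative assumptions, not fitted to any real network): model two "states" of a network as parameter distributions ρ over a 1-D axis, take entropy as -∫ ρ log ρ, and pick an effective temperature T. The two states then have clearly different losses but nearly equal free energy F = loss - T·S, which is the condition under which a jump between them is plausible.

```python
import numpy as np

# Discretized parameter axis for the toy distributions.
x = np.linspace(-5, 5, 1001)
dx = x[1] - x[0]

def gaussian(mu, sigma):
    """A normalized density rho on the grid (∫ rho dx = 1)."""
    rho = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
    return rho / (rho.sum() * dx)

def entropy(rho):
    """Differential entropy S = -∫ rho log rho dx, via a Riemann sum."""
    return -np.sum(rho * np.log(rho + 1e-12)) * dx

T = 1.0  # effective "temperature" (assumed, sets the loss/entropy trade-off)

# State A: lower loss, but a narrow (low-entropy) basin.
loss_A, rho_A = 1.0, gaussian(0.0, 0.3)
# State B: higher loss, but a wide (high-entropy) basin.
loss_B, rho_B = 1.5, gaussian(2.0, 0.5)

F_A = loss_A - T * entropy(rho_A)
F_B = loss_B - T * entropy(rho_B)

print(f"F_A = {F_A:.3f}, F_B = {F_B:.3f}")  # nearly equal free energies
```

The losses differ by 0.5, yet the free energies come out within a few hundredths of each other because B's extra entropy compensates: two basins of similar F, different L.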
I'm trying to understand this early grokking phenomenon (the training loss improves in a single burst).
If you remember, earlier I shared this grokking moment.
I was just trying llama-2-7b again with a totally different dataset, and this time on 8x 8xA100…