
Satoki I @ ICML
@SisForCollege
Followers
488
Following
2K
Media
46
Statuses
650
TokyoTech 25D Dept. of Computer Science | R.Yokota lab | DNN optimization site: https://t.co/3NoUYlliTa
Joined August 2018
RT @soumithchintala: considering Muon is so popular and validated at scale, we've just decided to welcome a PR for it in PyTorch core by de….
0
56
0
RT @SN_INGE: BREAKING NEWS. Congratulations to Professor Shun-ichi Amari! 2025 Kyoto Prize Laureates.
0
170
0
RT @anilkseth: 1/3 @geoffreyhinton once said that the future depends on some graduate student being suspicious of everything he says (via @….
0
109
0
RT @iclr_conf: Test of Time Winner. Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. Adam revolutionized neural net….
0
49
0
RT @andrewgwils: Good research is mostly about knowing what questions to ask, not about answering questions that other people are asking.
0
50
0
RT @andrewgwils: My new paper "Deep Learning is Not So Mysterious or Different": Generalization behaviours in deep….
0
299
0
What does "the science of scaling" refer to here? What algorithms are they using? Based on Igor's past papers, muP seems a likely candidate for the science of scaling. However, leaks suggest OAI also uses muP. I'm curious about what they mean by it.
@GavinSBaker Algorithms are also important, and they become increasingly important as the model size increases. I suspect the main reason people haven’t been able to train a better model than Grok 3 is that they didn’t get all the details of the training right. There is a huge amount of.
2
0
1
It's nice to see top researchers gathering at an ISM that is somehow close to my home (though I'm not affiliated with ISM) 🤣
Officially I moved to the Institute of Statistical Mathematics as an associate professor. This would not have been possible without the support of my collaborators. I would like to thank all of them and further develop ML theory (+ more!) in the new environment.
1
0
5