
Wojtek Masarczyk
@e7mul
Followers: 72 · Following: 49 · Media: 9 · Statuses: 1K
Usually teaching neural networks continually. Sometimes PhD student @ Warsaw Uni of Technology
Poland
Joined November 2016
1/7 If Andrew Ng is right that the LR is the most important ML hyperparam, it's got some competition! We show that the softmax temperature is a game-changer in crafting NN representations. Often overlooked, it quietly governs generalization, collapse, and compression. A thread 👇
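For context on the knob the thread is about: a softmax temperature T > 0 rescales the logits before normalization, p_i = exp(z_i / T) / Σ_j exp(z_j / T). Below is a minimal pure-Python sketch of that standard definition. The function name and the example logits are illustrative assumptions, not code from the paper. As T shrinks, probability mass concentrates on the largest logit; as T grows, the distribution flattens toward uniform.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: softmax(z / T).

    Illustrative sketch only (hypothetical helper, not from the paper).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Divide every logit by the temperature before normalizing.
    scaled = [z / temperature for z in logits]
    # Subtract the max scaled logit so math.exp cannot overflow.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # hypothetical example logits
    for t in (0.1, 1.0, 10.0):
        probs = softmax(logits, t)
        print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running it prints one line per temperature: at T = 0.1 nearly all the mass sits on the first logit, while at T = 10 the three probabilities end up close to one another. How this effect shapes learned representations is the subject of the paper, not of this sketch.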
I hope you've found this thread helpful. Follow me @e7mul for more, and like/repost the first tweet of the thread (1/7, above) if you can.
Huge thanks to the fantastic team for this collaborative effort: @MatOstasze, @AurelienLucchi, @tscheng516, @tomasztrzcinsk1, and Razvan Pascanu! Also, thanks to @EhsanImanii and @PiotrRMilos for laying the foundation for this work!
RT @bartoszcyw: 🔥 New Paper! How can sparse autoencoders (SAEs) applied to diffusion models help us solve real-world challenges? 🚀 Introd…
RT @NousResearch: What if you could use all the computing power in the world to train a shared, open source AI model? Preliminary report:…
RT @AurelienLucchi: My group has multiple openings both for PhD and Post-doc positions to work in the area of optimization for ML, and deep…
RT @IAmTimNguyen: Excited that my new paper on understanding LLMs is out, pushing how far we can describe LLM predictions via simple statis…
RT @SebastienBubeck: Every day I witness the AI revolution in action, and every day I see 1 or 2 questions that would deserve an entire PhD…
I'll be at #NeurIPS2023 ✈️ from Mon until Sat. Happy to chat about:
- repr. learning & its impact on continual learning
- all the bold ideas about why these overparametrized models generalize at all 😅
Find me around my posters (Thu & Fri) and during @unireps!
RT @OwainEvans_UK: Does a language model trained on “A is B” generalize to “B is A”?
E.g. When trained only on “George Washington was the f…