Wojtek Masarczyk Profile
Wojtek Masarczyk

@e7mul

Followers: 72 · Following: 49 · Media: 9 · Statuses: 1K

Usually teaching neural networks continually. Sometimes PhD Student @ Warsaw Uni of Technology.

Poland
Joined November 2016
@e7mul
Wojtek Masarczyk
2 months
1/7 If Andrew Ng is right that the LR is the most important ML hyperparam, it's got some competition! We show that the softmax temperature is a game-changer in crafting NN representations. Often overlooked, it quietly governs generalization, collapse, and compression. A thread 👇
[image] · 2 replies · 12 reposts · 13 likes
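For reference, the knob in question is the T in softmax(z/T). A minimal NumPy sketch of temperature-scaled softmax (illustrative only, not the paper's code):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: exp(z/T) / sum(exp(z/T))."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()              # shift by max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits, temperature=0.5))  # low T: sharper, more confident
print(softmax(logits, temperature=5.0))  # high T: flatter, near-uniform
```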
@e7mul
Wojtek Masarczyk
2 months
I hope you've found this thread helpful. Follow me @e7mul for more. Like/Repost the quote below if you can:
[quoted tweet: the thread's opening post (1/7), shown above]
@e7mul
Wojtek Masarczyk
2 months
Want the deep dive? Check the paper on arXiv: 2506.01562. Moral of the story? Stop tuning LR first; experiment with temperature today. And if you've seen temp save (or ruin) your model, share below! 👇
[image] · 1 reply · 1 repost · 2 likes
@e7mul
Wojtek Masarczyk
2 months
Huge thanks to the fantastic team for this collaborative effort: @MatOstasze, @AurelienLucchi, @tscheng516, @tomasztrzcinsk1, and Razvan Pascanu! Also, thanks to @EhsanImanii and @PiotrRMilos for laying the foundation for this work!
1 reply · 0 reposts · 2 likes
@e7mul
Wojtek Masarczyk
2 months
7/7 Maybe you're not Yann LeCun and you don't care about collapse. Hear this! Want better OOD generalization? Train with low temp & avoid collapse. Boost OOD detection? Raise the temp and maximize the collapse! These tasks are at odds, and temperature gives you a control knob!
[image] · 1 reply · 0 reposts · 0 likes
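Concretely, the knob is one scalar in the training loss. A minimal sketch, assuming temperature is applied to the logits inside cross-entropy (function name and setup are illustrative, not the paper's recipe):

```python
import numpy as np

def ce_with_temperature(logits, labels, T):
    """Mean cross-entropy computed on temperature-scaled logits z/T."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)              # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 10))                     # 8 samples, 10 classes
labels = rng.integers(0, 10, size=8)
print(ce_with_temperature(logits, labels, T=0.5))     # low-T regime
print(ce_with_temperature(logits, labels, T=5.0))     # high-T regime
```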
@e7mul
Wojtek Masarczyk
2 months
6/7 Rank deficit bias is an NN's tendency to find correct solutions with rank lower than the number of classes. It breaks our intuitions about NN representations and shows that there are solutions of complexity far lower than predicted by Neural Collapse! 🤯
1 reply · 0 reposts · 0 likes
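One way to spot this bias in your own model: compare the numerical rank of the last hidden layer's features to the number of classes. A sketch with synthetic features standing in for real penultimate-layer activations:

```python
import numpy as np

def numerical_rank(features, tol=1e-3):
    """Number of singular values above tol * (largest singular value)."""
    s = np.linalg.svd(features, compute_uv=False)
    return int((s > tol * s[0]).sum())

# Synthetic stand-in: 1000 samples, 64 dims, but only 6 directions used.
rng = np.random.default_rng(0)
basis = rng.normal(size=(64, 6))
features = rng.normal(size=(1000, 6)) @ basis.T
print(numerical_rank(features))   # 6 -- below the 10 one would expect for 10 classes
```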
@e7mul
Wojtek Masarczyk
2 months
5/7 NNs align singular vectors hierarchically. Top vectors align early, creating a highway for information flow, but the remaining vectors lag behind, leading to representation collapse where only the top directions thrive. This gives rise to a novel phenomenon: rank deficit bias.
1 reply · 0 reposts · 0 likes
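A sketch of how that alignment could be measured: the overlap between one layer's top right singular vectors and the previous layer's top left singular vectors (the shapes and the forced-alignment construction below are illustrative assumptions):

```python
import numpy as np

def top_alignment(W2, W1, k=1):
    """|overlap| between W2's top-k right singular vectors and W1's
    top-k left singular vectors; near 1 means the layers line up."""
    _, _, Vt2 = np.linalg.svd(W2)
    U1, _, _ = np.linalg.svd(W1)
    return np.abs(Vt2[:k] @ U1[:, :k])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(32, 16))
W2 = rng.normal(size=(8, 32))
print(top_alignment(W2, W1))            # random init: small overlap

# Force alignment: make W2's top right singular vector equal
# W1's top left singular vector.
U1, _, _ = np.linalg.svd(W1)
W2_aligned = rng.normal(size=(8, 1)) @ U1[:, :1].T
print(top_alignment(W2_aligned, W1))    # ~1.0
```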
@e7mul
Wojtek Masarczyk
2 months
4/7 NNs find a clever way to boost the product norm of two matrices without increasing the norm of either one. How? By aligning their singular subspaces. Repeated across multiple layers, this unlocks exponential growth of the logits norm! 🚀 But there's a catch.
1 reply · 0 reposts · 0 likes
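This one is easy to reproduce in isolation. A toy sketch (random matrices, spectral norm): rebuilding B so its left singular vectors match A's right singular vectors pushes ||AB|| up to the ||A||·||B|| ceiling while leaving ||B|| unchanged:

```python
import numpy as np

def spec_norm(M):
    return np.linalg.svd(M, compute_uv=False)[0]   # largest singular value

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 64))
B = rng.normal(size=(64, 64))
print(spec_norm(A @ B), spec_norm(A) * spec_norm(B))   # product falls short

# Rebuild B with A's right singular vectors as its left singular vectors,
# keeping B's singular values (so ||B|| is unchanged).
_, _, Vta = np.linalg.svd(A)
_, sb, Vtb = np.linalg.svd(B)
B_aligned = Vta.T @ np.diag(sb) @ Vtb
print(spec_norm(A @ B_aligned), spec_norm(A) * spec_norm(B))  # now equal
```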
@e7mul
Wojtek Masarczyk
2 months
3/7 High temperature scales down the logits and makes the softmax output almost uniform for each sample -- hard to learn anything if everything looks the same. To break this, NNs find a clever way to increase the logits norm and break the symmetry. Can you guess it? 🧠
1 reply · 0 reposts · 0 likes
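The "everything looks the same" regime in a toy example: as T grows, softmax entropy climbs toward log(C), the value of a uniform distribution over C classes (a sketch, not the paper's code):

```python
import numpy as np

def softmax(z, T):
    z = z / T
    z = z - z.max()               # numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p)).sum())

logits = np.array([3.0, 1.0, 0.2, -1.0])
for T in (0.5, 1.0, 10.0, 100.0):
    print(T, entropy(softmax(logits, T)))  # climbs toward log(4) ≈ 1.386
```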
@e7mul
Wojtek Masarczyk
2 months
2/7 Softmax does more than squish logits into probabilities: it's a true sculptor of representations! ⚒️ The magic lies in the interplay of the logits norm and the softmax temperature, which can enhance or neutralize each other.
1 reply · 0 reposts · 0 likes
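That interplay is literal: softmax only ever sees the ratio z/T, so multiplying the logits norm by c exactly neutralizes multiplying the temperature by c. A quick check:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()               # numerical stability
    e = np.exp(z)
    return e / e.sum()

z = np.array([2.0, -0.5, 1.2])
c = 10.0
# Growing the logits norm by c cancels raising the temperature by c:
print(np.allclose(softmax(z, T=1.0), softmax(c * z, T=c)))   # True
```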
@e7mul
Wojtek Masarczyk
6 months
RT @bartoszcyw: 🔥 New Paper! How can sparse autoencoders (SAEs) applied to diffusion models help us solve real-world challenges? 🚀 Introd…
0 replies · 53 reposts · 0 likes
@e7mul
Wojtek Masarczyk
11 months
RT @NousResearch: What if you could use all the computing power in the world to train a shared, open source AI model? Preliminary report:…
0 replies · 582 reposts · 0 likes
@e7mul
Wojtek Masarczyk
1 year
RT @AurelienLucchi: My group has multiple openings both for PhD and Post-doc positions to work in the area of optimization for ML, and deep…
0 replies · 63 reposts · 0 likes
@e7mul
Wojtek Masarczyk
1 year
RT @IAmTimNguyen: Excited that my new paper on understanding LLMs is out, pushing how far we can describe LLM predictions via simple statis…
0 replies · 178 reposts · 0 likes
@e7mul
Wojtek Masarczyk
1 year
RT @hardmaru: Many people start attacking a problem by deploying the most sophisticated method possible with the belief that it will lead t…
0 replies · 65 reposts · 0 likes
@e7mul
Wojtek Masarczyk
1 year
RT @SebastienBubeck: Every day I witness the AI revolution in action, and every day I see 1 or 2 questions that would deserve an entire PhD…
0 replies · 37 reposts · 0 likes
@e7mul
Wojtek Masarczyk
2 years
RT @docmilanfar: Don't let low-order statistics fool you
0 replies · 1K reposts · 0 likes
@e7mul
Wojtek Masarczyk
2 years
I'll be at #NeurIPS2023 ✈️ from Mon until Sat. Happy to chat about:
- repr. learning & its impact on continual learning
- all the bold ideas about why these overparametrized models generalize at all 😅
Find me around my posters (Thu & Fri) and during @unireps!
0 replies · 3 reposts · 9 likes
@e7mul
Wojtek Masarczyk
2 years
Today I'll present the Tunnel Effect paper at the RL Sofa at MILA, 3 PM (EST). Tune in to find out whether there is a light at the end of every tunnel, and how to use it in your favor.
[image] · 0 replies · 2 reposts · 13 likes
@e7mul
Wojtek Masarczyk
2 years
RT @OwainEvans_UK: Does a language model trained on “A is B” generalize to “B is A”? E.g. when trained only on “George Washington was the f…
0 replies · 666 reposts · 0 likes