Michal Wilinski

@inverse_hessian

181 Followers · 1K Following · 10 Media · 221 Statuses

Member of technical staff @ stealth & incoming PhD student @SCSatCMU · BSc from @PUT_Poznan

Poland
Joined November 2016
Michal Wilinski @inverse_hessian · 7 days
RT @jxmnop: *taps the sign*
[image attached]
Michal Wilinski @inverse_hessian · 12 days
RT @gm8xx8: μ-Parametrization for Mixture of Experts. µP-MoE extends µParameterization to MoE Transformers, enabling zero-shot learning rat…
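The retweet is cut off, but the advertised property (zero-shot learning-rate transfer) means a learning rate tuned on a narrow proxy model can be rescaled, rather than re-tuned, for a wider target model. Below is a minimal, hedged sketch of the commonly cited Adam-style µP rules; the helper name and exact rules are my assumptions, and the µP-MoE work may handle expert and router weights differently.

```python
# Illustrative sketch only: zero-shot learning-rate transfer under muP with an
# Adam-style optimizer, using the commonly cited rules (input-layer LR constant,
# hidden/output-layer LR scaled down with width). `mup_adam_lr` is a made-up
# helper name, not taken from the muP-MoE paper.

def mup_adam_lr(base_lr: float, base_width: int, width: int, group: str) -> float:
    """Rescale a learning rate tuned at `base_width` for a model of `width`."""
    if group == "input":                # embeddings / input projections
        return base_lr
    if group in ("hidden", "output"):   # hidden matrices, readout
        return base_lr * base_width / width
    raise ValueError(f"unknown parameter group: {group}")

# Example: LR tuned on a width-256 proxy model, transferred to a width-4096 target.
for g in ("input", "hidden", "output"):
    print(g, mup_adam_lr(base_lr=3e-3, base_width=256, width=4096, group=g))
```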
Michal Wilinski @inverse_hessian · 12 days
RT @TimDarcet: hey we heard you liked dinov2 so we got you more of the same shit. dinov3 is like dinov2 in the sense that it's much better…
Michal Wilinski @inverse_hessian · 20 days
RT @cloneofsimo: Deepseek is great but @kuba_krj did this first in feb 2024. With explicit scaling law. Which was on top of @_aidan_clark_…
Michal Wilinski @inverse_hessian · 23 days
Cool stuff to see at @kdd_news, including results from our senior thesis! #KDD2025
Ignacy Stepka @igstepka · 23 days
This week I'm presenting some work at @kddconf in Toronto 🇨🇦. Let's connect if you're interested in privacy/gradient inversion attacks in federated learning, counterfactual explanations, or fairness and XAI! Here's where you can find me:
Michal Wilinski @inverse_hessian · 27 days
RT @kellerjordan0: New NanoGPT training speed record: 3.28 FineWeb val loss in 2.863 minutes on 8xH100. New record-holder: @.ClassicLarry o…
Michal Wilinski @inverse_hessian · 28 days
RT @marekkraft: We are opening a second @esa project at @PUT_Poznan - this time we're looking for student interns! If you're into space rob…
Michal Wilinski @inverse_hessian · 29 days
RT @willccbb: imagine if gas stations didn't tell you how many gallons you were getting because car mileage was a trade secret and the gas…
Michal Wilinski @inverse_hessian · 1 month
RT @mervenoyann: spent a few hours integrating this to TRL for online methods 🤝🏻 the code itself isn't much but testing took time 🥲. works wh…
Michal Wilinski @inverse_hessian · 1 month
RT @huybery: We've updated Qwen3 and made excellent progress. The non-reasoning model now delivers significant improvements across a wide r…
Michal Wilinski @inverse_hessian · 1 month
RT @lmthang: Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the Inte…
Michal Wilinski @inverse_hessian · 1 month
RT @Mihonarium: 🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closi…
Michal Wilinski @inverse_hessian · 1 month
RT @dtiapkin: If you're at #ICML2025, come say hi and learn about teacher hacking in distillation. See you at poster E-2706!
Michal Wilinski @inverse_hessian · 1 month
RT @PiotrRMilos: Today, come and see us at the poster session; East Exhibition Hall. - Joint MoE Scaling Laws (E-2609); tl;dr MoE can be me…
Michal Wilinski @inverse_hessian · 1 month
📄 Paper: [link] 💻 Code: [link] Drop by our poster today at #ICML2025, and let's chat more!
[image attached]
Michal Wilinski @inverse_hessian · 1 month
Armed with this understanding, we prune redundant layers from the model. The result? Significantly faster inference and improved efficiency with good performance across various time series tasks!
[image attached]
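As an illustration only (the thread does not spell out the pruning criterion), here is one simple way such layer pruning could work, assuming a precomputed layer-to-layer similarity matrix like the CKA sketch under the next tweet in the thread below; `select_redundant_layers` and `prune_blocks` are hypothetical helpers, not the paper's code.

```python
# Toy sketch: drop transformer blocks whose representations are near-duplicates of
# the preceding block's, given a pairwise layer-similarity matrix `sim` (see the
# CKA sketch further down the thread). Threshold and strategy are assumptions.
import numpy as np
import torch.nn as nn

def select_redundant_layers(sim: np.ndarray, threshold: float = 0.98) -> list[int]:
    """Indices of blocks whose output is nearly identical to the previous block's."""
    return [i for i in range(1, sim.shape[0]) if sim[i - 1, i] >= threshold]

def prune_blocks(blocks: nn.ModuleList, drop: list[int]) -> nn.ModuleList:
    """Return a shallower block stack with the redundant layers removed."""
    dropped = set(drop)
    return nn.ModuleList([blk for i, blk in enumerate(blocks) if i not in dropped])
```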
Michal Wilinski @inverse_hessian · 1 month
In the second part of our work, we leverage insights from learned representations to boost TSFM efficiency. We start by assessing redundancy across model layers, examining similarities in their learned representations.
[image attached]
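The thread doesn't name the similarity measure, so as one concrete possibility the sketch below scores redundancy between layers with linear CKA over activations collected from a batch of time series; the metric choice and function names are assumptions for illustration, not the paper's implementation.

```python
# Assumed setup: `layer_acts` is a list of [n_samples, d] activation matrices, one
# per block of the time-series foundation model, collected on a probe batch.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA (Kornblith et al.) between two activation matrices."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

def layer_similarity_matrix(layer_acts: list[np.ndarray]) -> np.ndarray:
    """Pairwise CKA between layers; blocks of near-1 values suggest redundant depth."""
    L = len(layer_acts)
    sim = np.eye(L)
    for i in range(L):
        for j in range(i + 1, L):
            sim[i, j] = sim[j, i] = linear_cka(layer_acts[i], layer_acts[j])
    return sim
```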
Michal Wilinski @inverse_hessian · 1 month
Next, we identify meaningful concept vectors corresponding to these temporal properties, allowing us to actively steer model outputs!
[image attached]
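How the concept vectors are extracted and applied isn't detailed in the thread; a common recipe, sketched below purely as an assumption, is a difference-of-means direction computed at one layer and added to the hidden states at inference time to push the output toward (or away from) a temporal property such as trend.

```python
# Illustrative difference-of-means steering; names, layer choice, and the scale
# `alpha` are assumptions, not the paper's actual procedure. `acts_with` and
# `acts_without` are [n, d] hidden states for series with / without the property.
import torch

def concept_vector(acts_with: torch.Tensor, acts_without: torch.Tensor) -> torch.Tensor:
    """Unit-norm direction separating series with vs. without the temporal property."""
    v = acts_with.mean(dim=0) - acts_without.mean(dim=0)
    return v / v.norm()

def steer(hidden: torch.Tensor, v: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    """Add the concept direction to hidden states; alpha controls steering strength."""
    return hidden + alpha * v
```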