
Julien Siems
@julien_siems
Followers: 305 · Following: 504 · Media: 13 · Statuses: 76
PhD student advised by Frank Hutter working on linear RNNs and state-tracking.
Germany
Joined July 2022
RT @JakeMRobertson: We present a new approach to causal inference. Pre-trained on synthetic data, Do-PFN opens the door to a new domain: PF…
0 · 3 · 0
RT @riccardograzzi: 📖 (1/n) DeltaProduct's theory got an update! 1) For any nₕ>1 (# of Householders), only 3 layers are needed to solve al…
0 · 2 · 0
⚡DeltaProduct update with new results:
- Characterization of DeltaProduct’s state-tracking ability
- Inspection of the hidden state’s effective rank sheds light on why DeltaProduct extrapolates better to longer sequences than DeltaNet
- Improved scaling analysis
And more!
1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!
0 · 12 · 54
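The recurrence behind the thread above can be sketched as follows. This is a hedged reconstruction, not the authors' code: per token, a DeltaProduct-style update applies nₕ rank-1 (generalized Householder) corrections to the state, so nₕ = 1 recovers DeltaNet's rank-1 transition while larger nₕ yields higher-rank, more expressive transitions at extra compute. All tensor names, shapes, the toy data, and the effective-rank definition (entropy of normalized singular values, Roy & Vetterli, 2007) are assumptions for illustration.

```python
# Hedged sketch (not the authors' implementation) of a DeltaProduct-style update in NumPy.
# Per token, the state S (d_v x d_k) receives n_h rank-1 "generalized Householder" corrections:
#   S <- S (I - beta_i k_i k_i^T) + beta_i v_i k_i^T
# n_h = 1 recovers DeltaNet's rank-1 state-transition matrix; larger n_h gives
# higher-rank transitions. Names, shapes, and the toy data are illustrative assumptions.
import numpy as np

def delta_product_step(S, keys, values, betas):
    """Apply n_h Householder-style corrections to the state for one token."""
    d_k = S.shape[1]
    for k, v, beta in zip(keys, values, betas):
        k = k / np.linalg.norm(k)                       # unit-norm key direction
        S = S @ (np.eye(d_k) - beta * np.outer(k, k)) + beta * np.outer(v, k)
    return S

def effective_rank(S, eps=1e-12):
    """Effective rank = exp(entropy of normalized singular values) (Roy & Vetterli, 2007)."""
    sv = np.linalg.svd(S, compute_uv=False)
    p = sv / (sv.sum() + eps)
    return float(np.exp(-(p * np.log(p + eps)).sum()))

rng = np.random.default_rng(0)
d_v, d_k, n_h, T = 8, 8, 2, 64                          # assumed toy sizes
S = np.zeros((d_v, d_k))
for _ in range(T):                                      # random toy key/value stream
    keys = rng.standard_normal((n_h, d_k))
    values = rng.standard_normal((n_h, d_v))
    betas = rng.uniform(0.0, 2.0, size=n_h)             # beta in [0, 2] keeps ||I - beta*k*k^T|| <= 1
    S = delta_product_step(S, keys, values, betas)
print("effective rank of final state:", effective_rank(S))
```

Restricting β to [0, 2] keeps each per-step transition at spectral norm at most one (eigenvalue 1−β along the key direction, 1 elsewhere), which is what makes stacking many such corrections over long sequences stable.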
RT @behrouz_ali: What makes attention the critical component for most advances in LLMs and what holds back long-term memory modules (RNNs)?…
0 · 135 · 0
RT @SonglinYang4: 📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, be…
0 · 88 · 0
RT @BlinkDL_AI: RWKV7-G1 "GooseOne" 🪿 1.5B release: pure RNN (attention-free) reasoning model, comparable with Qwen3 1.7B and fully multili…
0 · 33 · 0
RT @xiaolonw: Test-Time Training (TTT) is now on Video! And not just a 5-second video. We can generate a full 1-min video! TTT module is a…
0 · 179 · 0
RT @maxmbeck: Yesterday, we shared the details on our xLSTM 7B architecture. Now, let's go one level deeper 🧑‍🔧. We introduce ⚡️Tiled Flash…
0 · 61 · 0
RT @riccardograzzi: @julien_siems @leloykun @jyo_pari In our DeltaProduct work we also add a bit of theory to DeltaNet, showing that it can…
0 · 5 · 0
8/9 This was a great project with @timurcarstensen, @ZelaArber, @FrankRHutter, @MPontil, and @riccardograzzi. Check out our Oral at the FM-Wild Workshop at @iclr_conf:
1 · 0 · 19
3/9 Following @SonglinYang4 et al. (2024), DeltaNet can be seen as performing one gradient descent step per token on an associative recall loss, resulting in a rank-1 state-transition matrix.
1 · 0 · 15
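For reference, the gradient step mentioned in 3/9 can be written out. This is a standard reconstruction under assumed notation (state S ∈ ℝ^{d_v×d_k}, key k_t, value v_t, step size β_t), not a quote from the paper:

$$
\mathcal{L}_t(S) = \tfrac{1}{2}\,\lVert S k_t - v_t \rVert^2,
\qquad
\nabla_S \mathcal{L}_t(S) = (S k_t - v_t)\,k_t^{\top},
$$
$$
S_t = S_{t-1} - \beta_t\,\nabla_S \mathcal{L}_t(S_{t-1})
    = S_{t-1}\bigl(I - \beta_t k_t k_t^{\top}\bigr) + \beta_t v_t k_t^{\top}.
$$

The state-transition matrix $I - \beta_t k_t k_t^{\top}$ is the identity minus a rank-1 term, i.e. a generalized Householder transformation for unit-norm $k_t$ and $\beta_t \in [0, 2]$, which is the rank-1 structure referred to in the tweet.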
RT @maxmbeck: 📢🔔 I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model! 🚨 We optimized the architectu…
0 · 60 · 0