Kajetan Schweighofer Profile
Kajetan Schweighofer

@kschweig_

Followers: 341 · Following: 499 · Media: 15 · Statuses: 250

ELLIS PhD student @ JKU Linz, Institute for Machine Learning.

Joined November 2010
@ProfTomYeh
Tom Yeh
1 month
I still remember back in grad school. My friend in NLP used to show off, bragging that he had LSTM all figured out. I envied him. Fortunately, my field was Computer Vision. I could survive just knowing my SVMs. In 2024, the inventor of LSTM himself is finally back with the
16
125
1K
@PhilipMWinter
LittleBlackSheep
1 month
After 3 years of writing, my first book is finally out 🎉 It’s quite a unique and deep story with a bunch of interesting characters and themes. Make sure to grab the FREE e-book version until next week and spread the word if you find it as inspiring as I do 🌈
2
2
8
@fchollet
François Chollet
1 month
I don't think you can call a system intelligent if it can't estimate its own uncertainty, question its own beliefs, and come up with experiments to sharpen what it is least sure about.
15
14
206
@rohanpaul_ai
Rohan Paul
2 months
The paper shows that xLSTM scales better than Transformers and keeps time linear as prompts get longer. So xLSTM or variants might become a serious alternative to Transformers, especially for long input scenarios and inference efficiency. At 16K context, xLSTM cuts time to
3
17
95
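For intuition on the linear-time claim above, here is a toy decode-cost sketch. It assumes only that attention re-reads a KV cache of the full prefix while a recurrent model updates a fixed-size state; the token counts and unit costs are illustrative, not measurements from the paper.

```python
# Toy cost model: total work to generate T tokens.

def attention_decode_cost(T):
    # step t attends over t cached tokens -> quadratic total
    return sum(range(1, T + 1))

def recurrent_decode_cost(T):
    # constant work per step (fixed-size state update) -> linear total
    return T

for T in (1_000, 4_000, 16_000):
    ratio = attention_decode_cost(T) / recurrent_decode_cost(T)
    print(f"{T:>6} tokens: attention={attention_decode_cost(T):>12}, "
          f"recurrent={recurrent_decode_cost(T):>6}, ratio={ratio:,.0f}")
```

The gap between the two totals grows with the prompt length, which is why the advantage is expected to be largest in long-context settings.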
@YiboL6378
Yibo Li
2 months
🚀 Excited to share our work “ConfTuner: Training Large Language Models to Express Their Confidence Verbally” accepted at #NeurIPS2025! Arxiv: https://t.co/xAQywt82hb Code and checkpoints: https://t.co/oxJSW5Jzjx
5
4
10
@KorbiPoeppel
Korbinian Poeppel
2 months
So it's proven now: xLSTM is better than Transformers. In rigorous scaling laws. While having linear inference complexity. You should switch! Great work from @maxmbeck and colleagues!
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
0
1
6
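For reference, one way to read the Pareto-dominance claim above in standard compute scaling-law notation; the functional form is the usual Kaplan/Chinchilla-style fit, and the symbols are generic placeholders, not the paper's fitted values.

```latex
% Generic compute scaling-law form (constants are placeholders):
\[
  L(C) \;=\; L_\infty + a\,C^{-\alpha}
\]
% "Pareto-dominate" in the (compute, loss) plane then means both
\[
  L_{\mathrm{xLSTM}}(C) < L_{\mathrm{Transformer}}(C)
  \qquad\text{and}\qquad
  C_{\mathrm{xLSTM}}(L^{*}) < C_{\mathrm{Transformer}}(L^{*}),
\]
% i.e. at equal FLOPs the xLSTM curve reaches lower loss, and at any target
% loss it needs fewer FLOPs.
```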
@gklambauer
Günter Klambauer
2 months
Rule of thumb:
- if you want to waste a lot of money & energy, use Transformers.
- if you want to have an equally good LLM, but save a lot of money & energy, use xLSTM.
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
1
4
21
@kschweig_
Kajetan Schweighofer
2 months
The story repeats itself. We find that xLSTM performs better than Transformers at moderate context lengths, e.g. 8k (see picture). However, xLSTM handles longer contexts better, and the benefit over Transformers grows with context length - both for training and inference.
0
2
6
@HochreiterSepp
Sepp Hochreiter
2 months
Breakthrough result: Scaling laws show xLSTMs Pareto-dominate Transformers.
Training:
- Equal FLOPs → xLSTMs reach lower loss
- Equal loss → xLSTMs require fewer FLOPs
Inference:
- Faster
- More energy-efficient
- More cost-effective with same performance
Directly saves money.
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
7
44
363
@kschweig_
Kajetan Schweighofer
2 months
The empire strikes back
@_katieeverett
Katie Everett
6 months
For architecture, Kaplan et al. (2020) show LSTMs and Transformers have similar exponents on short contexts. Transformers handle long contexts better though.
1
2
10
@kschweig_
Kajetan Schweighofer
2 months
This is for a fixed context length of moderate size (8k). xLSTM's advantage keeps growing for longer contexts, both in terms of training and inference compute. A huge deal for future agentic systems that need to operate over long horizons while staying affordable.
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
0
3
13
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
Excited to share that Tiled Flash Linear Attention has been accepted to NeurIPS25 🤩
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
8 months
Yesterday, we shared the details on our xLSTM 7B architecture. Now, let's go one level deeper🧑‍🔧 We introduce ⚡️Tiled Flash Linear Attention (TFLA), ⚡️ A new kernel algorithm for the mLSTM and other Linear Attention variants with Gating. We find TFLA is really fast! 🧵(1/11)
1
3
20
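For intuition, a generic chunkwise linear-attention sketch in the spirit of such kernels. This is not the actual TFLA algorithm (it omits gating, tiling, and all kernel-level optimizations); the function name and chunk size are illustrative.

```python
import numpy as np

def chunkwise_linear_attention(Q, K, V, chunk=64):
    """Causal linear attention o_t = q_t @ (sum_{s<=t} k_s v_s^T),
    computed chunk by chunk with a running (d_k x d_v) state."""
    T, d_k = Q.shape
    d_v = V.shape[1]
    out = np.zeros((T, d_v))
    S = np.zeros((d_k, d_v))                 # running sum of k_s v_s^T
    for start in range(0, T, chunk):
        q = Q[start:start + chunk]
        k = K[start:start + chunk]
        v = V[start:start + chunk]
        inter = q @ S                        # contribution of past chunks
        intra = np.tril(q @ k.T) @ v         # causal part of current chunk
        out[start:start + chunk] = inter + intra
        S += k.T @ v                         # fold chunk into the state
    return out

# Sanity check against the quadratic formulation
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 256, 16))
ref = np.tril(Q @ K.T) @ V
assert np.allclose(chunkwise_linear_attention(Q, K, V), ref)
```

The chunked form keeps the state small while allowing matrix-matrix work inside each chunk, which is the basic reason such kernels can be fast on GPUs.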
@teelinsan
Andrea Santilli
4 months
Uncertainty quantification (UQ) is key for safe, reliable LLMs... but are we evaluating it correctly? 🚨 Our ACL2025 paper finds a hidden flaw: if both UQ methods and correctness metrics are biased by the same factor (e.g., response length), evaluations get systematically skewed
1
17
47
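A toy simulation of the bias described above, with entirely made-up numbers: if both the confidence score and the correctness metric depend on response length, a naive evaluation rewards a "UQ method" that never looked at the answer content.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: response length drives BOTH quantities.
length = rng.normal(size=n)
confidence = 0.8 * length + rng.normal(scale=0.6, size=n)        # length-biased score
correct = (0.8 * length + rng.normal(scale=0.6, size=n)) > 0.0   # length-biased metric

# Naive evaluation: the score looks predictive of correctness...
print(np.corrcoef(confidence, correct.astype(float))[0, 1])

# ...but restricting to near-constant length removes most of the apparent signal.
mask = np.abs(length) < 0.1
print(np.corrcoef(confidence[mask], correct[mask].astype(float))[0, 1])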
@jo_brandstetter
Johannes Brandstetter
4 months
General relativity 🤝 neural fields
This simulation of a black hole is coming from our neural networks 🚀
We introduce Einstein Fields, a compact NN representation for 4D numerical relativity. EinFields are designed to handle the tensorial properties of GR and its derivatives.
11
75
324
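For context, a minimal coordinate-network (neural field) sketch: a small MLP maps a 4D spacetime point to the 10 independent components of a symmetric metric tensor. This illustrates the general neural-fields idea only; it is not the EinFields architecture, and all layer sizes and names are illustrative.

```python
import numpy as np

def init_mlp(sizes, rng):
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

rng = np.random.default_rng(0)
params = init_mlp([4, 128, 128, 10], rng)   # (t, x, y, z) -> 10 metric components

coords = np.array([[0.0, 1.0, 0.0, 0.0]])   # one spacetime point
g_upper = mlp(params, coords)[0]            # upper triangle of g_{mu nu}

g = np.zeros((4, 4))                        # reassemble the symmetric metric
g[np.triu_indices(4)] = g_upper
g = g + g.T - np.diag(np.diag(g))
print(g)
```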
@nuriaoliver
nuriaoliver
5 months
The day has come! After intense months of work, involving a thousand stakeholders, the #AIAct #CodeofPractice for #GPAI models has been published! Thank you to all the participants in the process and especially the (vice)-chairs @Yoshua_Bengio @MarietjeSchaake @matthiassamwald
@EU_Commission
European Commission
5 months
General-purpose AI must be safe and transparent. The Code of Practice is now available. It is designed to help industry comply with the AI Act’s rules on general-purpose AI, which will enter into application on 2 August. More info ↓
0
5
16
@jo_brandstetter
Johannes Brandstetter
5 months
We release AB-UPT, a novel method to scale neural surrogates to CFD meshes beyond 100 million mesh cells. AB-UPT is extensively tested on the largest publicly available datasets.
📄 https://t.co/xGQxhU8PuJ
🤗 https://t.co/WIirIMyNNd
💻 https://t.co/VuXjboZ0Xo
1
16
68
@KorbiPoeppel
Korbinian Poeppel
5 months
Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: https://t.co/nU7626uHWK
4
43
138
@AndAuer
Andreas Auer
6 months
We’re excited to introduce TiRex — a pre-trained time series forecasting model based on an xLSTM architecture.
5
22
70
@HochreiterSepp
Sepp Hochreiter
6 months
Attention!! Our TiRex time series model, built on xLSTM, is topping all major international leaderboards. A European-developed model is leading the field—significantly ahead of U.S. competitors like Amazon, Datadog, Salesforce, and Google, as well as Chinese models from Alibaba.
@AndAuer
Andreas Auer
6 months
We’re excited to introduce TiRex — a pre-trained time series forecasting model based on an xLSTM architecture.
2
29
115