Kajetan Schweighofer Profile
Kajetan Schweighofer

@kschweig_

Followers: 341 · Following: 499 · Media: 15 · Statuses: 250

ELLIS PhD student @ JKU Linz, Institute for Machine Learning.

Joined November 2010
@ProfTomYeh
Tom Yeh
1 month
I still remember back in grad school. My friend in NLP used to show off, bragging that he had LSTM all figured out. I envied him. Fortunately, my field was Computer Vision. I could survive just knowing my SVMs. In 2024, the inventor of LSTM himself is finally back with the
16
125
1K
@PhilipMWinter
LittleBlackSheep
1 month
After 3 years of writing, my first book is finally out 🎉 It’s quite a unique and deep story with a bunch of interesting characters and themes. Make sure to grab the FREE e-book version until next week and spread the word if you find it as inspiring as I do 🌈
2
2
8
@fchollet
François Chollet
1 month
I don't think you can call a system intelligent if it can't estimate its own uncertainty, question its own beliefs, and come up with experiments to sharpen what it is least sure about.
15
14
206
@rohanpaul_ai
Rohan Paul
2 months
The paper shows that xLSTM scales better than Transformers and keeps time linear as prompts get longer. So xLSTM or variants might become a serious alternative to Transformers, especially for long input scenarios and inference efficiency. At 16K context, xLSTM cuts time to
3
17
95
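For intuition on the linear-time claim above, here is a toy decode-cost sketch. It assumes only that attention re-reads a KV cache of the full prefix while a recurrent model updates a fixed-size state; the token counts and unit costs are illustrative, not measurements from the paper.

```python
# Toy cost model: total work to generate T tokens.

def attention_decode_cost(T):
    # step t attends over t cached tokens -> quadratic total
    return sum(range(1, T + 1))

def recurrent_decode_cost(T):
    # constant work per step (fixed-size state update) -> linear total
    return T

for T in (1_000, 4_000, 16_000):
    ratio = attention_decode_cost(T) / recurrent_decode_cost(T)
    print(f"{T:>6} tokens: attention={attention_decode_cost(T):>12}, "
          f"recurrent={recurrent_decode_cost(T):>6}, ratio={ratio:,.0f}")
```

The gap between the two totals grows with the prompt length, which is why the advantage is expected to be largest in long-context settings.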
@YiboL6378
Yibo Li
2 months
🚀 Excited to share our work “ConfTuner: Training Large Language Models to Express Their Confidence Verbally” accepted at #NeurIPS2025! Arxiv: https://t.co/xAQywt82hb Code and checkpoints: https://t.co/oxJSW5Jzjx
5
4
10
@KorbiPoeppel
Korbinian Poeppel
2 months
So it's proven now: xLSTM is better than Transformers. In rigorous scaling laws. While having linear inference complexity. You should switch! Great work from @maxmbeck and colleagues!
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
0
1
6
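For reference, one way to read the Pareto-dominance claim above in standard compute scaling-law notation; the functional form is the usual Kaplan/Chinchilla-style fit, and the symbols are generic placeholders, not the paper's fitted values.

```latex
% Generic compute scaling-law form (constants are placeholders):
\[
  L(C) \;=\; L_\infty + a\,C^{-\alpha}
\]
% "Pareto-dominate" in the (compute, loss) plane then means both
\[
  L_{\mathrm{xLSTM}}(C) < L_{\mathrm{Transformer}}(C)
  \qquad\text{and}\qquad
  C_{\mathrm{xLSTM}}(L^{*}) < C_{\mathrm{Transformer}}(L^{*}),
\]
% i.e. at equal FLOPs the xLSTM curve reaches lower loss, and at any target
% loss it needs fewer FLOPs.
```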
@gklambauer
Günter Klambauer
2 months
Rule of thumb:
- if you want to waste a lot of money & energy, use Transformers.
- if you want to have an equally good LLM, but save a lot of money & energy, use xLSTM.
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
1
4
21
@kschweig_
Kajetan Schweighofer
2 months
The story repeats itself. We find that xLSTM performs better than Transformers at moderate context lengths, e.g. 8k (see picture). However, xLSTM handles longer contexts better, and the benefit over Transformers grows with context length - both for training and inference.
0
2
6
@HochreiterSepp
Sepp Hochreiter
2 months
Breakthrough result: Scaling laws show xLSTMs Pareto-dominate Transformers.
Training:
- Equal FLOPs → xLSTMs reach lower loss
- Equal loss → xLSTMs require fewer FLOPs
Inference:
- Faster
- More energy-efficient
- More cost-effective with same performance
Directly saves money.
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
7
44
363
@kschweig_
Kajetan Schweighofer
2 months
The empire strikes back
@_katieeverett
Katie Everett
6 months
For architecture, Kaplan et al. (2020) show LSTMs and Transformers have similar exponents on short contexts. Transformers handle long contexts better though.
1
2
10
@kschweig_
Kajetan Schweighofer
2 months
This is for a fixed context length of moderate size (8k). xLSTM's advantage keeps growing for longer contexts, both in terms of training and inference compute. A huge deal for future agentic systems that need to operate over long horizons while staying affordable.
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
0
3
13
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
2 months
Excited to share that Tiled Flash Linear Attention has been accepted to NeurIPS25 🤩
@maxmbeck
Maximilian Beck✈️NeurIPS‘25
8 months
Yesterday, we shared the details on our xLSTM 7B architecture. Now, let's go one level deeper🧑‍🔧 We introduce ⚡️Tiled Flash Linear Attention (TFLA), ⚡️ A new kernel algorithm for the mLSTM and other Linear Attention variants with Gating. We find TFLA is really fast! 🧵(1/11)
1
3
20
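For intuition, a generic chunkwise linear-attention sketch in the spirit of such kernels. This is not the actual TFLA algorithm (it omits gating, tiling, and all kernel-level optimizations); the function name and chunk size are illustrative.

```python
import numpy as np

def chunkwise_linear_attention(Q, K, V, chunk=64):
    """Causal linear attention o_t = q_t @ (sum_{s<=t} k_s v_s^T),
    computed chunk by chunk with a running (d_k x d_v) state."""
    T, d_k = Q.shape
    d_v = V.shape[1]
    out = np.zeros((T, d_v))
    S = np.zeros((d_k, d_v))                 # running sum of k_s v_s^T
    for start in range(0, T, chunk):
        q = Q[start:start + chunk]
        k = K[start:start + chunk]
        v = V[start:start + chunk]
        inter = q @ S                        # contribution of past chunks
        intra = np.tril(q @ k.T) @ v         # causal part of current chunk
        out[start:start + chunk] = inter + intra
        S += k.T @ v                         # fold chunk into the state
    return out

# Sanity check against the quadratic formulation
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 256, 16))
ref = np.tril(Q @ K.T) @ V
assert np.allclose(chunkwise_linear_attention(Q, K, V), ref)
```

The chunked form keeps the state small while allowing matrix-matrix work inside each chunk, which is the basic reason such kernels can be fast on GPUs.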
@teelinsan
Andrea Santilli
4 months
Uncertainty quantification (UQ) is key for safe, reliable LLMs... but are we evaluating it correctly? 🚨 Our ACL2025 paper finds a hidden flaw: if both UQ methods and correctness metrics are biased by the same factor (e.g., response length), evaluations get systematically skewed
1
17
47
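A toy simulation of the bias described above, with entirely made-up numbers: if both the confidence score and the correctness metric depend on response length, a naive evaluation rewards a "UQ method" that never looked at the answer content.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: response length drives BOTH quantities.
length = rng.normal(size=n)
confidence = 0.8 * length + rng.normal(scale=0.6, size=n)        # length-biased score
correct = (0.8 * length + rng.normal(scale=0.6, size=n)) > 0.0   # length-biased metric

# Naive evaluation: the score looks predictive of correctness...
print(np.corrcoef(confidence, correct.astype(float))[0, 1])

# ...but restricting to near-constant length removes most of the apparent signal.
mask = np.abs(length) < 0.1
print(np.corrcoef(confidence[mask], correct[mask].astype(float))[0, 1])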
@jo_brandstetter
Johannes Brandstetter
4 months
General relativity 🤝 neural fields
This simulation of a black hole is coming from our neural networks 🚀
We introduce Einstein Fields, a compact NN representation for 4D numerical relativity. EinFields are designed to handle the tensorial properties of GR and its derivatives.
11
75
324
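For context, a minimal coordinate-network (neural field) sketch: a small MLP maps a 4D spacetime point to the 10 independent components of a symmetric metric tensor. This illustrates the general neural-fields idea only; it is not the EinFields architecture, and all layer sizes and names are illustrative.

```python
import numpy as np

def init_mlp(sizes, rng):
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

rng = np.random.default_rng(0)
params = init_mlp([4, 128, 128, 10], rng)   # (t, x, y, z) -> 10 metric components

coords = np.array([[0.0, 1.0, 0.0, 0.0]])   # one spacetime point
g_upper = mlp(params, coords)[0]            # upper triangle of g_{mu nu}

g = np.zeros((4, 4))                        # reassemble the symmetric metric
g[np.triu_indices(4)] = g_upper
g = g + g.T - np.diag(np.diag(g))
print(g)
```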
@nuriaoliver
nuriaoliver
5 months
The day has come! After intense months of work, involving a thousand stakeholders, the #AIAct #CodeofPractice for #GPAI models has been published! Thank you to all the participants in the process and especially the (vice)-chairs @Yoshua_Bengio @MarietjeSchaake @matthiassamwald
@EU_Commission
European Commission
5 months
General-purpose AI must be safe and transparent. The Code of Practice is now available. It is designed to help industry comply with the AI Act’s rules on general-purpose AI, which will enter into application on 2 August. More info ↓
0
5
16
@jo_brandstetter
Johannes Brandstetter
5 months
We release AB-UPT, a novel method to scale neural surrogates to CFD meshes beyond 100 million mesh cells. AB-UPT is extensively tested on the largest publicly available datasets.
📄 https://t.co/xGQxhU8PuJ
🤗 https://t.co/WIirIMyNNd
💻 https://t.co/VuXjboZ0Xo
1
16
68
@KorbiPoeppel
Korbinian Poeppel
5 months
Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: https://t.co/nU7626uHWK
4
43
138
@AndAuer
Andreas Auer
6 months
We’re excited to introduce TiRex — a pre-trained time series forecasting model based on an xLSTM architecture.
5
22
70
@HochreiterSepp
Sepp Hochreiter
6 months
Attention!! Our TiRex time series model, built on xLSTM, is topping all major international leaderboards. A European-developed model is leading the field—significantly ahead of U.S. competitors like Amazon, Datadog, Salesforce, and Google, as well as Chinese models from Alibaba.
@AndAuer
Andreas Auer
6 months
We’re excited to introduce TiRex — a pre-trained time series forecasting model based on an xLSTM architecture.
2
29
115