Anastasios Gerontopoulos @NasosGer X Profile

Anastasios Gerontopoulos

@NasosGer

Followers

75

Following

72

Media

5

Statuses

17

PhD Fellow, Archimedes Unit | UOC

Heraklion, Greece

Joined May 2025

Anastasios Gerontopoulos

@NasosGer

2 months

1/n Multi-token prediction boosts LLMs (as in DeepSeek-V3), tackling key limitations of the next-token setup:
• Short-term focus
• Struggles with long-range decisions
• Weaker supervision
Prior methods add complexity (extra layers).
🔑 Our fix? Register tokens: elegant and powerful.
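
The sketch below illustrates the "weaker supervision" point using a generic multi-token objective, not MuToR itself. Under next-token prediction each prefix gets one target. With a hypothetical `horizon` parameter, each prefix is instead supervised on several upcoming tokens. The function names are illustrative and do not come from the paper.

```python
from typing import Dict, List


def next_token_targets(tokens: List[int]) -> Dict[int, List[int]]:
    """Standard setup: position t is supervised only on x_{t+1}."""
    return {t: [tokens[t + 1]] for t in range(len(tokens) - 1)}


def multi_token_targets(tokens: List[int], horizon: int) -> Dict[int, List[int]]:
    """Generic multi-token setup: position t is supervised on x_{t+1} .. x_{t+horizon},
    truncated at the end of the sequence."""
    return {t: tokens[t + 1 : t + 1 + horizon] for t in range(len(tokens) - 1)}


if __name__ == "__main__":
    seq = [10, 11, 12, 13, 14]
    print(next_token_targets(seq))      # {0: [11], 1: [12], 2: [13], 3: [14]}
    print(multi_token_targets(seq, 3))  # {0: [11, 12, 13], 1: [12, 13, 14], 2: [13, 14], 3: [14]}
```

With `horizon=1` the two functions agree. Larger horizons add targets that require looking further ahead.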

Anastasios Gerontopoulos

@NasosGer

4 days

RT @abursuc: Nice trick for fine-tuning with multi-token prediction without architecture changes: interleave learnable register tokens into….

Anastasios Gerontopoulos

@NasosGer

24 days

RT @SpyrosGidaris: I am at #CVPR2025 this week in Nashville! Presenting "Advancing Semantic Future Prediction through Multimodal Visual Se….

Anastasios Gerontopoulos

@NasosGer

1 month

RT @_vaishnavh: 📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in cre….

Anastasios Gerontopoulos

@NasosGer

2 months

RT @SpyrosGidaris: Better LLM training? @GregorBachmann1 & @_vaishnavh showed next-token prediction causes shortcut learning. A fix? Multi-….

Anastasios Gerontopoulos

@NasosGer

2 months

10/n
Joint work with @SpyrosGidaris and Nikos Komodakis. This research was conducted during my first year as a PhD fellow of ARCHIMEDES Research on AI, Data Science, and Algorithms. #MuToR #Transformers #GenerativeAI #MachineLearning #LLMs #LanguageModeling #DeepLearning

Anastasios Gerontopoulos

@NasosGer

2 months

9/n Key Takeaways
MuToR offers a simple yet powerful approach to multi-token prediction:
✅ Scalable prediction horizons
✅ Effective across modalities
✅ A foundation for token-based lookahead mechanisms

Anastasios Gerontopoulos

@NasosGer

2 months

8/n
We even test MuToR on the star-graph pathfinding task, where models trained with next-token prediction and teacher forcing fail due to shortcut learning. MuToR succeeds, showing its ability to learn long-range structure and planning.
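
For readers unfamiliar with the task, here is a sketch of a star-graph instance generator. It follows the spirit of the setup from @GregorBachmann1 & @_vaishnavh's work mentioned in this thread, but the parameter names and the exact input format are assumptions. A center node has several arms. Given the shuffled edge list, the start (the center), and the goal (the end of one arm), the model must output the path.

```python
import random
from typing import List, Tuple


def make_star_graph(degree: int, arm_len: int, num_labels: int, rng: random.Random):
    """Generate one star-graph pathfinding instance (illustrative sketch;
    all parameter names are hypothetical).

    A center node has `degree` arms, each a chain of `arm_len` nodes.
    Returns (shuffled edges, start node, goal node, correct path).
    """
    if num_labels < 1 + degree * arm_len:
        raise ValueError("not enough distinct node labels")
    # Distinct random labels so the model cannot rely on node ids.
    nodes = rng.sample(range(num_labels), 1 + degree * arm_len)
    center, rest = nodes[0], nodes[1:]
    arms = [rest[i * arm_len : (i + 1) * arm_len] for i in range(degree)]
    edges: List[Tuple[int, int]] = []
    for arm in arms:
        prev = center
        for node in arm:
            edges.append((prev, node))  # directed edge pointing away from the center
            prev = node
    rng.shuffle(edges)
    goal_arm = rng.choice(arms)
    path = [center] + goal_arm
    return edges, center, goal_arm[-1], path


if __name__ == "__main__":
    edges, start, goal, path = make_star_graph(degree=3, arm_len=4, num_labels=50, rng=random.Random(0))
    print("edges:", edges)
    print("start:", start, "goal:", goal)
    print("answer:", path)
```

In this sketch every non-center node has at most one outgoing edge. Once teacher forcing supplies the true path prefix, each later node can be found by looking up the single edge out of the previous node. That local lookup is the shortcut, and it teaches nothing about the one hard decision: which arm to enter from the center.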

Anastasios Gerontopoulos

@NasosGer

2 months

7/n 🖼️ MuToR for Images
Our 2D extension brings multi-token prediction to autoregressive image generation:
✅ Better samples: outperforms next-token prediction on both FID and IS
✅ Efficient: achieves comparable performance even with only a small number of register tokens

Anastasios Gerontopoulos

@NasosGer

2 months

6/n
📈 Key Results
MuToR surpasses both standard next-token prediction and prior multi-token methods across diverse benchmarks:
✅ Math reasoning and summarization tasks
✅ Model-agnostic: effective across model sizes
✅ LoRA-compatible: matches or exceeds full fine-tuning accuracy

Anastasios Gerontopoulos

@NasosGer

2 months

5/n
We also adapt MuToR to images by modifying the offset sampling to respect the 2D image structure. This 2D extension enriches the training signal by capturing spatial dependencies inherent in visual data, while requiring no architectural changes.
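
Here is a minimal sketch of what 2D-aware offset sampling could look like for a raster-ordered grid of image tokens. The thread only says that offset sampling is modified to respect 2D structure. The offset window used here (down and/or to the right, at most `max_d` steps) and all function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def candidate_offsets_2d(max_d: int) -> List[Tuple[int, int]]:
    """Hypothetical window of (row, col) offsets below and/or to the right of a token.
    (0, 0) is the token itself and (0, 1) is already the raster next-token target."""
    return [
        (dr, dc)
        for dr in range(max_d + 1)
        for dc in range(max_d + 1)
        if (dr, dc) not in ((0, 0), (0, 1))
    ]


def target_index_2d(idx: int, offset: Tuple[int, int], height: int, width: int) -> Optional[int]:
    """Raster index of the token `offset` away from raster position `idx` on a
    height x width grid, or None if it falls outside the image. With dr, dc >= 0 and
    offset != (0, 0), the target always comes later in raster order, so causality holds."""
    r, c = divmod(idx, width)
    tr, tc = r + offset[0], c + offset[1]
    if tr >= height or tc >= width:
        return None
    return tr * width + tc


if __name__ == "__main__":
    rng = random.Random(0)
    offset = rng.choice(candidate_offsets_2d(max_d=2))
    print(offset, target_index_2d(5, offset, height=4, width=4))
    print(target_index_2d(5, (1, 0), 4, 4))  # 9: directly below token 5 on a 4x4 grid
    print(target_index_2d(3, (0, 1), 4, 4))  # None: no right neighbour at the row's end
```

The last call shows why a 1D offset is a poor fit. One step right from the end of a row lands, in raster order, on the first token of the next row (index 4), which is not spatially adjacent. The 2D version correctly reports no target.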

Anastasios Gerontopoulos

@NasosGer

2 months

4/n Why MuToR? (2/2)
✅ Negligible parameters (just a single learnable embedding for registers)
✅ Scalable prediction horizons (training cost remains fixed regardless of prediction span)
✅ Richer training signal
✅ Identical inference speed to a standard next-token model
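
This is a back-of-the-envelope sketch of the first two claims. The `12 * d_model**2` figure is the usual rough estimate for one transformer block with a 4x-wide MLP (biases and norms ignored). It stands in here for "an extra transformer block". The offset-sampling rule is an assumption for illustration, not the paper's exact scheme: one offset per sequence, drawn from `[2, max_offset]`, with one register per token that has a valid target.

```python
import random


def register_params(d_model: int) -> int:
    """One shared learnable embedding vector serves every register."""
    return d_model


def extra_block_params(d_model: int) -> int:
    """Rough size of one extra transformer block with a 4x-wide MLP:
    4*d^2 for the attention projections + 8*d^2 for the MLP (biases and norms ignored)."""
    return 12 * d_model * d_model


def registers_per_sequence(seq_len: int, max_offset: int, rng: random.Random) -> int:
    """Assumption: one offset d in [2, max_offset] is sampled per sequence and a register
    is placed after every token whose target x_{t+d} exists, so the count never exceeds
    seq_len no matter how large max_offset is."""
    d = rng.randint(2, max_offset)
    return max(0, seq_len - d)


if __name__ == "__main__":
    d_model = 2048
    print(register_params(d_model))     # 2048
    print(extra_block_params(d_model))  # 50331648
    rng = random.Random(0)
    for max_offset in (4, 16, 64):
        print(max_offset, registers_per_sequence(1024, max_offset, rng))
```

For `d_model = 2048`, that is 2,048 extra parameters versus roughly 50 million for one extra block. Under this assumption, the register count stays below the sequence length whether the maximum offset is 4 or 64.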

Anastasios Gerontopoulos

@NasosGer

2 months

3/n Why MuToR? (1/2)
✅ No architecture changes (unlike prior multi-token setups that add extra transformer blocks)
✅ Fully compatible with off-the-shelf pretrained LLMs
✅ Ideal for supervised fine-tuning (keeps multi-token training aligned with the next-token pretraining setup)
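
One way to keep multi-token training aligned with next-token pretraining is to mask attention so that regular tokens never see registers. The sketch below assumes that rule plus a second one: registers cannot see each other. Both are assumptions about the setup, and the function name is hypothetical. The mask is a plain list of booleans rather than a tensor, which keeps the example dependency-free.

```python
from typing import List


def register_attention_mask(is_register: List[bool]) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Starts from a causal mask, then (assumption) hides every register from all other
    positions: regular tokens never see registers, and registers never see each other,
    though each register still attends to itself and to earlier regular tokens.
    """
    n = len(is_register)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only current and earlier positions
            if is_register[j] and j != i:
                continue  # registers are invisible to every other position
            mask[i][j] = True
    return mask


if __name__ == "__main__":
    layout = [False, True, False, True, False]  # token, register, token, register, token
    for row in register_attention_mask(layout):
        print("".join("x" if allowed else "." for allowed in row))
    # x....
    # xx...
    # x.x..
    # x.xx.
    # x.x.x
```

Restricted to the regular positions (0, 2, 4), this mask is exactly the lower-triangular causal mask. Under this assumption, the regular tokens compute the same thing they would in ordinary next-token training, which is what lets a pretrained LLM be fine-tuned without changing how its normal tokens interact.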

Anastasios Gerontopoulos

@NasosGer

2 months

2/n Meet MuToR: multi-token prediction, simplified.
🔹 Training: registers (interleaved with regular tokens) predict future tokens several steps ahead, providing a richer learning signal.
🔹 Inference: registers are discarded, leaving pure next-token prediction.
Paper:
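
The training/inference split described in this tweet can be sketched as a data-preparation step. This is an illustrative reading of the thread, not the paper's code, and it rests on three assumptions:
• `REG_ID` is a hypothetical placeholder for the single learnable register embedding.
• A register placed right after token x_t predicts x_{t+offset}, with one offset per sequence.
• A register is added after every token whose target exists.
The paper may place registers and assign their position ids differently. The attention masking that keeps regular tokens from seeing registers is not shown here.

```python
from typing import List, Tuple

REG_ID = -1    # hypothetical id standing in for the single shared register embedding
IGNORE = -100  # "no loss" label; matches PyTorch CrossEntropyLoss's default ignore_index


def interleave_registers(tokens: List[int], offset: int) -> Tuple[List[int], List[int], List[bool]]:
    """Build a training sequence with a register after each regular token.

    Assumptions (illustrative only): a register placed right after x_t is trained to
    predict x_{t+offset}; regular tokens keep their ordinary next-token label x_{t+1}.
    """
    if offset < 2:
        raise ValueError("offset 1 would just duplicate the next-token target")
    inputs: List[int] = []
    labels: List[int] = []
    is_register: List[bool] = []
    n = len(tokens)
    for t, tok in enumerate(tokens):
        inputs.append(tok)
        labels.append(tokens[t + 1] if t + 1 < n else IGNORE)
        is_register.append(False)
        if t + offset < n:  # only add a register when its future target exists
            inputs.append(REG_ID)
            labels.append(tokens[t + offset])
            is_register.append(True)
    return inputs, labels, is_register


def strip_registers(inputs: List[int], is_register: List[bool]) -> List[int]:
    """Inference view: drop registers, leaving the plain next-token sequence."""
    return [tok for tok, reg in zip(inputs, is_register) if not reg]


if __name__ == "__main__":
    inp, lab, reg = interleave_registers([1, 2, 3, 4], offset=2)
    print(inp)  # [1, -1, 2, -1, 3, 4]
    print(lab)  # [2, 3, 3, 4, 4, -100]
    print(strip_registers(inp, reg))  # [1, 2, 3, 4]
```

Dropping the registers recovers the original sequence, which is why inference reduces to plain next-token decoding.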
