Anastasios Gerontopoulos Profile
Anastasios Gerontopoulos

@NasosGer

Followers: 75 | Following: 72 | Media: 5 | Statuses: 17

PhD Fellow, Archimedes Unit | UOC

Heraklion, Greece
Joined May 2025
@NasosGer
Anastasios Gerontopoulos
2 months
1/n Multi-token prediction boosts LLMs (DeepSeek-V3), tackling key limitations of the next-token setup:
• Short-term focus
• Struggles with long-range decisions
• Weaker supervision
Prior methods add complexity (extra layers).
🔑 Our fix? Register tokens: elegant and powerful.
3
17
135
@NasosGer
Anastasios Gerontopoulos
4 days
RT @abursuc: Nice trick for fine-tuning with multi-token prediction without architecture changes: interleave learnable register tokens into…
0
3
0
@NasosGer
Anastasios Gerontopoulos
24 days
RT @SpyrosGidaris: I am at #CVPR2025 this week in Nashville! Presenting "Advancing Semantic Future Prediction through Multimodal Visual Se…
0
4
0
@NasosGer
Anastasios Gerontopoulos
1 month
RT @_vaishnavh: 📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in cre…
0
36
0
@NasosGer
Anastasios Gerontopoulos
2 months
RT @SpyrosGidaris: Better LLM training? @GregorBachmann1 & @_vaishnavh showed next-token prediction causes shortcut learning. A fix? Multi-…
0
5
0
@NasosGer
Anastasios Gerontopoulos
2 months
10/n Joint work with @SpyrosGidaris and Nikos Komodakis. This research was conducted during my first year as a PhD fellow of ARCHIMEDES Research on AI, Data Science, and Algorithms.
#MuToR #Transformers #GenerativeAI #MachineLearning #LLMs #LanguageModeling #DeepLearning
0
0
6
@NasosGer
Anastasios Gerontopoulos
2 months
9/n Key Takeaways
MuToR offers a simple yet powerful approach for multi-token prediction:
✅ Scalable prediction horizons
✅ Effective across modalities
✅ Foundation for token-based lookahead mechanisms
1
0
4
@NasosGer
Anastasios Gerontopoulos
2 months
8/n We even test MuToR on the star-graph pathfinding task, where models trained with next-token prediction and teacher forcing fail due to shortcut learning. MuToR succeeds, showing its ability to learn long-range structure and planning.
1
0
7
@NasosGer
Anastasios Gerontopoulos
2 months
7/n 🖼️ MuToR for Images
Our 2D extension brings multi-token prediction to autoregressive image generation:
✅ Better samples: outperforms next-token prediction on both FID and IS
✅ Efficient: achieves comparable performance even with only a small number of register tokens
1
0
7
@NasosGer
Anastasios Gerontopoulos
2 months
6/n 📈 Key Results
MuToR surpasses both standard next-token prediction and prior multi-token work across diverse benchmarks:
✅ Math reasoning & summarization tasks
✅ Model-agnostic: effective across sizes
✅ LoRA-compatible: matches or exceeds full fine-tuning accuracy
1
0
9
@NasosGer
Anastasios Gerontopoulos
2 months
5/n We also adapt MuToR to images by modifying the offset sampling to respect the 2D image structure. This 2D extension enriches the training signal by capturing spatial dependencies inherent in visual data, while requiring no architectural changes.
1
0
7
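A rough sketch of what 2D-aware offset sampling could look like for a raster-scanned image token grid. The thread does not give the actual sampling rule, so everything here is an assumption for illustration: the helper names `sample_2d_offset` and `target_index`, the square neighborhood bounded by `max_d`, and the constraint that a target must come later in raster order.

```python
import random

def sample_2d_offset(max_d, rng):
    """Sample a (row, col) offset whose target is strictly later in raster order.

    Assumption: offsets are drawn uniformly from a square neighborhood;
    the actual MuToR sampling scheme may differ.
    """
    while True:
        dr = rng.randint(0, max_d)
        dc = rng.randint(-max_d, max_d)
        # Later in raster order: any row below, or same row and to the right.
        if dr > 0 or dc > 0:
            return dr, dc

def target_index(idx, width, height, offset):
    """Map a flat token index plus a 2D offset to the flat target index.

    Returns None when the target falls outside the image grid.
    """
    dr, dc = offset
    row, col = divmod(idx, width)
    t_row, t_col = row + dr, col + dc
    if 0 <= t_row < height and 0 <= t_col < width:
        return t_row * width + t_col
    return None

rng = random.Random(0)
offset = sample_2d_offset(max_d=2, rng=rng)
print(offset, target_index(5, width=4, height=4, offset=offset))
```

Sampling in (row, col) space rather than along the flattened sequence lets a register look at a spatial neighbor such as the token directly below, which a 1D offset of the same size would not reach.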
@NasosGer
Anastasios Gerontopoulos
2 months
4/n Why MuToR? (2/2)
✅ Negligible params (just a single learnable embedding for registers)
✅ Scalable prediction horizons (training cost remains fixed regardless of prediction span)
✅ Richer training signal
✅ Identical inference speed
1
0
10
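To put "negligible params" in perspective, here is a back-of-the-envelope comparison between one register embedding and one extra transformer block. The numbers are assumptions, not figures from the paper: a hidden size of 2048, and a plain GPT-style block (four attention projections plus a two-layer MLP with 4x expansion, ignoring biases and layer norms).

```python
def register_params(hidden_size):
    """A single learnable register embedding has one vector of size hidden_size."""
    return hidden_size

def transformer_block_params(hidden_size, mlp_ratio=4):
    """Approximate weight count of one GPT-style block (assumed layout).

    Attention: Q, K, V and output projections -> 4 * d^2.
    MLP: up and down projections -> 2 * mlp_ratio * d^2.
    """
    attn = 4 * hidden_size * hidden_size
    mlp = 2 * mlp_ratio * hidden_size * hidden_size
    return attn + mlp

hidden = 2048  # assumed hidden size, for illustration only
print(register_params(hidden))           # 2048
print(transformer_block_params(hidden))  # 50331648
```

Under these assumptions the register embedding is several orders of magnitude smaller than a single additional block.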
@NasosGer
Anastasios Gerontopoulos
2 months
3/n Why MuToR? (1/2)
✅ No architecture changes (unlike prior multi-token setups with extra transformer blocks)
✅ Fully compatible with off-the-shelf pretrained LLMs
✅ Ideal for supervised finetuning (aligns multi-token training with next-token pretraining setup)
1
0
11
@NasosGer
Anastasios Gerontopoulos
2 months
2/n Meet MuToR: multi-token prediction, simplified.
🔹 Training: registers (interleaved with regular tokens) predict future tokens several steps ahead for a richer learning signal.
🔹 Inference: registers are discarded, leaving pure next-token prediction.
Paper:
1
3
24
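A minimal sketch of the interleaving idea from 2/n, using plain Python lists instead of tensors. It is an illustration under stated assumptions, not the authors' implementation: the `<reg>` placeholder, the helper names, the choice to place one register after every token, and the exact target indexing are all hypothetical, and the attention masking and position handling a real model would need are omitted.

```python
REG = "<reg>"  # hypothetical placeholder for the shared register token

def interleave_registers(tokens, offset):
    """Interleave registers with regular tokens and build per-position targets.

    Regular tokens keep the usual next-token target. The register inserted
    after tokens[i] is assigned tokens[i + offset] as its target
    (assumed convention). Registers with no valid target are not inserted.
    """
    seq, targets = [], []
    n = len(tokens)
    for i, tok in enumerate(tokens):
        seq.append(tok)
        targets.append(tokens[i + 1] if i + 1 < n else None)
        if i + offset < n:
            seq.append(REG)
            targets.append(tokens[i + offset])
    return seq, targets

def strip_registers(seq):
    """At inference, registers are dropped and only regular tokens remain."""
    return [t for t in seq if t != REG]

seq, targets = interleave_registers(["a", "b", "c", "d"], offset=2)
print(seq)      # ['a', '<reg>', 'b', '<reg>', 'c', 'd']
print(targets)  # ['b', 'c', 'c', 'd', 'd', None]
print(strip_registers(seq))  # ['a', 'b', 'c', 'd']
```

Because the registers exist only in the training sequence, dropping them recovers the original input, which is consistent with the claim that inference is plain next-token prediction.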