Walter Hugo Lopez Pinaya 🍍
@Warvito
Followers: 1K · Following: 5K · Media: 49 · Statuses: 3K
Senior Research Engineer @synthesiaIO | Ex-Research Fellow @KingsCollegeLon | Text-to-Video | Generative Models | Medical Imaging
London, UK
Joined October 2009
Google just dropped "Attention is all you need (V2)." This paper could solve AI's biggest problem: catastrophic forgetting. When AI models learn something new, they tend to forget what they previously learned. Humans don't work this way, and now Google Research has a solution.
247
985
6K
MIT introduces "Back to Basics: Let Denoising Generative Models Denoise." It shows that simple, large-patch Transformers on pixels, dubbed JiTs (Just Image Transformers), can be strong generative models.
3
81
539
We are excited to unveil HunyuanVideo 1.5, the strongest open-source video generation model. Built upon the DiT architecture, it redefines the open-source SOTA for accessibility and performance. 🚀🚀🚀 HunyuanVideo 1.5 delivers state-of-the-art visual quality and motion coherence.
39
181
1K
I finally have a better understanding of Yann LeCun's JEPA approach and why he may have quit Meta! I think it might fix one of the most annoying, hacky parts of training foundation models. What if 90% of the tricks we use to train big AI models are just complicated workarounds
21
47
335
When using Muon, you should be careful: 1. What LR scaling rule should I use for Muon? 2. How does the scale of Muon differ from AdamW? (it depends on 1...) 3. What's the optimal LR scale for Muon? Understanding the spectral condition will help. https://t.co/OVeBR1NGSv
https://t.co/oanCF3G4ce
Muon Optimizer Guide: Quick Start & Key Details https://t.co/n41aqjCFJU
5
16
198
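The core mechanic behind the Muon questions above is that Muon replaces the raw momentum gradient of each weight matrix with an approximately orthogonalized version (via a Newton-Schulz iteration), so the update's spectral scale no longer depends on the gradient's magnitude, which is why its LR conventions differ from AdamW's. A minimal NumPy sketch of that step, assuming the quintic coefficients used in public Muon implementations; it is an illustration, not the reference code:

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately orthogonalize a gradient matrix, Muon-style.
    The quintic coefficients below follow public Muon implementations
    (treat them as an assumption, not a specification)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)        # Frobenius-normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                           # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x  # polynomial iteration pushing singular values toward 1
    return x.T if transposed else x

def muon_like_step(w, g, lr=0.02):
    """One Muon-like update: descend along the orthogonalized gradient.
    How to scale `lr` across layer shapes (and against AdamW) is exactly
    what the post above is asking about, so no scaling rule is asserted here."""
    return w - lr * newton_schulz_orthogonalize(g)

# toy usage on a single weight matrix
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)) * 0.02
g = rng.normal(size=(256, 128))
w = muon_like_step(w, g)
```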
Vecchio et al., "Φeat: Physically-Grounded Feature Representation": a foundational backbone, fine-tuned from DINOv3, trained on synthetic renders of materials with EMA student-teacher training and multiple losses.
6
48
429
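For readers unfamiliar with the EMA student-teacher training mentioned in the Φeat post: the teacher is a gradient-free copy of the student whose weights track the student by exponential moving average. A minimal PyTorch sketch of just that update (module and momentum value are illustrative, not from the paper):

```python
import copy
import torch
import torch.nn as nn

def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.996):
    """Exponential-moving-average teacher update: teacher <- m*teacher + (1-m)*student.
    The teacher receives no gradients; it only tracks the student's weights."""
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

# toy usage: the teacher starts as a frozen copy of the student
student = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 32))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

# ... after each optimizer step on the student:
ema_update(teacher, student)
```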
Huge! @TianhongLi6 & Kaiming He (inventor of ResNet) just introduced JiT (Just Image Transformers)! JiTs are simple, large-patch Transformers that operate on raw pixels; no tokenizer, pre-training, or extra losses needed. By predicting clean data on the natural-data manifold,
8
118
759
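As a rough picture of what a "just pixels, large patches, predict clean data" Transformer looks like, here is a toy PyTorch sketch; every hyperparameter and the conditioning scheme are my own placeholders, not the JiT paper's architecture:

```python
import torch
import torch.nn as nn

class TinyJiTLikeDenoiser(nn.Module):
    """Toy 'just pixels' denoiser: large-patch tokens in, clean patches out.
    Hyperparameters are illustrative, not taken from the JiT paper."""
    def __init__(self, image_size=256, patch=32, dim=384, depth=6, heads=6):
        super().__init__()
        self.patch = patch
        n_tokens = (image_size // patch) ** 2
        patch_dim = 3 * patch * patch
        self.embed = nn.Linear(patch_dim, dim)             # raw pixels -> token, no tokenizer/VAE
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        self.time = nn.Linear(1, dim)                      # simple noise-level conditioning
        block = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True, norm_first=True)
        self.backbone = nn.TransformerEncoder(block, depth)
        self.head = nn.Linear(dim, patch_dim)              # predict the *clean* patch (x-prediction)

    def forward(self, noisy, t):
        b, c, h, w = noisy.shape
        p = self.patch
        tokens = noisy.unfold(2, p, p).unfold(3, p, p)          # (b, c, h/p, w/p, p, p)
        tokens = tokens.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        x = self.embed(tokens) + self.pos + self.time(t.view(b, 1, 1))
        x = self.backbone(x)
        return self.head(x)                                      # clean patches, same layout as `tokens`

# toy usage
model = TinyJiTLikeDenoiser()
noisy = torch.randn(2, 3, 256, 256)
t = torch.rand(2)
clean_patches = model(noisy, t)    # (2, 64, 3*32*32)
```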
Thanks @_akhaliq for introducing our MMaDA-Parallel (https://t.co/IbsuR9m4tH): Parallel Multimodal Large Diffusion Language Models for Thinking-Aware Image Editing and Generation. Paper: https://t.co/IbsuR9m4tH Code:
github.com
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation" - tyfeld/MMaDA-Parallel
2
29
173
This is a phenomenal video by @jbhuang0604 explaining seminal papers in computer vision, including CLIP, SimCLR, and DINO v1/v2/v3, in 15 minutes. DINO is actually a brilliant idea; I found the choice of 65k neurons in the output head pretty interesting.
14
125
1K
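The "65k neurons in the output head" comment refers to DINO's projection head, which ends in a large prototype layer (K = 65,536 in the DINO paper) whose softmax outputs are matched between student and teacher. A minimal sketch assuming DINO-v1-style components; hidden sizes and temperatures are approximate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DinoStyleHead(nn.Module):
    """DINO-style projection head: MLP -> L2-normalized bottleneck ->
    weight-normalized linear onto K prototypes (K = 65536 in the paper)."""
    def __init__(self, in_dim=384, bottleneck=256, hidden=2048, n_prototypes=65536):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, bottleneck),
        )
        self.prototypes = nn.utils.weight_norm(nn.Linear(bottleneck, n_prototypes, bias=False))

    def forward(self, x):
        x = F.normalize(self.mlp(x), dim=-1)   # unit-norm bottleneck features
        return self.prototypes(x)              # logits over the 65k prototype outputs

def dino_loss(student_logits, teacher_logits, t_s=0.1, t_t=0.04, center=0.0):
    """Cross-entropy between the sharpened teacher and student distributions."""
    teacher = F.softmax((teacher_logits - center) / t_t, dim=-1).detach()
    log_student = F.log_softmax(student_logits / t_s, dim=-1)
    return -(teacher * log_student).sum(dim=-1).mean()

# toy usage
head = DinoStyleHead()
feat_s, feat_t = torch.randn(8, 384), torch.randn(8, 384)
loss = dino_loss(head(feat_s), head(feat_t))
```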
Looped latent reasoning models like TRM, HRM, Ouro and Huginn are great for reasoning, but they're inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy on a per-training-FLOP basis. 📜 1/7
10
65
385
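The basic idea behind looped models like the ones in this post is to apply the same block stack repeatedly to the hidden state before decoding, spending compute on recurrent depth rather than extra parameters. A toy PyTorch sketch of that forward pass (naming, the input re-injection, and loop count are my own choices; the post's actual post-training recipe is not shown):

```python
import torch
import torch.nn as nn

class LoopedBackbone(nn.Module):
    """Apply the same Transformer blocks `n_loops` times over the hidden state,
    trading parameters for recurrent depth (the basic looped/recurrent-LM idea).
    Causal masking is omitted for brevity."""
    def __init__(self, dim=256, heads=4, depth=2, vocab=1000, n_loops=4):
        super().__init__()
        self.n_loops = n_loops
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)   # weights shared across loops
        self.inject = nn.Linear(2 * dim, dim)                # re-inject the input embedding each loop
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        e = self.embed(tokens)
        h = torch.zeros_like(e)                              # latent state
        for _ in range(self.n_loops):                        # same weights, repeated
            h = self.blocks(self.inject(torch.cat([h, e], dim=-1)))
        return self.lm_head(h)

logits = LoopedBackbone()(torch.randint(0, 1000, (2, 16)))   # (2, 16, 1000)
```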
Flow Matching models often struggle to balance memorization and generalization. 😱 We set out to fix this — by using the geometry of the data manifold. Introducing Carré du Champ Flow Matching (CDCFM)🧑🎨🥖 — improving generalization without sacrificing sample quality.
11
63
436
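For context on the flow-matching setting in the CDCFM post: the standard conditional flow-matching objective regresses a velocity field along straight noise-to-data paths. A minimal sketch of that baseline loss only; CDCFM's manifold-geometry term is not reproduced here:

```python
import torch

def conditional_flow_matching_loss(model, x1):
    """Standard (rectified-flow style) conditional flow matching:
    x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I); the target velocity is x1 - x0."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    xt = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    v_pred = model(xt, t.flatten())
    return ((v_pred - v_target) ** 2).mean()

# toy usage with a throwaway velocity model
model = lambda x, t: torch.zeros_like(x)
loss = conditional_flow_matching_loss(model, torch.randn(8, 3, 32, 32))
```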
(1/n) Can diffusion models simulate molecular dynamics instead of generating independent samples? In our NeurIPS 2025 paper, we train energy-based diffusion models that can do both: - Generate independent samples - Learn the underlying potential 𝑼 🧵👇 https://t.co/TSurVY3YEl
12
140
838
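An energy-based diffusion model of the kind described above parameterizes a scalar potential U_theta(x, t) and obtains the score as its negative gradient, which is why one network can both drive sampling and expose a learned potential. A minimal sketch of that wiring via autograd (architecture and shapes are placeholders; the paper's training objective is not shown):

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Scalar energy U_theta(x, t); the score is -grad_x U_theta."""
    def __init__(self, dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def energy(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=-1)).squeeze(-1)

    def score(self, x, t):
        x = x.requires_grad_(True)
        u = self.energy(x, t).sum()
        # score(x, t) = -grad_x U(x, t), obtained with autograd
        return -torch.autograd.grad(u, x, create_graph=self.training)[0]

# toy usage: one score evaluation for a batch of 8 particles in 16-D
model = EnergyNet()
x = torch.randn(8, 16)
t = torch.rand(8)
s = model.score(x, t)   # same shape as x
```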
TabTune makes tabular AI models easy to try, compare, and trust. It hides messy prep and gives one simple fit, predict, evaluate flow. Work on tables is messy because every model wants different preprocessing, training modes, and metrics. This paper's technique supports 7
5
4
14
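To make the "one fit/predict/evaluate flow" pitch concrete, here is a generic scikit-learn illustration of that pattern. This is explicitly not TabTune's actual API (which I have not checked); the class name and methods are hypothetical stand-ins for the idea of hiding preprocessing behind a single interface:

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class TabularRunner:
    """Hypothetical unified flow: hide preprocessing behind one
    fit/predict/evaluate interface. Not TabTune's real API."""
    def __init__(self, model, numeric_cols):
        prep = ColumnTransformer([("num", StandardScaler(), numeric_cols)], remainder="passthrough")
        self.pipe = Pipeline([("prep", prep), ("model", model)])

    def fit(self, X, y):
        self.pipe.fit(X, y)
        return self

    def predict(self, X):
        return self.pipe.predict(X)

    def evaluate(self, X, y):
        proba = self.pipe.predict_proba(X)[:, 1]
        return {"accuracy": accuracy_score(y, self.predict(X)),
                "roc_auc": roc_auc_score(y, proba)}

# toy usage on a bundled dataset
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
runner = TabularRunner(GradientBoostingClassifier(), numeric_cols=list(range(X.shape[1])))
print(runner.fit(X_tr, y_tr).evaluate(X_te, y_te))
```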
Diffusion Transformers with Representation Autoencoders https://t.co/tg1XG46YoI
speakerdeck.com
https://arxiv.org/abs/2510.11690
0
37
248
🚀 Training 64K+ context LLMs on consumer GPUs? Now possible with Ulysses + Ring Attention! We’ve fused two sequence parallelism techniques in ModelScope SWIFT: ✅ Ulysses: Low-comm, head-split (but limited by # of attention heads) ✅ Ring Attention: Scales beyond head count
4
28
136
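The key Ulysses move contrasted above is an all-to-all that converts a sequence-sharded activation into a head-sharded one, so each rank can run full-sequence attention on its subset of heads (hence the head-count limit), while Ring Attention instead keeps the sequence sharded and circulates K/V blocks. A single-process NumPy simulation of the Ulysses re-shard bookkeeping only (rank count and shapes are made up; no real communication is involved):

```python
import numpy as np

# Simulated Ulysses re-shard: each of `world` ranks holds a slice of the
# sequence for *all* heads; an all-to-all turns that into the full sequence
# for a slice of the heads (which is what attention needs).
world, seq, heads, dh = 4, 16, 8, 32
full = np.random.randn(seq, heads, dh)

# before the all-to-all: rank r holds tokens [r*seq/world : (r+1)*seq/world], all heads
seq_shards = np.split(full, world, axis=0)

# all-to-all: every rank sends its tokens for head-group g to rank g
head_shards = []
for g in range(world):                                  # receiving rank g
    per_head_group = [np.split(s, world, axis=1)[g] for s in seq_shards]
    head_shards.append(np.concatenate(per_head_group, axis=0))  # full seq, heads/world heads

# sanity check: rank g now owns the full sequence for its head group
for g in range(world):
    assert np.allclose(head_shards[g], full[:, g * heads // world:(g + 1) * heads // world])
print("Ulysses-style seq->head re-shard verified; Ring Attention instead keeps the",
      "sequence sharded and rotates K/V blocks around the ranks.")
```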
holy shit... Hugging Face cooked again! 🔥 they just dropped a free blog (BOOK) that covers the no-bs reality of building SOTA models. i haven't seen any lab/researcher go into the real decisions behind LLM research and its nuances. this is literally a gem. Syllabus: →
25
206
2K
If you are interested in diffusion models and wish to understand them in depth, this might be the best resource out there!
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! 📘 We're excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
7
52
681
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! 📘 We're excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
44
443
2K
[CV] Accelerating Vision Transformers with Adaptive Patch Sizes R Choudhury, J Kim, J Park, E Yang... [CMU & KAIST] (2025) https://t.co/zsX7D5B30G
0
36
252