Walter Hugo Lopez Pinaya 🍍
@Warvito
Followers: 1K · Following: 5K · Media: 49 · Statuses: 3K
Senior Research Engineer @synthesiaIO | Ex-Research Fellow @KingsCollegeLon | Text-to-Video | Generative Models | Medical Imaging
London, UK
Joined October 2009
Google just dropped "Attention is all you need (V2)." This paper could solve AI's biggest problem: catastrophic forgetting. When AI models learn something new, they tend to forget what they previously learned. Humans don't work this way, and now Google Research has a solution.
247
985
6K
MIT introduces "Back to Basics: Let Denoising Generative Models Denoise." It shows that simple, large-patch Transformers on pixels, dubbed JiTs (Just Image Transformers), can be strong generative models.
3
81
539
We are excited to unveil HunyuanVideo 1.5, the strongest open-source video generation model. Built upon the DiT architecture, it redefines the open-source SOTA for accessibility and performance. 🚀🚀🚀 HunyuanVideo 1.5 delivers state-of-the-art visual quality and motion coherence.
39
181
1K
I finally have a better understanding of Yann LeCun's JEPA approach and why he may have quit Meta! I think it might fix one of the most annoying, hacky parts of training foundation models. What if 90% of the tricks we use to train big AI models are just complicated workarounds
21
47
335
When using Muon, you should be careful: 1. What LR scaling rule should I use for Muon? 2. How does the scale of Muon differ from AdamW? (it depends on 1...) 3. What's the optimal LR scale for Muon? Understanding the spectral condition will help. https://t.co/OVeBR1NGSv
https://t.co/oanCF3G4ce
Muon Optimizer Guide: Quick Start & Key Details https://t.co/n41aqjCFJU
5
16
198
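The core mechanic behind the Muon questions above is that Muon replaces the raw momentum gradient of each weight matrix with an approximately orthogonalized version (via a Newton-Schulz iteration), so the update's spectral scale no longer depends on the gradient's magnitude, which is why its LR conventions differ from AdamW's. A minimal NumPy sketch of that step, assuming the quintic coefficients used in public Muon implementations; it is an illustration, not the reference code:

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately orthogonalize a gradient matrix, Muon-style.
    The quintic coefficients below follow public Muon implementations
    (treat them as an assumption, not a specification)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)        # Frobenius-normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                           # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x  # polynomial iteration pushing singular values toward 1
    return x.T if transposed else x

def muon_like_step(w, g, lr=0.02):
    """One Muon-like update: descend along the orthogonalized gradient.
    How to scale `lr` across layer shapes (and against AdamW) is exactly
    what the post above is asking about, so no scaling rule is asserted here."""
    return w - lr * newton_schulz_orthogonalize(g)

# toy usage on a single weight matrix
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)) * 0.02
g = rng.normal(size=(256, 128))
w = muon_like_step(w, g)
```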
Vecchio et al., "Φeat: Physically-Grounded Feature Representation": a foundational backbone, fine-tuned from DINOv3, trained on synthetic renders of materials with EMA student-teacher training and multiple losses.
6
48
429
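For readers unfamiliar with the EMA student-teacher training mentioned in the Φeat post: the teacher is a gradient-free copy of the student whose weights track the student by exponential moving average. A minimal PyTorch sketch of just that update (module and momentum value are illustrative, not from the paper):

```python
import copy
import torch
import torch.nn as nn

def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.996):
    """Exponential-moving-average teacher update: teacher <- m*teacher + (1-m)*student.
    The teacher receives no gradients; it only tracks the student's weights."""
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

# toy usage: the teacher starts as a frozen copy of the student
student = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 32))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

# ... after each optimizer step on the student:
ema_update(teacher, student)
```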
Huge! @TianhongLi6 & Kaiming He (inventor of ResNet) just introduced JiT (Just Image Transformers)! JiTs are simple, large-patch Transformers that operate on raw pixels; no tokenizer, pre-training, or extra losses needed. By predicting clean data on the natural-data manifold,
8
118
759
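As a rough picture of what a "just pixels, large patches, predict clean data" Transformer looks like, here is a toy PyTorch sketch; every hyperparameter and the conditioning scheme are my own placeholders, not the JiT paper's architecture:

```python
import torch
import torch.nn as nn

class TinyJiTLikeDenoiser(nn.Module):
    """Toy 'just pixels' denoiser: large-patch tokens in, clean patches out.
    Hyperparameters are illustrative, not taken from the JiT paper."""
    def __init__(self, image_size=256, patch=32, dim=384, depth=6, heads=6):
        super().__init__()
        self.patch = patch
        n_tokens = (image_size // patch) ** 2
        patch_dim = 3 * patch * patch
        self.embed = nn.Linear(patch_dim, dim)             # raw pixels -> token, no tokenizer/VAE
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        self.time = nn.Linear(1, dim)                      # simple noise-level conditioning
        block = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True, norm_first=True)
        self.backbone = nn.TransformerEncoder(block, depth)
        self.head = nn.Linear(dim, patch_dim)              # predict the *clean* patch (x-prediction)

    def forward(self, noisy, t):
        b, c, h, w = noisy.shape
        p = self.patch
        tokens = noisy.unfold(2, p, p).unfold(3, p, p)          # (b, c, h/p, w/p, p, p)
        tokens = tokens.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        x = self.embed(tokens) + self.pos + self.time(t.view(b, 1, 1))
        x = self.backbone(x)
        return self.head(x)                                      # clean patches, same layout as `tokens`

# toy usage
model = TinyJiTLikeDenoiser()
noisy = torch.randn(2, 3, 256, 256)
t = torch.rand(2)
clean_patches = model(noisy, t)    # (2, 64, 3*32*32)
```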
Thanks @_akhaliq for introducing our MMaDA-Parallel (https://t.co/IbsuR9m4tH): Parallel Multimodal Large Diffusion Language Models for Thinking-Aware Image Editing and Generation. Paper: https://t.co/IbsuR9m4tH Code:
github.com
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation" - tyfeld/MMaDA-Parallel
2
29
173
This is a phenomenal video by @jbhuang0604 explaining seminal papers in computer vision, including CLIP, SimCLR, and DINO v1/v2/v3, in 15 minutes. DINO is actually a brilliant idea; I found the choice of 65k neurons in the output head pretty interesting.
14
125
1K
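The "65k neurons in the output head" comment refers to DINO's projection head, which ends in a large prototype layer (K = 65,536 in the DINO paper) whose softmax outputs are matched between student and teacher. A minimal sketch assuming DINO-v1-style components; hidden sizes and temperatures are approximate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DinoStyleHead(nn.Module):
    """DINO-style projection head: MLP -> L2-normalized bottleneck ->
    weight-normalized linear onto K prototypes (K = 65536 in the paper)."""
    def __init__(self, in_dim=384, bottleneck=256, hidden=2048, n_prototypes=65536):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, bottleneck),
        )
        self.prototypes = nn.utils.weight_norm(nn.Linear(bottleneck, n_prototypes, bias=False))

    def forward(self, x):
        x = F.normalize(self.mlp(x), dim=-1)   # unit-norm bottleneck features
        return self.prototypes(x)              # logits over the 65k prototype outputs

def dino_loss(student_logits, teacher_logits, t_s=0.1, t_t=0.04, center=0.0):
    """Cross-entropy between the sharpened teacher and student distributions."""
    teacher = F.softmax((teacher_logits - center) / t_t, dim=-1).detach()
    log_student = F.log_softmax(student_logits / t_s, dim=-1)
    return -(teacher * log_student).sum(dim=-1).mean()

# toy usage
head = DinoStyleHead()
feat_s, feat_t = torch.randn(8, 384), torch.randn(8, 384)
loss = dino_loss(head(feat_s), head(feat_t))
```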
Looped latent reasoning models like TRM, HRM, Ouro and Huginn are great for reasoning, but they're inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy on a per-training-FLOP basis. 📜 1/7
10
65
385
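The basic idea behind looped models like the ones in this post is to apply the same block stack repeatedly to the hidden state before decoding, spending compute on recurrent depth rather than extra parameters. A toy PyTorch sketch of that forward pass (naming, the input re-injection, and loop count are my own choices; the post's actual post-training recipe is not shown):

```python
import torch
import torch.nn as nn

class LoopedBackbone(nn.Module):
    """Apply the same Transformer blocks `n_loops` times over the hidden state,
    trading parameters for recurrent depth (the basic looped/recurrent-LM idea).
    Causal masking is omitted for brevity."""
    def __init__(self, dim=256, heads=4, depth=2, vocab=1000, n_loops=4):
        super().__init__()
        self.n_loops = n_loops
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)   # weights shared across loops
        self.inject = nn.Linear(2 * dim, dim)                # re-inject the input embedding each loop
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        e = self.embed(tokens)
        h = torch.zeros_like(e)                              # latent state
        for _ in range(self.n_loops):                        # same weights, repeated
            h = self.blocks(self.inject(torch.cat([h, e], dim=-1)))
        return self.lm_head(h)

logits = LoopedBackbone()(torch.randint(0, 1000, (2, 16)))   # (2, 16, 1000)
```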
Flow Matching models often struggle to balance memorization and generalization. 😱 We set out to fix this — by using the geometry of the data manifold. Introducing Carré du Champ Flow Matching (CDCFM)🧑🎨🥖 — improving generalization without sacrificing sample quality.
11
63
436
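For context on the flow-matching setting in the CDCFM post: the standard conditional flow-matching objective regresses a velocity field along straight noise-to-data paths. A minimal sketch of that baseline loss only; CDCFM's manifold-geometry term is not reproduced here:

```python
import torch

def conditional_flow_matching_loss(model, x1):
    """Standard (rectified-flow style) conditional flow matching:
    x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I); the target velocity is x1 - x0."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    xt = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    v_pred = model(xt, t.flatten())
    return ((v_pred - v_target) ** 2).mean()

# toy usage with a throwaway velocity model
model = lambda x, t: torch.zeros_like(x)
loss = conditional_flow_matching_loss(model, torch.randn(8, 3, 32, 32))
```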
(1/n) Can diffusion models simulate molecular dynamics instead of generating independent samples? In our NeurIPS 2025 paper, we train energy-based diffusion models that can do both: - Generate independent samples - Learn the underlying potential 𝑼 🧵👇 https://t.co/TSurVY3YEl
12
140
838
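An energy-based diffusion model of the kind described above parameterizes a scalar potential U_theta(x, t) and obtains the score as its negative gradient, which is why one network can both drive sampling and expose a learned potential. A minimal sketch of that wiring via autograd (architecture and shapes are placeholders; the paper's training objective is not shown):

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Scalar energy U_theta(x, t); the score is -grad_x U_theta."""
    def __init__(self, dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def energy(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=-1)).squeeze(-1)

    def score(self, x, t):
        x = x.requires_grad_(True)
        u = self.energy(x, t).sum()
        # score(x, t) = -grad_x U(x, t), obtained with autograd
        return -torch.autograd.grad(u, x, create_graph=self.training)[0]

# toy usage: one score evaluation for a batch of 8 particles in 16-D
model = EnergyNet()
x = torch.randn(8, 16)
t = torch.rand(8)
s = model.score(x, t)   # same shape as x
```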
TabTune makes tabular AI models easy to try, compare, and trust. It hides messy prep and gives one simple fit, predict, evaluate flow. Work on tables is messy because every model wants different preprocessing, training modes, and metrics. This paper's technique supports 7
5
4
14
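To make the "one fit/predict/evaluate flow" pitch concrete, here is a generic scikit-learn illustration of that pattern. This is explicitly not TabTune's actual API (which I have not checked); the class name and methods are hypothetical stand-ins for the idea of hiding preprocessing behind a single interface:

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class TabularRunner:
    """Hypothetical unified flow: hide preprocessing behind one
    fit/predict/evaluate interface. Not TabTune's real API."""
    def __init__(self, model, numeric_cols):
        prep = ColumnTransformer([("num", StandardScaler(), numeric_cols)], remainder="passthrough")
        self.pipe = Pipeline([("prep", prep), ("model", model)])

    def fit(self, X, y):
        self.pipe.fit(X, y)
        return self

    def predict(self, X):
        return self.pipe.predict(X)

    def evaluate(self, X, y):
        proba = self.pipe.predict_proba(X)[:, 1]
        return {"accuracy": accuracy_score(y, self.predict(X)),
                "roc_auc": roc_auc_score(y, proba)}

# toy usage on a bundled dataset
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
runner = TabularRunner(GradientBoostingClassifier(), numeric_cols=list(range(X.shape[1])))
print(runner.fit(X_tr, y_tr).evaluate(X_te, y_te))
```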
Diffusion Transformers with Representation Autoencoders https://t.co/tg1XG46YoI
speakerdeck.com
https://arxiv.org/abs/2510.11690
0
37
248
🚀 Training 64K+ context LLMs on consumer GPUs? Now possible with Ulysses + Ring Attention! We’ve fused two sequence parallelism techniques in ModelScope SWIFT: ✅ Ulysses: Low-comm, head-split (but limited by # of attention heads) ✅ Ring Attention: Scales beyond head count
4
28
136
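The key Ulysses move contrasted above is an all-to-all that converts a sequence-sharded activation into a head-sharded one, so each rank can run full-sequence attention on its subset of heads (hence the head-count limit), while Ring Attention instead keeps the sequence sharded and circulates K/V blocks. A single-process NumPy simulation of the Ulysses re-shard bookkeeping only (rank count and shapes are made up; no real communication is involved):

```python
import numpy as np

# Simulated Ulysses re-shard: each of `world` ranks holds a slice of the
# sequence for *all* heads; an all-to-all turns that into the full sequence
# for a slice of the heads (which is what attention needs).
world, seq, heads, dh = 4, 16, 8, 32
full = np.random.randn(seq, heads, dh)

# before the all-to-all: rank r holds tokens [r*seq/world : (r+1)*seq/world], all heads
seq_shards = np.split(full, world, axis=0)

# all-to-all: every rank sends its tokens for head-group g to rank g
head_shards = []
for g in range(world):                                  # receiving rank g
    per_head_group = [np.split(s, world, axis=1)[g] for s in seq_shards]
    head_shards.append(np.concatenate(per_head_group, axis=0))  # full seq, heads/world heads

# sanity check: rank g now owns the full sequence for its head group
for g in range(world):
    assert np.allclose(head_shards[g], full[:, g * heads // world:(g + 1) * heads // world])
print("Ulysses-style seq->head re-shard verified; Ring Attention instead keeps the",
      "sequence sharded and rotates K/V blocks around the ranks.")
```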
holy shit... Hugging Face cooked again! 🔥 they just dropped a free blog (BOOK) that covers the no-bs reality of building SOTA models. i haven't seen any lab/researcher go into the real decisions behind LLM research and its nuances. this is literally a gem. Syllabus: →
25
206
2K
If you are interested in diffusion models and wish to understand them in depth, this might be the best resource out there!
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! 📘 We're excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
7
52
681
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! 📘 We're excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
44
443
2K
[CV] Accelerating Vision Transformers with Adaptive Patch Sizes R Choudhury, J Kim, J Park, E Yang... [CMU & KAIST] (2025) https://t.co/zsX7D5B30G
0
36
252