Yipeng Zhang (@yipengzz)
Joined October 2021
34 Followers · 13 Following · 4 Media · 27 Statuses
Join us to push the field forward! Please spread the word📣📣
🚨Announcing the World Modeling Workshop 2026 🚨 📅 When: Feb 4–6, 2026 📍Where: Mila (Montréal) + Online (free) 💡 What: Keynotes, Methods Deep Dive, and Tutorials 🌐 https://t.co/WukFtNON3o ✉️ worldmodel.mila@gmail.com 🧵 Details below:
Consider submitting your work! Already published studies, negative results, reproducibility reports, and preliminary findings are all welcome!
Do you have an ongoing project on world models? Want to meet fellow researchers and discuss your work? We extended our deadline by two weeks to November 1st! (Abstracts due October 25th.) Submit any work in progress, negative results, or already published work ➡️
🚀 ICLR rush is over—share your ideas at World Modeling Workshop 2026! 🧩 Posters, early results, negative findings, big ideas & published work welcome (non-archival). 📅 Abs: Oct 10 | 📅 Full: Oct 17 (AoE) 👉 https://t.co/T3e7aimJOX
#WMW2026 #WorldModeling
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below!
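For intuition, here is a minimal sketch of what a recursive self-aggregation loop could look like at test time, based only on the announcement above. The `generate` and `aggregate_prompt` callables and the population sizes are hypothetical stand-ins, not the paper's exact recipe.

```python
import random

def rsa(generate, aggregate_prompt, question, n=8, k=4, rounds=3):
    """Recursive Self-Aggregation, sketched from the tweet: keep a
    population of n candidate solutions; each round, sample k of them
    and prompt the model to aggregate the subset into one improved
    candidate, recursively refining the whole population."""
    population = [generate(question) for _ in range(n)]
    for _ in range(rounds):
        population = [
            generate(aggregate_prompt(question, random.sample(population, k)))
            for _ in range(n)
        ]
    return population  # pick a final answer, e.g., by majority vote
```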
🚨 We’re honored to host Prof. Jürgen Schmidhuber (The Swiss AI Lab & KAUST) at the World Modeling Workshop 2026! ✨ A pioneer of modern AI, Prof. @SchmidhuberAI has made influential contributions that shaped the field — we’re thrilled to welcome him. 🌐 https://t.co/inI2YV2Dsl
Zero rewards after tons of RL training? 😞 Before using dense rewards or incentivizing exploration, try changing the data. Adding easier instances of the task can unlock RL training. 🔓📈 To learn more, check out our blog post here: https://t.co/BPErVcLmP8. Keep reading 🧵 (1/n)
Blog: spiffy-airbus-472.notion.site · Jatin Prakash* (NYU), Anirudh Buvanesh* (MILA) (* order decided through np.random.randint(2))
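A minimal sketch of the data-side fix described above (function and variable names are mine, not from the blog post): seed the RL training set with easier instances so early rollouts occasionally succeed and the policy gets a nonzero reward signal to bootstrap from.

```python
import random

def mix_in_easier_instances(hard_tasks, easy_tasks, easy_frac=0.5, seed=0):
    """Hypothetical illustration: augment a hard RL task set with easier
    instances of the same task, so the reward is not uniformly zero at
    the start of training."""
    rng = random.Random(seed)
    n_easy = int(len(hard_tasks) * easy_frac)
    mixed = list(hard_tasks) + rng.sample(list(easy_tasks), n_easy)
    rng.shuffle(mixed)
    return mixed
```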
Preprint Alert 🚀 Multi-agent reinforcement learning (MARL) often assumes that agents know when other agents cooperate with them. But for humans, this isn't always true. For example, Plains Indigenous groups used to leave resources for others to use at effigies called Manitokan. 1/8
Super stoked to share my first first-author paper, which introduces a hybrid architecture for real-time neural decoding. It's been a lot of work, but I'm happy to showcase some very cool results!
New preprint! 🧠🤖 How do we build neural decoders that are: ⚡️ fast enough for real-time use 🎯 accurate across diverse tasks 🌍 generalizable to new sessions, subjects, and species? We present POSSM, a hybrid SSM architecture that optimizes for all three of these axes! 🧵1/7
Is there a universal strategy to turn any generative model (GANs, VAEs, diffusion models, or flows) into a conditional sampler, or to finetune it to optimize a reward function? Yes! Outsourced Diffusion Sampling (ODS), accepted at @icmlconf, does exactly that!
Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵 https://t.co/62OVigYWpt 1/N
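For context on what is being swapped: DiLoCo runs many local steps of an inner optimizer on each worker, then applies the averaged parameter delta with an outer optimizer. Below is a simplified single-process sketch, assuming PyTorch; the loss is a placeholder and the Muon optimizer itself is not shown.

```python
import copy
import torch

def diloco_round(model, worker_batches, make_inner_opt, outer_opt):
    """One simplified DiLoCo communication round. Standard DiLoCo uses
    AdamW as the inner optimizer and Nesterov SGD as the outer optimizer;
    MuLoCo swaps the inner optimizer for Muon."""
    global_params = [p.detach().clone() for p in model.parameters()]
    avg_delta = [torch.zeros_like(p) for p in global_params]
    for batches in worker_batches:  # each worker's local data shard
        worker = copy.deepcopy(model)
        inner_opt = make_inner_opt(worker.parameters())
        for x, y in batches:  # local inner steps (placeholder loss)
            inner_opt.zero_grad()
            torch.nn.functional.mse_loss(worker(x), y).backward()
            inner_opt.step()
        for d, lp, gp in zip(avg_delta, worker.parameters(), global_params):
            d += (gp - lp.detach()) / len(worker_batches)
    # Treat the averaged delta as an outer "gradient" on the global params.
    for p, gp, d in zip(model.parameters(), global_params, avg_delta):
        p.data.copy_(gp)
        p.grad = d
    outer_opt.step()
```

With `make_inner_opt = lambda ps: torch.optim.AdamW(ps, lr=3e-4)` and `outer_opt = torch.optim.SGD(model.parameters(), lr=0.7, momentum=0.9, nesterov=True)` this reproduces the standard DiLoCo setup; MuLoCo's question is what changes when the inner optimizer is Muon instead.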
🚨 Preprint Alert 🚀 📄 seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models https://t.co/vJaFyoQZvV Can we simultaneously learn both transformation-invariant and transformation-equivariant representations with self-supervised learning (SSL)?
How can we make recommender systems more transparent and controllable without sacrificing quality? Introducing TEARS, a scrutable recommender system that replaces numerical user profiles with editable text summaries (accepted at @TheWebConf). https://t.co/CwoVIRNNnI 1/🧵
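As a sketch of the scrutability idea (all names here are hypothetical, not the paper's API): the user's profile is plain text the user can read and edit directly, and recommendations are scored from an encoding of that text instead of an opaque latent vector.

```python
from dataclasses import dataclass

@dataclass
class ScrutableProfile:
    summary: str  # human-readable and directly editable by the user

def recommend(profile, catalog, encode, score, k=10):
    """Rank item descriptions (strings) against the encoded text summary.
    `encode` maps text to a vector and `score` compares two vectors; both
    are hypothetical stand-ins for the paper's components."""
    u = encode(profile.summary)
    ranked = sorted(catalog, key=lambda item: score(u, encode(item)),
                    reverse=True)
    return ranked[:k]
```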
Learned optimizers can't generalize to large unseen tasks… until now! Excited to present μLO: Compute-Efficient Meta-Generalization of Learned Optimizers! Don't miss my talk about it next Sunday at the OPT 2024 NeurIPS workshop :) 🧵 https://t.co/ysEWwRe9Hf 1/N
Introducing a framework for end-to-end discovery of data structures—no predefined algorithms or hand-tuning needed. Work led by Omar Salemohamed. More details below. https://t.co/lFb2kn2NpE
Come study with us at Mila! I will be looking for new students to work with. Our current projects explore continual learning, modularity, scrutability, algorithm discovery, AI for law (reasoning), invariances, and decision-making...
Mila's annual supervision request process opens on October 15 to receive MSc and PhD applications for Fall 2025 admission! Join our community! More information here https://t.co/r01eLcXtZw
Happy to share the first paper of my master's! Big kudos to my very cool co-authors: @zek3r, @tomjiralerspong, Alex Payeur, @mattperich, Luca Mazzucato, and @g_lajoie_
Can we perform unbiased Bayesian posterior inference with a diffusion-model prior? We propose Relative Trajectory Balance (RTB), which allows us to directly optimize this posterior model. We apply this to several tasks in image, language, and control! 🧵 https://t.co/h7elRgivcC
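For intuition, the training signal can be written as a single squared log-ratio per denoising trajectory. This is my reading of the objective from the announcement, with all log-probabilities assumed precomputed along the trajectory:

```python
def rtb_loss(log_Z, logp_posterior_traj, logp_prior_traj, log_reward):
    """Relative Trajectory Balance (sketch, per trajectory): drive the
    posterior sampler toward p_posterior(tau) proportional to
    p_prior(tau) * r(x), where log_Z is a learned scalar estimating the
    normalizing constant. Minimized to zero, the ratio of posterior to
    prior trajectory probability matches the reward up to Z."""
    return (log_Z + logp_posterior_traj - logp_prior_traj - log_reward) ** 2
```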
How can we generate interesting edge cases to test our autonomous vehicles in simulation? We propose CtRL-Sim, a novel framework for closed-loop behaviour simulation that enables fine-grained control over agent behaviours. 🧵 1/8 https://t.co/l2EW6JVLYT
Check out our full paper on arXiv for more details. Code coming soon. https://t.co/9M7KhMnZO2 See you in Pisa🇮🇹! 🧵/🧵