Michael C. Mozer
@mc_mozer
725 Followers · 13 Following · 3 Media · 19 Statuses
Research Scientist, Google Brain (now DeepMind), where cognitive science and machine learning meet
San Francisco, CA
Joined January 2022
[4/4] Details and results found at https://t.co/iX8HDE3FbZ (Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production). Joint work with @agalashov , @rosemary_ke, @caoyuan33, @_vaishnavh, and Matt Jones.
arxiv.org
We explore a class of supervised training objectives that allow a language model to dynamically and autonomously scale the number of compute steps used for each input token. For any token, the...
[3/4] To train the model to calibrate its uncertainty and use <don't know> outputs judiciously, we frame the selection of each output token as a sequential-decision problem with a time penalty. We refer to this class of methods as "Catch Your Breath" losses.
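The time-penalized framing in [3/4] can be illustrated with a toy objective. This is not the paper's actual loss; `cyb_style_loss`, the dict-valued distributions, and the linear pause penalty are all assumptions for illustration:

```python
import math

def cyb_style_loss(step_probs, target, pause_penalty=0.1):
    # step_probs[k]: the model's output distribution after k pause steps
    # (here a dict mapping token -> probability).
    # Cost of committing at step k = negative log-likelihood of the
    # target token plus a linear time penalty on the pauses taken.
    costs = []
    for k, probs in enumerate(step_probs):
        nll = -math.log(probs[target])
        costs.append(nll + pause_penalty * k)
    # The cheapest stopping point wins: waiting is worthwhile only when
    # the extra compute buys enough confidence to offset the penalty.
    return min(costs)
```

For example, if one pause raises the target's probability from 0.5 to 0.9, the pause pays for itself under a penalty of 0.1 per step, since -log(0.9) + 0.1 < -log(0.5).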
[2/4] The model can request additional compute steps for any token by emitting a <don't know> output. If the model is granted a delay, a <pause> token is inserted at the next input step, providing the model with additional compute resources to generate an output.
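A minimal sketch of the mechanism in [2/4], assuming a simple sequential decoding loop; `generate_with_pauses`, `stub_model`, and the literal token strings are hypothetical stand-ins (the real model emits these as vocabulary items inside a transformer):

```python
DONT_KNOW = "<don't know>"
PAUSE = "<pause>"

def generate_with_pauses(model_step, prompt, max_pauses=2):
    # Toy decoding loop: when the model emits <don't know>, a <pause>
    # token is inserted at the next input step (up to max_pauses per
    # output), giving the model extra compute before it must commit.
    inputs, outputs, pauses = list(prompt), [], 0
    i = 0
    while i < len(inputs):
        out = model_step(inputs[: i + 1])
        if out == DONT_KNOW and pauses < max_pauses:
            inputs.insert(i + 1, PAUSE)  # grant one extra compute step
            pauses += 1
            i += 1  # the model now also processes the pause token
            continue
        outputs.append(out)
        pauses = 0
        i += 1
    return outputs, inputs

def stub_model(seq):
    # Stand-in model: "uncertain" on the token "hard" until it has been
    # granted a pause step; confident otherwise.
    if seq[-1] == "hard" and PAUSE not in seq:
        return DONT_KNOW
    return "ok"
```

Running `generate_with_pauses(stub_model, ["easy", "hard"])` shows one pause being inserted after the difficult token before the model commits to an output.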
[1/4] As you read words in this text, your brain adjusts fixation durations to facilitate comprehension. Inspired by human reading behavior, we propose a supervised objective that trains an LLM to dynamically determine the number of compute steps for each input token.
Happy to announce that our work has been accepted to the workshops on Multi-turn Interactions and Embodied World Models at #NeurIPS2025! Frontier foundation models are incredible, but how well can they explore in interactive environments? Paper: https://t.co/8Q9j1VMTYv 🧵 1/13
To appear in the MechInterp Workshop @ #NeurIPS2025. Paper: https://t.co/fJS0eripxX How do language models (LMs) form representations of new tasks during in-context learning? We study different types of task representations and find that they evolve in distinct ways. 🧵 1/7
@adrian_weller @DavidSKrueger @gkdziugaite @mc_mozer @Eleni30fillou [9/9] Check out our paper for more details. Paper: https://t.co/7HD4CTMprB Code:
github.com
Official code for the paper: "From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization" - shoaibahmed/vision_relearning
[1/9] Does machine unlearning truly erase data influence? Our new paper reveals a critical insight: 'forgotten' information often isn't gone; it's merely dormant, and easily recovered by fine-tuning on just the retain set.
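The relearning finding in [1/9] suggests a simple diagnostic: keep fine-tuning the unlearned model on the retain set only and watch forget-set performance. A minimal sketch with toy stand-ins; the paper's actual protocol lives in the linked repo, and every name here (`relearning_probe`, the stub fine-tune and eval functions) is hypothetical:

```python
def relearning_probe(model, retain_data, finetune_step, eval_forget, steps=3):
    # Probe whether "forgotten" knowledge is dormant rather than erased:
    # fine-tune on the retain set only, tracking forget-set performance
    # after each step. A sharp recovery suggests the forget data's
    # influence was suppressed, not removed.
    history = [eval_forget(model)]
    for _ in range(steps):
        model = finetune_step(model, retain_data)
        history.append(eval_forget(model))
    return history

# Toy stand-ins: the "model" is just a counter of retain-set updates,
# and forget-set accuracy (in %) climbs back as updates accumulate.
toy_finetune = lambda m, data: m + 1
toy_eval_forget = lambda m: min(100, 30 * m)
```

With these stubs, `relearning_probe(0, None, toy_finetune, toy_eval_forget)` traces the kind of recovery curve the paper describes.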
We are announcing the launch of Airial Travel's open-to-all beta version for desktop today. Airial is your personal travel agent with AI superpowers, making planning and booking trips as easy as dreaming them up. https://t.co/KKO8D5XnEn Sanjeev and I co-founded Airial.
Excited to present "Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery" at #NeurIPS2024! TL;DR: Our model, SynCx, greatly simplifies the inductive biases and training procedures of current state-of-the-art synchrony models. Thread 1/x.
The ability to properly contextualize is a core competency of LLMs, yet even the best models sometimes struggle. In a new preprint, we use #MechanisticInterpretability techniques to propose an explanation for contextualization errors: the LLM Race Conditions Hypothesis. [1/9]
New LLM Research: Conventional wisdom says that deep neural networks suffer from catastrophic forgetting as we train them on a sequence of data points with distribution shifts. But conventions are meant to be challenged! In our recent paper led by @YanlaiYang, we discovered
Nature Comms paper: Subtle adversarial image manipulations influence both human and machine perception! We show that adversarial attacks against computer vision models also transfer (weakly) to humans, even when the attack magnitude is small. https://t.co/O7skDZe6zU
1/ Today we are excited to introduce Phenaki (https://t.co/7xkcoeuXwB), a model for generating videos from text, with prompts that can change over time, and that can generate videos as long as multiple minutes!
Two important breakthroughs from @GoogleAI this week - Imagen Video, a new text-conditioned video diffusion model that generates 1280x768 24fps HD video. And Phenaki, a model which generates long coherent videos for a sequence of text prompts. https://t.co/nTs67r21Sf
We are excited to make the jump to complex real-world data with this class of models, and about the potential that slot-based models have for reducing the need for detailed human supervision when learning about the physical world. 6/7
Excited to share our work on self-supervised video object representation learning: We introduce SAVi++, a slot-based video model that, for the first time, scales to Waymo Open driving scenes w/o direct supervision. https://t.co/eBAW2ijs6c https://t.co/tbjZWgdQEK 1/7
Overcoming temptation: Incentive design for intertemporal choice https://t.co/SalyyRQHpd We use AI models to help individuals adhere to long-term goals (e.g., retirement savings, weight loss) and avoid giving in to temptation.