
Mikael Henaff
@HenaffMikael
Followers: 1K · Following: 982 · Media: 23 · Statuses: 204
Research Scientist at @MetaAI, previously postdoc at @MSFTResearch and PhD at @nyuniversity. All views my own.
Joined January 2019
Super stoked to share this work led by @proceduralia & @MartinKlissarov. Our method Motif uses LLMs to rank pairs of observation captions and synthesize dense intrinsic rewards specified by natural language. New SOTA on NetHack while being easily steerable. Paper+code in thread!
Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement …
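In case it helps to see the shape of the idea, here is a minimal sketch (my own illustrative code, not the released implementation) of fitting a reward model to LLM preferences over caption pairs with a Bradley-Terry style loss. The `llm_prefers` stub and network sizes are assumptions.

```python
import torch
import torch.nn as nn

def llm_prefers(caption_a: str, caption_b: str) -> float:
    # Hypothetical stub: returns 1.0 if the LLM annotator prefers caption_a,
    # 0.0 otherwise. In Motif this judgment comes from prompting Llama 2.
    raise NotImplementedError("query the LLM annotator here")

class RewardModel(nn.Module):
    # Maps a caption embedding to a scalar intrinsic reward.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.net(emb).squeeze(-1)

def preference_loss(r_a: torch.Tensor, r_b: torch.Tensor, pref_a: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry model: P(a preferred over b) = sigmoid(r_a - r_b),
    # fit with binary cross-entropy against the LLM's preferences.
    return nn.functional.binary_cross_entropy_with_logits(r_a - r_b, pref_a)
```

The learned reward is then used as a dense intrinsic reward for standard RL; steering the agent amounts to changing the prompt given to the annotator.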
Excited to share our @NeurIPSConf paper where we propose E3B, a new algorithm for exploration in varying environments. Paper: Website: E3B sets new SOTA for both MiniHack and reward-free exploration on Habitat. A thread [1/N]
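For the curious, the core episodic bonus is compact enough to sketch. Below is a minimal NumPy version of an elliptical bonus of the form b(s) = phi(s)^T C^{-1} phi(s); class and variable names are my own, and in E3B phi comes from a learned inverse-dynamics encoder rather than raw observations.

```python
import numpy as np

class EllipticalBonus:
    # Episodic elliptical bonus: b(s) = phi(s)^T C^{-1} phi(s), where C
    # accumulates outer products of the embeddings seen this episode.
    def __init__(self, dim: int, ridge: float = 0.1):
        self.dim, self.ridge = dim, ridge
        self.reset()

    def reset(self):
        # Called at the start of every episode (the bonus is episodic).
        self.C_inv = np.eye(self.dim) / self.ridge

    def bonus(self, phi: np.ndarray) -> float:
        b = float(phi @ self.C_inv @ phi)
        # Sherman-Morrison rank-one update of C^{-1} after adding phi phi^T.
        v = self.C_inv @ phi
        self.C_inv -= np.outer(v, v) / (1.0 + b)
        return b
```

The bonus is added to the task reward at each step, so states whose embeddings fall outside the ellipse of previously visited ones are rewarded.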
I am looking for an intern for 2024 to work on the Cortex project in @AIatMeta's Embodied AI team! Relevant skills include: experience with LLMs/VLMs, EAI simulators such as Habitat, and RL. DM or email at mikaelhenaff [at] meta [dot] com ✨ #AI #InternshipOpportunity #LLM
Signed. Keeping models open is the best way to ensure high scientific standards for safety research and fair representation in AI development. via @mozilla.
New paper with @alfcnz and @ylecun, which we will be presenting at #iclr2019. We learn policies from purely observational data using uncertainty-regularized forward models. #DeepLearning #autonomousdriving Paper: Project site:
Exploration is well-studied for singleton MDPs, but many envs of interest change across episodes (e.g. procgen envs or embodied AI tasks). How should we explore in this case? In our upcoming @icmlconf oral, we study this question. A thread. 1/N
Very happy to share that our Motif work was accepted at #ICLR2024 :) Come say hi in Vienna!
Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement …
A simple way to help with #Covid_19 and medicine generally is to donate spare computer time to biomedical researchers through projects like @foldingathome or @RosettaAtHome. Small contributions add up to make distributed peta/exaFLOP supercomputers!
@_aidan_clark_ Step 1 doesn't have to be random: there is a large literature on directed exploration strategies, going back at least to Kearns and Singh's 2002 E^3 work, which showed you can avoid the exponential sample complexity of random exploration.
The more I work with this env, the more its richness and complexity become apparent. Beyond perception, it presents a hard challenge for nearly every agentic capability, from long-horizon planning and exploration to reasoning, memory, and generalization.
The ultimate test? NetHack 🏰 This beast remains unsolved: the best model, o1-preview, achieved just 1.5% average progression. BALROG pushes boundaries, uncovering where LLMs/VLMs struggle the most. Will your model fare better? 🤔 They’re nowhere near capable enough yet!
The embodied AI team I'm part of at @MetaAI has multiple Research Scientist / Research Engineer positions open; come work with us ✨
(1/6) The FAIR Embodied AI team at @MetaAI has multiple full-time openings! If you’re interested in cutting-edge research in AI for robotics, AR and VR, and sharing it with the world, read on. 🧵
Also feel free to reach out if you want to grab coffee and chat about RL, exploration, generalization, LLMs for decision making, or anything else :) #ICML2023.
Stoked about this new benchmark for long-horizon planning, intrinsic motivation, procedural generalization and memory.
I’m excited to announce Craftax, a new benchmark for open-ended RL! ⚔️ Extends the popular Crafter benchmark with NetHack-like dungeons. ⚡ Implemented entirely in JAX, achieving speedups of over 100x. 1/
This is a very exciting dataset: stochastic policies/dynamics, large action space, partial observability, rich dynamics, *very* large scale while still enabling fast experiments. Can't wait to start playing with it and hope others do too!
Delighted to present the NetHack Learning Dataset (NLD) at #NeurIPS2022 next week! NLD is a new large-scale dataset for NetHack and MiniHack, aimed at supercharging research in offline RL, learning from observations, and imitation learning. 1/
Interview with @sharathraparthy discussing our recent work showing that transformers can in-context learn new sequential decision-making tasks in new environments. Check it out!
Episode 48: Sharath Chandra Raparthy. @sharathraparthy (AI Resident at @AIatMeta) on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!
Latest work where we present OpenEQA, a modern embodied Q&A benchmark that tests multiple capabilities, such as spatial reasoning, object recognition, and world knowledge, on which SOTA VLMs like GPT4V/Claude/Gemini fail. A new challenge for embodied AI! To be presented @CVPR.
Today we’re releasing OpenEQA — the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open vocabulary questions like “Where did I leave my badge?”. More details ➡️
In Hawaii for #ICML2023, presenting two works Tuesday:
- A Study of Global and Episodic Bonuses in Contextual MDPs (poster at 2pm, oral at 6:10pm)
- Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories (poster at 11am)
Hope to see you there :)
We are hiring a research intern for next year. If you would like to work on hierarchical RL, world models, modular networks, and related topics with @shagunsodhani, myself, and other researchers at FAIR, please reach out! :)
Presenting our E3B work on exploration in changing environments at #NeurIPS at 11 am NOLA time in Hall J #105. Come by and say hi! With @robertarail @MinqiJiang @_rockt
@_rockt @HeinrichKuttler @_samvelyan @erichammy You might be interested: this method is able to make progress on the Oracle task without demos (although sometimes in unexpected ways ;)).
Nice article in @techreview about our paper on model-based RL with uncertainty regularization for #autonomousdriving.
Reinforcement learning makes mistakes as it learns. That's fine when playing a board game. It's, erm, not great in a life-or-death situation.
@EugeneVinitsky The difference between algorithms that explore efficiently vs. not is essentially polynomial vs. exponential sample complexity (itself a lower bound on compute complexity). Imo more compute can crack some harder poly problems but will eventually hit a wall with exponential ones :)
@goodfellow_ian Really sorry to hear this. I had a bad case of LC as well in 2020 and few understand how brutal it is. Are you sure POTS is the main culprit? Asking because I had that diagnosis too but it later turned out to be wrong. This ended up helping me:
@patrickmineault @ylecun End-to-end memory networks in 2015 by @tesatory were an important precursor, in the sense that, like the transformer (and unlike the NTM), they maintain the sequence structure and perform multiple layers of attention over it.
New work led by @sharathraparthy and jointly with @robertarail @erichammy @_robertkirk showing that one can in-context learn completely *new tasks* on *new environments* via large-scale pretraining and few-shot examples. To be presented at the upcoming @NeurIPSConf FMDM workshop!
🚨 🚨 !!New Paper Alert!! 🚨 🚨 How can we train agents that learn new tasks (with different states, actions, dynamics and reward functions) from only a few demonstrations and no weight updates? In-context learning to the rescue! In our new paper, we show that by training …
Takes me back to my days as a starry-eyed master's student, when PyTorch's grandparent Lush was still used in @ylecun's lab <3. Lush was actually the first programming language I seriously learned (I'd been studying math until then). Such fond memories counting parentheses!
I wrote two blog posts about SN, Léon Bottou and @ylecun's 1988 Simulateur de Neurones. One is an English translation of the original paper, for which I've reproduced the figures. The other is a tutorial on how to run their code on Apple silicon.
@jsuarez5341 Procedural generation or settings where the environment changes across episodes. Exploration operates very differently in that setting and a lot of algorithms for static MDPs fail.
@akbirkhan @MetaAI Definitely possible in NYC; would have to see about London. Feel free to apply here and send your CV to mikaelhenaff@meta.com :)
It was also a pleasure working with @shagunsodhani @robertarail @yayitsamyzhang and Pascal Vincent on this project!
Nice opportunity to work with some great researchers!
Our group has multiple openings for internships at FAIR London (@MetaAI). I’m looking for someone to work on language models + decision making e.g. augmenting LMs with actions / tools / goals, interactive / open-ended learning for LMs, or RLHF. Apply at
@HaqueIshfaq MiniHack is quite nice: there are lots of tasks, many of them are sparse reward, and it has the additional interesting twist of being procedurally generated. We have some code to train a variety of exploration algorithms here:
High-performing, open-source VLMs and smol LLMs now available. Nature is healing 🌱
📣 Introducing Llama 3.2: lightweight models for edge devices, vision models and more! What’s new? • Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases, with support for @Arm, @MediaTek & @Qualcomm on day one. …
@_rockt @NetHack_LE 'nutritiously hard', sounds like a juicy problem and a hard nut to crack ;).
@TongzhouWang I think reorganizing information can be seen as adding new information encoded in the reorganization scheme. For example, if you are reorganizing N bits of information with program P of length K bits, you are effectively adding K bits of new information.
@_rockt @jsuarez5341 @PaglieriDavide @NetHack_LE And for NetHack, |A| ~= 100, H >= 10000. Things change if we have a smart exploration algorithm though, which is one of the reasons RL is interesting :)
@alfcnz When the turntable was invented, some people thought it was the end of music. Then people used it to make entirely new kinds of music (sampling, DJing etc). Human creativity always finds a way to express itself given the tools available :).
New paper accepted to #icml2020: it takes steps towards bridging the theory-practice gap in RL by providing a provably sample-efficient algorithm for block MDPs that uses contrastive learning. Long version: #ReinforcementLearning
@HaqueIshfaq It has several of the MiniGrid envs ported but is a lot more challenging because count-based episodic bonuses do not work; we discuss this some more here:
@FelixHill84 Sorry to hear about this Felix, but I'm glad things are starting to look up. I remember when we interned together in the early days, which felt like a different world. I really admire both your scientific and human contributions to the field, wishing you well!
@_rockt @jsuarez5341 @PaglieriDavide @NetHack_LE I think that "naive" tabula rasa RL cannot solve tasks like NetHack even with a universe-sized computer. Consider a sparse-reward task with |A|=10 actions and horizon H=100. The expected number of samples with naive exploration is O(|A|^H) = O(10^100), more than the number of atoms in the universe :)
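Writing that estimate out (using the commonly cited figure of roughly $10^{80}$ atoms in the observable universe):

```latex
\mathbb{E}[\#\text{samples}] = O\!\left(|A|^{H}\right) = O\!\left(10^{100}\right) \gg 10^{80}
```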
@SinghAyush2811 @MetaAI These are for students in a PhD program, but we sometimes have AI resident spots too, which do not have this requirement. Will advertise if so.
@CupiaBart Nice work, it's great to see interest in NetHack! If you're in this space you might be interested in a couple of other repos: In particular, Motif makes some progress on the very challenging Oracle task and uses SF as the RL env.
My friend Kelsey (aka @arcanelibrary) designed a new D&D system inspired by the earlier versions of the game: simple, fast, and deadly. I playtested the game during development and can't recommend it enough :) It's now available on Kickstarter!
@_aidan_clark_ "Near-Optimal Reinforcement Learning in Polynomial Time" (UPenn CIS). This is based on the idea of novelty bonuses, which has also been extended to deep RL settings (e.g. RND, ICM, pseudocounts).
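To make the novelty-bonus idea concrete, here is a toy count-based version (my own sketch, not the paper's algorithm): states visited less often receive a larger intrinsic reward.

```python
from collections import defaultdict

class CountBonus:
    # Tabular novelty bonus: r_int(s) = 1 / sqrt(N(s)).
    def __init__(self):
        self.counts = defaultdict(int)

    def bonus(self, state) -> float:
        # `state` must be hashable (e.g. a tuple). Deep-RL variants such as
        # RND, ICM, or pseudocounts replace the table with a learned model.
        self.counts[state] += 1
        return self.counts[state] ** -0.5
```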
@NicoBohlinger Thanks! We didn't compare to NGU, but others have found it not to work well on procgen envs. One conceptual difference is that the elliptical bonus automatically normalizes wrt scale, but NGU's KNN-based one doesn't, which means a few features could dominate.
New paper at #NeurIPS2020 presenting PC-PG, a policy gradient algorithm that explores by growing a set of policies covering the set of possible states. Polynomial sample complexity in the linear case, and plays nice with modern deep RL methods.
We share our code; excited to see what people build with this! Many thanks to @qqyuzu @adityagrover_ @yayitsamyzhang @brandondamos for another fun collaboration.
@TongzhouWang Oh interesting, yes that sounds quite related! Yeah, algorithmic complexity leads to cool thought experiments despite not being practical unless you have a universe-sized computer ;p
@UCL_DARK @MinqiJiang Big congrats, Dr. @MinqiJiang!!! Very well deserved, and it's been a pleasure collaborating during your time at FAIR. Looking forward to seeing what you come up with next :)
Very nice work by @mklissar on learning long-horizon exploratory behaviors using Laplacian eigenfunctions.
🎉 I'm particularly excited to share this project I worked on under the guidance of @MarlosCMachado 🧙 We ask: what is the *right scaffold* for building temporal abstractions, from the ground up? Website: It will be presented next week at #ICML2023 🏝️
Overall, this clarifies our understanding of how different exploration algorithms operate in CMDPs and opens up a number of exciting new directions. See the paper for full details. Thanks to my collaborators @MinqiJiang and @robertarail! 15/N, N=15.
@cedcolas @robertarail @MinqiJiang @_rockt For NGU's KNN-based bonus, if one of the dimensions has a much larger scale than the others, it can dominate the bonus because Euclidean distance is used.
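A toy numeric illustration of this (my own example, not from the paper): when one feature's scale dwarfs the others, Euclidean distance is dominated by it, while whitening with the inverse covariance, as the elliptical bonus effectively does, rebalances the dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two features, the first on ~100x the scale of the second.
X = rng.normal(size=(1000, 2)) * np.array([100.0, 1.0])
a, b = X[0], X[1]

# Euclidean distance: driven almost entirely by the first dimension.
print(np.linalg.norm(a - b))

# Mahalanobis-style distance using the inverse feature covariance:
# both dimensions now contribute comparably.
C_inv = np.linalg.inv(np.cov(X.T))
d = a - b
print(np.sqrt(d @ C_inv @ d))
```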
@jsuarez5341 @_rockt @PaglieriDavide @NetHack_LE Basically, they bring complexity from O(|A|^H) to O(poly(env_complexity)). It's a huge improvement (exponential to polynomial), but may still not be enough if env_complexity is big. Adding priors through LLMs or otherwise can reduce it further.
@healingfromlc Don't lose hope: I got it almost 3 years ago and was mostly non-functional for a year, but it got better little by little & I am now doing MUCH better, to the point where symptoms are mostly just an inconvenience. It will get better, just very slowly and with lots of ups & downs.
@abreanac I quite liked this one; it covers the main ideas and appeals more to intuitions than rigorous proofs. The book by James Gleick is also great for an even more informal overview and historical perspective.
@TongzhouWang That's the premise of a nice short story called "The Library of Babel", by Borges.
Excited to share recent work to be presented at #NeurIPS2019: explicit exploration/exploitation using dynamics models.
- Polynomial sample complexity bound in an idealized setting, independent of the number of states.
- Practical algorithm using neural networks.
@petrenko_ai @CupiaBart It's great to see more interest in NetHack! We also used SF in a couple of other repos that ran NetHack/MiniHack. Not sure if you remember, but you answered several of my questions on the SF Discord, which was very helpful :)
@goodfellow_ian Also fwiw my LC got better very slowly over the course of 3-4 years, to the point it doesn’t affect me much any more. I think mindfulness and a healthy diet helped, but also just time. So don’t lose hope!
@EduTrending @AIatMeta @NetHack_LE @InnerverseAI @firecrawl_dev @neo4j Hi Lindsay! The NLE is still being used for research, yes, stay tuned :) Concerning the repo itself, I believe it is currently being maintained by @HeinrichKuttler here:
@TongzhouWang For example: a very short program can generate every possible book of N words, including works of genius and undiscovered scientific truths. But now, finding such needles in the haystack requires inputting information into the system.
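The "very short program" claim is easy to make literal (a toy sketch of my own): a few lines enumerate every possible text over an alphabet, yet selecting the meaningful ones requires information the program itself does not contain.

```python
from itertools import product

def all_texts(alphabet: str, length: int):
    # A constant-size program whose output includes every possible
    # "book" of the given length over the alphabet.
    for chars in product(alphabet, repeat=length):
        yield "".join(chars)

print(list(all_texts("ab", 2)))  # ['aa', 'ab', 'ba', 'bb']
```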
@WeLoveDogsHNL @RosettaAtHome @foldingathome That's awesome, 15 years is a lot of number crunching!
@jsuarez5341 @_rockt @PaglieriDavide @NetHack_LE Novelty-based exploration algorithms are able to solve problems of the type I described with fairly minimal assumptions and sample complexity roughly polynomial in some measure of env complexity, using only environment interaction. These include count-based bonuses and more recent methods like RND.
@cedcolas @robertarail @MinqiJiang @_rockt Thanks! We didn't compare to NGU, but others have reported it not to work well on procgen envs. A conceptual advantage of the elliptical bonus is that it automatically adjusts the scale over each dimension.