
Mikael Henaff
@HenaffMikael
Followers: 1K · Following: 982 · Media: 23 · Statuses: 204
Research Scientist at @MetaAI, previously postdoc at @MSFTResearch and PhD at @nyuniversity. All views my own.
Joined January 2019
Super stoked to share this work led by @proceduralia & @MartinKlissarov. Our method Motif uses LLMs to rank pairs of observation captions and synthesize dense intrinsic rewards specified by natural language. New SOTA on NetHack while being easily steerable. Paper+code in thread!
Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement …
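In case it helps to see the shape of the idea, here is a minimal sketch (my own illustrative code, not the released implementation) of fitting a reward model to LLM preferences over caption pairs with a Bradley-Terry style loss. The `llm_prefers` stub and network sizes are assumptions.

```python
import torch
import torch.nn as nn

def llm_prefers(caption_a: str, caption_b: str) -> float:
    # Hypothetical stub: returns 1.0 if the LLM annotator prefers caption_a,
    # 0.0 otherwise. In Motif this judgment comes from prompting Llama 2.
    raise NotImplementedError("query the LLM annotator here")

class RewardModel(nn.Module):
    # Maps a caption embedding to a scalar intrinsic reward.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.net(emb).squeeze(-1)

def preference_loss(r_a: torch.Tensor, r_b: torch.Tensor, pref_a: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry model: P(a preferred over b) = sigmoid(r_a - r_b),
    # fit with binary cross-entropy against the LLM's preferences.
    return nn.functional.binary_cross_entropy_with_logits(r_a - r_b, pref_a)
```

The learned reward is then used as a dense intrinsic reward for standard RL; steering the agent amounts to changing the prompt given to the annotator.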
Excited to share our @NeurIPSConf paper where we propose E3B, a new algorithm for exploration in varying environments. Paper: Website: E3B sets new SOTA for both MiniHack and reward-free exploration on Habitat. A thread [1/N]
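For the curious, the core episodic bonus is compact enough to sketch. Below is a minimal NumPy version of an elliptical bonus of the form b(s) = phi(s)^T C^{-1} phi(s); class and variable names are my own, and in E3B phi comes from a learned inverse-dynamics encoder rather than raw observations.

```python
import numpy as np

class EllipticalBonus:
    # Episodic elliptical bonus: b(s) = phi(s)^T C^{-1} phi(s), where C
    # accumulates outer products of the embeddings seen this episode.
    def __init__(self, dim: int, ridge: float = 0.1):
        self.dim, self.ridge = dim, ridge
        self.reset()

    def reset(self):
        # Called at the start of every episode (the bonus is episodic).
        self.C_inv = np.eye(self.dim) / self.ridge

    def bonus(self, phi: np.ndarray) -> float:
        b = float(phi @ self.C_inv @ phi)
        # Sherman-Morrison rank-one update of C^{-1} after adding phi phi^T.
        v = self.C_inv @ phi
        self.C_inv -= np.outer(v, v) / (1.0 + b)
        return b
```

The bonus is added to the task reward at each step, so states whose embeddings fall outside the ellipse of previously visited ones are rewarded.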
I am looking for an intern for 2024 to work on the Cortex project in @AIatMeta's Embodied AI team! Relevant skills include: experience with LLMs/VLMs, EAI simulators such as Habitat, and RL. DM or email at mikaelhenaff [at] meta [dot] com ✨ #AI #InternshipOpportunity #LLM
Signed. Keeping models open is the best way to ensure high scientific standards for safety research and fair representation in AI development. via @mozilla.
New paper with @alfcnz and @ylecun, which we will be presenting at #iclr2019. We learn policies from purely observational data using uncertainty-regularized forward models. #DeepLearning #autonomousdriving Paper: Project site:
Exploration is well-studied for singleton MDPs, but many envs of interest change across episodes (e.g. procgen envs or embodied AI tasks). How should we explore in this case? In our upcoming @icmlconf oral, we study this question. A thread. 1/N
Very happy to share that our Motif work was accepted at #ICLR2024 :) Come say hi in Vienna!
Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement …
A simple way to help with #Covid_19 and medicine generally is to donate spare computer time to biomedical researchers through projects like @foldingathome or @RosettaAtHome. Small contributions add up to make distributed peta/exaFLOP supercomputers!
@_aidan_clark_ Step 1 doesn't have to be random: there is a large literature on directed exploration strategies, going back at least to Kearns and Singh's 2002 E^3 work, which showed you can avoid the exponential sample complexity of random exploration.
The more I work with this env, the more its richness and complexity become apparent. Beyond perception, it presents a hard challenge for nearly every agentic capability, from long-horizon planning and exploration to reasoning, memory, and generalization.
The ultimate test? NetHack 🏰 This beast remains unsolved: the best model, o1-preview, achieved just 1.5% average progression. BALROG pushes boundaries, uncovering where LLMs/VLMs struggle the most. Will your model fare better? 🤔 They’re nowhere near capable enough yet!
The embodied AI team I'm part of at @MetaAI has multiple Research Scientist / Research Engineer positions open; come work with us ✨
(1/6) The FAIR Embodied AI team at @MetaAI has multiple full-time openings! If you’re interested in cutting-edge research in AI for robotics, AR and VR, and sharing it with the world, read on. 🧵
Also feel free to reach out if you want to grab coffee and chat about RL, exploration, generalization, LLMs for decision making, or anything else :) #ICML2023.
Stoked about this new benchmark for long-horizon planning, intrinsic motivation, procedural generalization and memory.
I’m excited to announce Craftax, a new benchmark for open-ended RL! ⚔️ Extends the popular Crafter benchmark with NetHack-like dungeons. ⚡ Implemented entirely in JAX, achieving speedups of over 100x. 1/
This is a very exciting dataset: stochastic policies/dynamics, large action space, partial observability, rich dynamics, *very* large scale while still enabling fast experiments. Can't wait to start playing with it and hope others do too!
Delighted to present the NetHack Learning Dataset (NLD) at #NeurIPS2022 next week! NLD is a new large-scale dataset for NetHack and MiniHack, aimed at supercharging research in offline RL, learning from observations, and imitation learning. 1/
Interview with @sharathraparthy discussing our recent work showing that transformers can in-context learn new sequential decision-making tasks in new environments. Check it out!
Episode 48: Sharath Chandra Raparthy. @sharathraparthy (AI Resident at @AIatMeta) on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!
Latest work where we present OpenEQA, a modern embodied Q&A benchmark that tests multiple capabilities, such as spatial reasoning, object recognition, and world knowledge, on which SOTA VLMs like GPT4V/Claude/Gemini fail. A new challenge for embodied AI! To be presented @CVPR.
Today we’re releasing OpenEQA — the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open vocabulary questions like “Where did I leave my badge?”. More details ➡️
In Hawaii for #ICML2023, presenting two works Tuesday:
- A Study of Global and Episodic Bonuses in Contextual MDPs (poster at 2pm, oral at 6:10pm)
- Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories (poster at 11am)
Hope to see you there :)
We are hiring a research intern for next year. If you would like to work on hierarchical RL, world models, modular networks, and related topics with @shagunsodhani, myself, and other researchers at FAIR, please reach out! :)
Presenting our E3B work on exploration in changing environments at #NeurIPS at 11 am NOLA time in Hall J #105. Come by and say hi! With @robertarail @MinqiJiang @_rockt
@_rockt @HeinrichKuttler @_samvelyan @erichammy You might be interested: this method is able to make progress on the Oracle task without demos (although sometimes in unexpected ways ;)).
Nice article in @techreview about our paper on model-based RL with uncertainty regularization for #autonomousdriving.
Reinforcement learning makes mistakes as it learns. That's fine when playing a board game. It's, erm, not great in a life-or-death situation.
@EugeneVinitsky The difference between algorithms that explore efficiently vs. not is essentially polynomial vs. exponential sample complexity (itself a lower bound on compute complexity). Imo more compute can crack some harder poly problems but will eventually hit a wall with exponential ones :)
@goodfellow_ian Really sorry to hear this. I had a bad case of LC as well in 2020 and few understand how brutal it is. Are you sure POTS is the main culprit? Asking because I had that diagnosis too but it later turned out to be wrong. This ended up helping me:
@patrickmineault @ylecun End-to-end memory networks in 2015 by @tesatory were an important precursor, in the sense that, like the transformer (and unlike the NTM), they maintain the sequence structure and perform multiple layers of attention over it.
New work led by @sharathraparthy and jointly with @robertarail @erichammy @_robertkirk showing that one can in-context learn completely *new tasks* on *new environments* via large-scale pretraining and few-shot examples. To be presented at the upcoming @NeurIPSConf FMDM workshop!
🚨 🚨 !!New Paper Alert!! 🚨 🚨 How can we train agents that learn new tasks (with different states, actions, dynamics and reward functions) from only a few demonstrations and no weight updates? In-context learning to the rescue! In our new paper, we show that by training …
Takes me back to my days as a starry-eyed master's student, when PyTorch's grandparent Lush was still used in @ylecun's lab <3. Lush was actually the first programming language I seriously learned (I'd been studying math until then). Such fond memories counting parentheses!
I wrote two blog posts about SN, Léon Bottou and @ylecun's 1988 Simulateur de Neurones. One is an English translation of the original paper, for which I've reproduced the figures. The other is a tutorial on how to run their code on Apple silicon.
@jsuarez5341 Procedural generation or settings where the environment changes across episodes. Exploration operates very differently in that setting and a lot of algorithms for static MDPs fail.
@akbirkhan @MetaAI Definitely possible in NYC; would have to see about London. Feel free to apply here and send your CV to mikaelhenaff@meta.com :)
It was also a pleasure working with @shagunsodhani @robertarail @yayitsamyzhang and Pascal Vincent on this project!
Nice opportunity to work with some great researchers!
Our group has multiple openings for internships at FAIR London (@MetaAI). I’m looking for someone to work on language models + decision making e.g. augmenting LMs with actions / tools / goals, interactive / open-ended learning for LMs, or RLHF. Apply at
@HaqueIshfaq MiniHack is quite nice: there are lots of tasks, many of them are sparse reward, and it has the additional interesting twist of being procedurally generated. We have some code to train a variety of exploration algorithms here:
High-performing, open-source VLMs and smol LLMs now available. Nature is healing 🌱
📣 Introducing Llama 3.2: lightweight models for edge devices, vision models and more! What’s new? • Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases, with support for @Arm, @MediaTek & @Qualcomm on day one. …
@_rockt @NetHack_LE 'nutritiously hard', sounds like a juicy problem and a hard nut to crack ;).
@TongzhouWang I think reorganizing information can be seen as adding new information encoded in the reorganization scheme. For example, if you are reorganizing N bits of information with program P of length K bits, you are effectively adding K bits of new information.
@_rockt @jsuarez5341 @PaglieriDavide @NetHack_LE And for NetHack, |A| ~= 100, H >= 10000. Things change if we have a smart exploration algorithm though, which is one of the reasons RL is interesting :)
@alfcnz When the turntable was invented, some people thought it was the end of music. Then people used it to make entirely new kinds of music (sampling, DJing etc). Human creativity always finds a way to express itself given the tools available :).
New paper accepted to #icml2020: it takes steps towards bridging the theory-practice gap in RL by providing a provably sample-efficient algorithm for block MDPs that uses contrastive learning. Long version: #ReinforcementLearning
@HaqueIshfaq It has several of the MiniGrid envs ported but is a lot more challenging because count-based episodic bonuses do not work; we discuss this some more here:
@FelixHill84 Sorry to hear about this Felix, but I'm glad things are starting to look up. I remember when we interned together in the early days, which felt like a different world. I really admire both your scientific and human contributions to the field, wishing you well!
@_rockt @jsuarez5341 @PaglieriDavide @NetHack_LE I think that "naive" tabula rasa RL cannot solve tasks like NetHack even with a universe-sized computer. Consider a sparse-reward task with |A|=10 actions and horizon H=100. The expected number of samples with naive exploration is O(|A|^H) = O(10^100), more than the number of atoms in the universe :)
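Writing that estimate out (using the commonly cited figure of roughly $10^{80}$ atoms in the observable universe):

```latex
\mathbb{E}[\#\text{samples}] = O\!\left(|A|^{H}\right) = O\!\left(10^{100}\right) \gg 10^{80}
```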
@SinghAyush2811 @MetaAI These are for students in a PhD program, but we sometimes have AI resident spots too, which do not have this requirement. Will advertise if so.
@CupiaBart Nice work, it's great to see interest in NetHack! If you're in this space you might be interested in a couple of other repos: In particular, Motif makes some progress on the very challenging Oracle task and uses SF as the RL env.
My friend Kelsey (aka @arcanelibrary) designed a new D&D system inspired by the earlier versions of the game: simple, fast, and deadly. I playtested the game during development and can't recommend it enough :) It's now available on Kickstarter!
@_aidan_clark_ "Near-Optimal Reinforcement Learning in Polynomial Time" (UPenn CIS). This is based on the idea of novelty bonuses, which has also been extended to deep RL settings (e.g. RND, ICM, pseudocounts).
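To make the novelty-bonus idea concrete, here is a toy count-based version (my own sketch, not the paper's algorithm): states visited less often receive a larger intrinsic reward.

```python
from collections import defaultdict

class CountBonus:
    # Tabular novelty bonus: r_int(s) = 1 / sqrt(N(s)).
    def __init__(self):
        self.counts = defaultdict(int)

    def bonus(self, state) -> float:
        # `state` must be hashable (e.g. a tuple). Deep-RL variants such as
        # RND, ICM, or pseudocounts replace the table with a learned model.
        self.counts[state] += 1
        return self.counts[state] ** -0.5
```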
@NicoBohlinger Thanks! We didn't compare to NGU, but others have found it not to work well on procgen envs. One conceptual difference is that the elliptical bonus automatically normalizes wrt scale, but NGU's KNN-based one doesn't, which means a few features could dominate.
New paper at #NeurIPS2020 presenting PC-PG, a policy gradient algorithm that explores by growing a set of policies covering the set of possible states. Polynomial sample complexity in the linear case, and plays nice with modern deep RL methods.
We share our code; excited to see what people build with this! Many thanks to @qqyuzu @adityagrover_ @yayitsamyzhang @brandondamos for another fun collaboration.
@TongzhouWang Oh interesting, yes that sounds quite related! Yeah, algorithmic complexity leads to cool thought experiments despite not being practical unless you have a universe-sized computer ;p
@UCL_DARK @MinqiJiang Big congrats, Dr. @MinqiJiang!!! Very well deserved, and it's been a pleasure collaborating during your time at FAIR. Looking forward to seeing what you come up with next :)
Very nice work by @mklissar on learning long-horizon exploratory behaviors using Laplacian eigenfunctions.
🎉 I'm particularly excited to share this project I worked on under the guidance of @MarlosCMachado 🧙 We ask: what is the *right scaffold* for building temporal abstractions, from the ground up? Website: It will be presented next week at #ICML2023 🏝️
Overall, this clarifies our understanding of how different exploration algorithms operate in CMDPs and opens up a number of exciting new directions. See the paper for full details. Thanks to my collaborators @MinqiJiang and @robertarail! 15/N, N=15.
@cedcolas @robertarail @MinqiJiang @_rockt For NGU's KNN-based bonus, if one of the dimensions has a much larger scale than the others, it can dominate the bonus because Euclidean distance is used.
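A toy numeric illustration of this (my own example, not from the paper): when one feature's scale dwarfs the others, Euclidean distance is dominated by it, while whitening with the inverse covariance, as the elliptical bonus effectively does, rebalances the dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two features, the first on ~100x the scale of the second.
X = rng.normal(size=(1000, 2)) * np.array([100.0, 1.0])
a, b = X[0], X[1]

# Euclidean distance: driven almost entirely by the first dimension.
print(np.linalg.norm(a - b))

# Mahalanobis-style distance using the inverse feature covariance:
# both dimensions now contribute comparably.
C_inv = np.linalg.inv(np.cov(X.T))
d = a - b
print(np.sqrt(d @ C_inv @ d))
```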
@jsuarez5341 @_rockt @PaglieriDavide @NetHack_LE Basically, they bring complexity from O(|A|^H) to O(poly(env_complexity)). It's a huge improvement (exponential to polynomial), but may still not be enough if env_complexity is big. Adding priors through LLMs or otherwise can reduce it further.
@healingfromlc Don't lose hope: I got it almost 3 years ago and was mostly non-functional for a year, but it got better little by little & I am now doing MUCH better, to the point where symptoms are mostly just an inconvenience. It will get better, just very slowly and with lots of ups & downs.
@abreanac I quite liked this one; it covers the main ideas and appeals more to intuitions than rigorous proofs. The book by James Gleick is also great for an even more informal overview and historical perspective.
@TongzhouWang That's the premise of a nice short story called "The Library of Babel", by Borges.
Excited to share recent work to be presented at #NeurIPS2019: explicit exploration/exploitation using dynamics models.
- Polynomial sample complexity bound in an idealized setting, independent of the number of states.
- Practical algorithm using neural networks.
@petrenko_ai @CupiaBart It's great to see more interest in NetHack! We also used SF in a couple of other repos that ran NetHack/MiniHack. Not sure if you remember, but you answered several of my questions on the SF Discord, which was very helpful :)
@goodfellow_ian Also fwiw my LC got better very slowly over the course of 3-4 years, to the point it doesn’t affect me much any more. I think mindfulness and a healthy diet helped, but also just time. So don’t lose hope!
@EduTrending @AIatMeta @NetHack_LE @InnerverseAI @firecrawl_dev @neo4j Hi Lindsay! The NLE is still being used for research, yes, stay tuned :) Concerning the repo itself, I believe it is currently being maintained by @HeinrichKuttler here:
@TongzhouWang For example: a very short program can generate every possible book of N words, including works of genius and undiscovered scientific truths. But now, finding such needles in the haystack requires inputting information into the system.
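The "very short program" claim is easy to make literal (a toy sketch of my own): a few lines enumerate every possible text over an alphabet, yet selecting the meaningful ones requires information the program itself does not contain.

```python
from itertools import product

def all_texts(alphabet: str, length: int):
    # A constant-size program whose output includes every possible
    # "book" of the given length over the alphabet.
    for chars in product(alphabet, repeat=length):
        yield "".join(chars)

print(list(all_texts("ab", 2)))  # ['aa', 'ab', 'ba', 'bb']
```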
@WeLoveDogsHNL @RosettaAtHome @foldingathome That's awesome, 15 years is a lot of number crunching!
@jsuarez5341 @_rockt @PaglieriDavide @NetHack_LE Novelty-based exploration algorithms are able to solve problems of the type I described with fairly minimal assumptions and sample complexity roughly polynomial in some measure of env complexity, using only environment interaction. These include count-based bonuses and more recent methods like RND.
@cedcolas @robertarail @MinqiJiang @_rockt Thanks! We didn't compare to NGU, but others have reported it not to work well on procgen envs. A conceptual advantage of the elliptical bonus is that it automatically adjusts the scale over each dimension.