Harshit Sikchi (will be at NeurIPS 25)
@harshit_sikchi
Followers
2K
Following
2K
Media
46
Statuses
446
Research at @OpenAI; Reinforcement Learning; PhD from UT Austin. Previously FAIR Paris @AIatMeta, @CMU_Robotics @NVIDIAAI @UberATG.
San Francisco, CA
Joined July 2018
Check out GPT-5. Starting around two months ago, I was fortunate to get to contribute to something so fun!
0
0
14
This work builds upon the progress made in unsupervised RL in recent years: https://t.co/tLO6kVerQ8
https://t.co/86K368o6E3
https://t.co/FYNrp0WZ9J
https://t.co/dYxL1j0iCa
arxiv.org
Unsupervised zero-shot reinforcement learning (RL) has emerged as a powerful paradigm for pretraining behavioral foundation models (BFMs), enabling agents to solve a wide range of downstream tasks...
0
0
2
Check out the thread below for more details: https://t.co/7evCk5WBHr This is a collaboration with @agsidd10, @JajooPranaya, @parajuli_samyak, Caleb Chuck, @maxbrudolph, @PeterStone_TX, @yayitsamyzhang, @scottniekum.
🤖 Introducing RL Zero 🤖: a new approach to transform language into behavior zero-shot for embodied agents without labeled datasets! RL Zero enables prompt-to-policy generation, and we believe this unlocks new capabilities in scaling up language-conditioned RL, providing an
1
0
1
Come see us at our poster session in San Diego: https://t.co/KGmDWSfjBm Fri 5 Dec, 4:30–7:30 p.m. PST. Want to quickly learn how it works? Check out the short talk here: Paper: https://t.co/DHGLiYuRt8 Talk: https://t.co/3suqt6tt15
1
0
3
Announcing RLZero for Generalist Agents at #NeurIPS2025. To our knowledge, the first to enable all of: 💬 Language → behavior (zero-shot) 🎥 Video → behavior (zero-shot, cross-embodiment) 🧠 One Behavioral Foundation Model for many tasks From instructions & demos to actions—no
2
4
37
One of the many things we reinvented and revived from RL; this one’s on policy distillation for LLM land
Hot take: DAgger (Ross 2011) should be the first paper you read to get into RL, instead of Sutton's book. Maybe also read scheduled sampling (Bengio 2015). And before RL, study supervised learning thoroughly.
2
2
21
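For readers new to it, here is a minimal sketch of the DAgger loop (Ross et al., 2011) the tweet points to: roll out the current policy, have the expert label every visited state, aggregate those labels, and retrain. The `env`, `expert_policy`, and learner interfaces below are illustrative assumptions (old gym-style reset/step, discrete actions), not code from the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # stand-in supervised learner

def dagger(env, expert_policy, iterations=10, episodes_per_iter=5):
    """Illustrative DAgger loop: roll out the learner, query the expert on every
    visited state, aggregate the labels, and retrain on the union."""
    states, actions = [], []
    learner = None
    for _ in range(iterations):
        for _ in range(episodes_per_iter):
            obs, done = env.reset(), False
            while not done:
                expert_act = expert_policy(obs)  # expert labels every visited state
                if learner is None:
                    act = expert_act             # iteration 0: follow the expert
                else:
                    act = int(learner.predict(np.asarray(obs).reshape(1, -1))[0])
                states.append(np.asarray(obs, dtype=np.float32))
                actions.append(expert_act)
                obs, _, done, _ = env.step(act)
        # Retrain on the aggregated dataset of (visited state, expert action) pairs.
        learner = KNeighborsClassifier(n_neighbors=3).fit(np.stack(states), np.array(actions))
    return learner
```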
I am in wait-and-watch mode on how good this is
0
0
12
Update: Mehtaab and I pushed further on this. Using thousands of GPT5 queries, we found solutions to 10 Erdős problems that were listed as open: 223, 339, 494, 515, 621, 822, 883 (part 2/2), 903, 1043, 1079. Additionally for 11 other problems, GPT5 found significant partial
gpt5-pro is superhuman at literature search: it just solved Erdős Problem #339 (listed as open in the official database https://t.co/3vCCCR0cXs) by realizing that it had actually been solved 20 years ago. h/t @MarkSellke for pointing this out to me!
42
92
932
🤖 Robots rarely see the true world's state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)
2
39
133
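For context on the "distill" option in the tweet above: a common recipe under partial observability is to train a teacher policy on privileged simulator state and then regress a student that only sees noisy, partial observations onto the teacher's actions. A minimal hypothetical sketch (dimensions, networks, and data are stand-ins, not the paper's setup):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: the teacher sees the full privileged state,
# the student only a partial, noisy observation.
STATE_DIM, OBS_DIM, ACT_DIM = 48, 24, 12

teacher = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(), nn.Linear(256, ACT_DIM))
student = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, ACT_DIM))
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

def distill_step(priv_state, partial_obs):
    """One distillation step: the frozen teacher labels the batch from privileged
    state; the student learns to match those actions from partial observations."""
    with torch.no_grad():
        target_act = teacher(priv_state)
    loss = nn.functional.mse_loss(student(partial_obs), target_act)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Random stand-in batch; in practice the pairs come from teacher rollouts.
loss = distill_step(torch.randn(64, STATE_DIM), torch.randn(64, OBS_DIM))
```

The trade-off the tweet studies is exactly this: the distilled student inherits the privileged teacher's competence but may be limited by what the partial observations can support, whereas end-to-end RL optimizes directly under partial observability.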
Even with cool ideas, researchers often overlook how important implementation details can be. Getting them right can be key to scaling up deep RL.
(1/n) With over 1,300 citations, MBPO is often cited as proof that model based RL beats model free methods. In https://t.co/xq3WXslh67 we showed it often completely fails in DeepMind Control. In our new work, Fixing That Free Lunch (FTFL), we explain why and make it succeed.
0
0
20
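For readers unfamiliar with MBPO (Janner et al., 2019), the core recipe is: fit a dynamics model on real transitions, branch short rollouts from real states using that model, and train the policy largely on the model-generated data, keeping rollouts short to limit compounding model error. A schematic sketch with assumed interfaces (`env`, `model`, `agent`, and the replay buffers are placeholders, not real APIs):

```python
def mbpo_iteration(env, model, agent, real_buffer, model_buffer,
                   rollout_length=1, n_rollouts=400, env_steps=1000):
    """One schematic MBPO iteration (all interfaces assumed for illustration)."""
    # 1) Collect real environment transitions with the current policy.
    obs = env.reset()
    for _ in range(env_steps):
        act = agent.act(obs)
        next_obs, rew, done, _ = env.step(act)
        real_buffer.add(obs, act, rew, next_obs, done)
        obs = env.reset() if done else next_obs

    # 2) Fit the dynamics model on all real data collected so far.
    model.fit(real_buffer)

    # 3) Branch short model rollouts from states sampled out of the real buffer.
    #    Short rollouts keep compounding model error in check.
    for _ in range(n_rollouts):
        s = real_buffer.sample_state()
        for _ in range(rollout_length):
            a = agent.act(s)
            s_next, r, d = model.step(s, a)
            model_buffer.add(s, a, r, s_next, d)
            if d:
                break
            s = s_next

    # 4) Policy and critic updates draw mostly from the model-generated buffer.
    for _ in range(20):
        agent.update(model_buffer.sample_batch())
```

The failures and fixes the tweet refers to live inside these pieces (model training, rollout scheduling, and how the buffers are mixed), which is exactly the kind of implementation detail the quote above is about.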
SF really does summer in October
0
1
17
We're finally out of stealth: https://t.co/mRieBSLG0j We're a research / engineering team working together in industries like health and logistics to ship ML tools that drastically improve productivity. If you're interested in ML and RL work that matters, take a look 😀
percepta.ai
Transforming critical institutions using applied AI. Let's harness the frontier.
15
14
99
Yet more evidence that a pretty major shift is happening, this time by Scott Aaronson https://t.co/R1kPhCWhwD
125
454
4K
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. https://t.co/uKPPDldVNS
61
190
1K
RLZero will be presented at @NeurIPSConf 2025. Learn more about the work in the thread below:
🤖 Introducing RL Zero 🤖: a new approach to transform language into behavior zero-shot for embodied agents without labeled datasets! RL Zero enables prompt-to-policy generation, and we believe this unlocks new capabilities in scaling up language-conditioned RL, providing an
4
7
55
A good way to test generalizable capability in the current world of potentially contaminated datasets is competitions, and we are making steady progress!
1/n I’m really excited to share that our @OpenAI reasoning system got a perfect score of 12/12 during the 2025 ICPC World Finals, the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems. This would have
0
0
9
[1/4] 🚀 We’re excited to announce the v1 release of JaxAHT – a new library for Ad Hoc Teamwork (AHT) research, built with JAX for speed & scalability! Check it out 👉 https://t.co/Vmpbm72YwS
#AI #MARL #ReinforcementLearning #JAX #AdHocTeamwork
1
7
36
LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]
9
88
467
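One way to see why diversity matters for test-time scaling: if the k samples drawn for a problem are highly correlated, pass@k barely improves over pass@1. A toy simulation (purely illustrative, not from the paper):

```python
import random

def pass_at_k(per_sample_success, n_problems=2000, k=16, diversity=1.0, seed=0):
    """Toy estimate of pass@k when samples are imperfectly independent.
    `diversity` in [0, 1]: 1.0 = fully independent samples per problem,
    0.0 = all k samples collapse onto a single draw (no benefit from k)."""
    rng = random.Random(seed)
    solved = 0
    for _ in range(n_problems):
        first = rng.random() < per_sample_success
        hit = first
        for _ in range(k - 1):
            # With probability `diversity`, draw a fresh sample; otherwise repeat the first.
            if rng.random() < diversity:
                hit = hit or (rng.random() < per_sample_success)
            else:
                hit = hit or first
        solved += hit
    return solved / n_problems

# Same single-sample accuracy, very different pass@16 depending on diversity.
print(pass_at_k(0.2, diversity=1.0))  # roughly 1 - 0.8**16, close to 0.97
print(pass_at_k(0.2, diversity=0.2))  # noticeably lower when samples collapse
```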
#K2Think (🏔️💭) is now live. We're proud of this model, which punches well above its weight: developed primarily for mathematical reasoning, it has shown itself to be quite versatile. It is fully deployed as a reasoning system at https://t.co/3QVlEE9MfQ, so you can test it for yourself!
k2think.ai
K2 Think - Advanced Reasoning Model
Introducing K2 Think - a breakthrough in advanced AI reasoning. Developed by MBZUAI’s Institute of Foundation Models and @G42ai, K2 Think delivers frontier reasoning performance at a fraction of the size of today’s largest systems. Smaller. Smarter. Open to the world.
13
20
117