
Deepak Pathak
@pathak2206
Followers: 22K
Following: 1K
Media: 157
Statuses: 624
Co-Founder & CEO at @SkildAI, Faculty at @CarnegieMellon. PhD @UCBerkeley. I study topics in AI (machine learning, robotics & computer vision).
Pittsburgh, PA
Joined May 2013
RT @_tonytao_: Want to add diverse, high-quality data to your robot policy? Happy to share that the DexWild Dataset is now fully public, h…
0
6
0
RT @TheHumanoidHub: Got to visit the Robotics Institute at CMU today. The institute has a long legacy of pioneering research and pushing t…
0
22
0
A great example of scientific discourse at its best: thoughtful, constructive, and conclusive. We now have more rigorous evidence that confidence maximization improves reasoning. 👇
1/ Maximizing confidence indeed improves reasoning. We worked with @ShashwatGoel7, @nikhilchandak29 @AmyPrb for the past 3 weeks (over a zoom call and many emails!) and revised our evaluations to align with their suggested prompts/parsers/sampling params. This includes changing
1
1
20
RT @ShashwatGoel7: Glad we could together improve the scientific discourse around reasoning. Was great to see the authors reach out and inc…
0
4
0
Congratulations to the team. Great start at RSS!! We have open-sourced DexWild, which makes it easy to build and scale robot learning with hands:
Thrilled to have received Best Paper Award at the EgoAct Workshop at RSS 2025! 🏆 We’ll also be giving a talk at the Imitation Learning Session I tomorrow, 5:30–6:30pm. Come to learn about DexWild! Work co-led by @mohansrirama, with @JasonJZLiu, @kenny__shaw, and @pathak2206.
1
4
66
RT @JasonJZLiu: Presenting FACTR today at #RSS2025 in the Imitation Learning I session at 5:30pm (June 22). Come by if you're interested in…
0
12
0
Tired of tuning PPO or blaming it on reward, task design, etc.? Introducing EPO -- our second (and hopefully final :) attempt at fixing PPO at scale! Contrary to intuition, as the batch size or data increases, PPO saturates due to a lack of diversity in sampling. We proposed a…
(1/n) Since its publication in 2017, PPO has essentially become synonymous with RL. Today, we are excited to provide you with a better alternative - EPO.
2
8
101
RT @stevenl: I’m thrilled to announce the launch of my $40M pre-seed and seed-stage fund, @SevenStars_VC, where I’ll be focused on partneri…
0
41
0
Also, check out the wonderful concurrent work (came out yesterday) from our friends at Berkeley @xuandongzhao @dawnsongtweets and team -- similar ideas, but the experiments are complementary, with nuanced findings in both:
🚀 Excited to share the most inspiring work I’ve been part of this year: "Learning to Reason without External Rewards". TL;DR: We show that LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence. 1/n
1
0
7
Maximizing Confidence Alone Improves Reasoning. Feels like the start of the "curiosity-driven learning" era for LLMs. I have spent most of my career building agents that can self-improve without any external rewards (e.g., curiosity work during my PhD and then at CMU).
Excited to share our work: Maximizing Confidence Alone Improves Reasoning. Humans rely on confidence to learn when answer keys aren’t available (e.g., taking an exam). Surprisingly, LLMs can also learn w/o ground-truth answers, simply by reinforcing high-confidence answers via RL!
4
12
77
RT @mihirp98: Excited to share our work: Maximizing Confidence Alone Improves Reasoning. Humans rely on confidence to learn when answer key…
0
37
0
Congratulations, Dr. Murtaza! 🥳
Incredibly excited to share that I am now officially Dr. Murtaza Dalal! Last weekend marked the official end of an incredible journey across the last 5 years, including doing the first year of my PhD remote, moving to the other side of the country, becoming an independent
1
1
45
RT @khoomeik: someone should probably retry all those late 2010s deep RL ideas to see if they work on LLMs
0
149
0
RT @mohansrirama: Maybe real-world robot generalization doesn’t need massive teleop datasets? 🤔 In DexWild, we show that human demos 🙌 + a…
0
12
0
Introducing DexWild -- a scalable approach to diverse "in the wild" data collection for dexterous robotic hands! This data can be used to co-train policies for any downstream robotic hand on any body form factor (humanoids, AMRs with arms, etc.). 🚀🤖
Training robots for the open world needs diverse data. But collecting robot demos in the wild is hard! Presenting DexWild. 🙌🏕️ Human data collection system that works in diverse environments, without robots. 💪🦾 Human + Robot Cotraining pipeline that unlocks generalization. 🧵👇
3
10
69
RT @kenny__shaw: Exciting to see (at 5:55) Nvidia adopting LEAP Hand in their sim2real efforts! Build your own at .
0
8
0
LEAP Hand controlled by DOGlove. Extremely low-cost dexterity!! Very cool.
What if anyone could build a high-quality haptic glove for robot control… in just a weekend? [github & arXiv ⬇️] This team did it. DOGlove is a fully open-source glove that brings precision, force feedback, and dexterity to robot teleoperation; all for under $600. ✅ Tracks
1
5
54
RT @alexlioralexli: Excited to be presenting at #ICLR2025 at 10am today on how generative classifiers are much more robust to distribution…
0
6
0
Very excited about this direction -- a unified discrete diffusion model for joint text & image generation. Unlike popular autoregressive multimodal approaches, a unified diffusion framework unlocks faster inference, better control via guidance, flexible compute-quality tradeoff, …
1/ Happy to share UniDisc - Unified Multimodal Discrete Diffusion – We train a 1.5 billion parameter transformer model from scratch on 250 million image/caption pairs using a **discrete diffusion objective**. Our model has all the benefits of diffusion models but now in
3
18
169