Pulkit Agrawal Profile
Pulkit Agrawal

@pulkitology

Followers
12K
Following
446
Media
110
Statuses
358

Faculty @ MIT

Cambridge, MA
Joined June 2009
@pulkitology
Pulkit Agrawal
3 years
Presenting Visual Dexterity: Object re-orientation in its full generality! Single camera. Novel objects. Any orientation. Downward-facing hand that fights gravity. Real-time dynamic control. Open source setup. Learn more: Led by @taochenshh #robots #rl
5
58
313
@pulkitology
Pulkit Agrawal
4 days
DexWrist is built for compliance and force control, and is easy to simulate. It uses quasi-direct-drive actuators, is backdrivable, and has high control bandwidth. Work led by @martinpeticco, @JohnMarangola, and Gabriella. @MIT_CSAIL @csail_alliances @MIT
1
0
1
@pulkitology
Pulkit Agrawal
4 days
More intuitive data leads to better performance!
1
0
2
@pulkitology
Pulkit Agrawal
4 days
DexWrist makes teleoperation more intuitive and trajectories shorter, allowing faster and cheaper data collection.
1
0
1
@pulkitology
Pulkit Agrawal
4 days
What if robots had wrists? Introducing DexWrist: a drop-in addition to your robot arm for operating in messy environments, moving more dynamically, and making teleoperation more intuitive, with better overall performance. Want to learn more or get a DexWrist?
1
6
48
@pulkitology
Pulkit Agrawal
6 days
What if robots had wrists? 👇
@martinpeticco
Martin Peticco
6 days
What’s keeping robot arms from working like human arms? They're big, slow, have the wrong joints, and can't conform to their environment. DexWrist solves all of these issues and simplifies learning constrained, dynamic manipulation 👉
2
0
42
@pulkitology
Pulkit Agrawal
6 days
RT @martinpeticco: What’s keeping robot arms from working like human arms? They're big, slow, have the wrong joints, and can't conform to…
0
51
0
@pulkitology
Pulkit Agrawal
22 days
What if an LLM could decide what data to use, potentially generate its own data, and decide how to update itself? 👇
@jyo_pari
Jyo Pari
22 days
What if an LLM could update its own weights? Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model’s downstream performance as reward.
2
3
18
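To make the SEAL loop above concrete, here is a minimal toy sketch of the described cycle: the model proposes its own training data (a self-edit), a copy of the weights is updated on it, and the edit is kept only if the updated model's downstream performance improves, standing in for the RL reward. `Model`, `apply_edit`, and `downstream_score` are hypothetical stand-ins, not SEAL's actual API.

```python
import copy
import random

class Model:
    # Stand-in for an LLM: one scalar "weight" and a self-edit proposal policy.
    def __init__(self):
        self.weights = {"w": 0.0}
        self.edit_scale = 0.5

    def generate_self_edit(self):
        # The model proposes its own training signal (a "self-edit"),
        # reduced here to a single scalar update direction.
        return random.gauss(0.0, self.edit_scale)

def apply_edit(model, edit):
    # Inner loop: update a copy of the weights on the self-generated data.
    updated = copy.deepcopy(model)
    updated.weights["w"] += edit
    return updated

def downstream_score(model):
    # Hypothetical downstream task: performance peaks at w = 1.0.
    return -abs(model.weights["w"] - 1.0)

random.seed(0)
model = Model()
best = downstream_score(model)
for step in range(200):
    edit = model.generate_self_edit()
    updated = apply_edit(model, edit)
    # Outer loop: downstream performance of the *updated* model is the reward.
    if downstream_score(updated) > best:   # hill-climbing stand-in for RL
        model, best = updated, downstream_score(updated)

print(f"final w = {model.weights['w']:.3f}, downstream score = {best:.3f}")
```

A real implementation would generate token-level training data and learn the self-editing policy with policy gradients; the hill climb here only mirrors the reward structure.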
@pulkitology
Pulkit Agrawal
3 months
Llama 4 (@Meta) results are consistent with what we hypothesized will unleash the next generation of AI reasoning. A new paradigm for pre-training is around the corner.
@jyo_pari
Jyo Pari
3 months
Llama 4 (@Meta) shows that too much SFT limits RL exploration, something we also found in our recent work! A new and superior pretraining paradigm is around the corner to unleash a new era of reasoning. Check out our paper: Thread:
1
3
22
@pulkitology
Pulkit Agrawal
4 months
Initial tests, even on pre-trained language models, suggest that directly doing reward-based fine-tuning and skipping supervised fine-tuning works better! Joint work with @seungwookh, @jyo_pari, and @gershbrain.
2
0
18
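A hedged toy illustration of the claim above, building on the SFT-limits-exploration finding: if SFT collapses the policy's entropy, the subsequent RL stage explores with tiny steps, while reward-based fine-tuning started directly from the pretrained model keeps its exploration noise. The 1-D "policy", hill-climbing update, and all numbers are illustrative assumptions, not the authors' setup.

```python
import random

def reward(theta):
    # Hypothetical task reward, maximized at theta = 2.
    return -(theta - 2.0) ** 2

def rl_finetune(theta, explore_std, steps=50):
    # Simple hill-climbing stand-in for reward-based fine-tuning (RL).
    for _ in range(steps):
        candidate = theta + random.gauss(0.0, explore_std)
        if reward(candidate) > reward(theta):
            theta = candidate
    return theta

random.seed(0)
# Pipeline A: SFT first collapses the policy onto demonstrations (theta = -1)
# and shrinks its entropy, so the RL stage explores with a tiny step size.
print("SFT -> RL :", round(reward(rl_finetune(-1.0, explore_std=0.02)), 3))
# Pipeline B: skip SFT and run reward-based fine-tuning directly on the
# pretrained model (theta = 0) with its original exploration noise.
print("direct RL :", round(reward(rl_finetune(0.0, explore_std=0.5)), 3))
```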
@pulkitology
Pulkit Agrawal
4 months
We tested the reasoning of current large models on simple tasks like sorting and printing numbers, but in new languages not common on the internet -- these models mostly fail!
1
0
21
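A sketch of the kind of evaluation harness this describes: express numbers in an invented numeral system, ask the model to sort them, and score the decoded answer. `query_model` is a hypothetical stub to be replaced by a real LLM call, and the numeral words are made up.

```python
# Invented numerals, standing in for a language not common on the internet.
WORDS = {0: "za", 1: "mek", 2: "tol", 3: "rin", 4: "shu"}
INV = {w: n for n, w in WORDS.items()}

def query_model(prompt: str) -> str:
    # Placeholder: a real test would send `prompt` to the model under study.
    return "za mek tol"  # pretend the model answered

def score(nums):
    prompt = ("The words " + ", ".join(f"{w}={n}" for n, w in WORDS.items())
              + " are numerals. Sort ascending: "
              + " ".join(WORDS[n] for n in nums))
    answer = query_model(prompt).split()
    try:
        return [INV[w] for w in answer] == sorted(nums)
    except KeyError:
        return False  # model emitted tokens outside the numeral system

print(score([2, 0, 1]))  # True only if the model sorted correctly
```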
@pulkitology
Pulkit Agrawal
4 months
Overturning next-token prediction is required for achieving general reasoning! We predict that RPT (Reward Pre-Training) will overtake GPT in the future -- similar to how AlphaZero overtook AlphaGo. Learn more: 🚨Our whitepaper, “General Reasoning
5
60
429
@pulkitology
Pulkit Agrawal
4 months
ORSO outperforms other methods of choosing reward functions and vastly outperforms Eureka, which uses a naive reward selection mechanism.
0
0
5
@pulkitology
Pulkit Agrawal
4 months
With 1 GPU, do what 8 GPUs can do!
1
0
5
@pulkitology
Pulkit Agrawal
4 months
Casting reward selection as model selection leads to up to 8x faster learning and 50% better performance! ⚡ Provable regret guarantees. 🌟 Easy to implement. ⚔️ 1 GPU can do the work of up to 8 GPUs! Presenting ORSO:
3
23
144
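A minimal sketch of the "reward selection as model selection" idea in the announcement above: treat each candidate reward function as an arm, spend training steps where measured task progress is highest, and let the bandit rule starve misleading shaping early. The UCB rule, toy 1-D policies, and reward candidates are assumptions for illustration, not ORSO's exact algorithm.

```python
import math
import random

def true_score(theta):
    # Ground-truth task performance (what we actually care about).
    return -(theta - 3.0) ** 2

# Candidate shaped rewards (hypothetical): one good, one flat, one misleading.
candidate_rewards = [
    lambda th: -(th - 3.0) ** 2,
    lambda th: -(th - 3.0) ** 4,
    lambda th: -abs(th + 2.0),
]

policies = [0.0] * len(candidate_rewards)  # one toy policy per candidate reward
pulls = [0] * len(candidate_rewards)
totals = [0.0] * len(candidate_rewards)

def train_step(i):
    # One cheap improvement step on policy i under its own shaped reward.
    cand = policies[i] + random.gauss(0.0, 0.3)
    if candidate_rewards[i](cand) > candidate_rewards[i](policies[i]):
        policies[i] = cand
    return true_score(policies[i])  # but evaluate on the real task

random.seed(0)
for t in range(1, 301):
    # UCB over reward functions: allocate compute where true progress is best.
    i = max(range(len(policies)),
            key=lambda k: (float("inf") if pulls[k] == 0
                           else totals[k] / pulls[k]
                           + math.sqrt(2 * math.log(t) / pulls[k])))
    totals[i] += train_step(i)
    pulls[i] += 1

best = max(range(len(policies)), key=lambda k: true_score(policies[k]))
print("selected reward:", best, "budget spent per arm:", pulls)
```

Because every policy is scored on the same true task metric, the selection problem inherits standard bandit regret guarantees, which is the intuition behind "provable regret" in the tweet.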
@pulkitology
Pulkit Agrawal
4 months
Agents performing "curious" exploration by setting random abstract goals! We present a simple but effective method for deep exploration in reinforcement learning (RL) that we call random latent exploration (RLE). Typical RL agents explore by: ⚔️ Noise-based exploration,
5
29
186
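A minimal sketch of the RLE recipe as described above: each episode the agent samples a random latent vector, an "abstract goal", and receives an intrinsic bonus for visiting states whose features align with it. The random feature map, toy dynamics, and bonus form are illustrative assumptions, not the paper's exact instantiation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                    # latent ("abstract goal") dimension
PHI = rng.normal(size=(D, 2))            # fixed random feature map, 2-D states

def features(state):
    # Map a toy 2-D state into the latent space via a random projection.
    return np.tanh(PHI @ state)

def intrinsic_reward(state, z):
    # Bonus for reaching states whose features align with the sampled latent.
    return float(features(state) @ z)

state = np.zeros(2)
for episode in range(3):
    z = rng.normal(size=D)
    z /= np.linalg.norm(z)               # fresh random abstract goal per episode
    ret = 0.0
    for step in range(5):
        action = rng.normal(size=2) * 0.1      # stand-in for the policy's action
        state = state + action                 # toy dynamics
        ret += 0.0 + intrinsic_reward(state, z)  # task reward + RLE bonus
    print(f"episode {episode}: return under latent goal = {ret:.3f}")
```

Resampling the latent each episode is what drives "deep" exploration: the agent commits to a different abstract goal for a whole episode instead of dithering with per-step noise.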
@pulkitology
Pulkit Agrawal
5 months
You aren't doing robotics if you are not breaking some robots!
12
25
370
@pulkitology
Pulkit Agrawal
5 months
Pushing the limits requires brave souls!
0
0
9
@pulkitology
Pulkit Agrawal
5 months
Presenting Unsupervised Actuator Nets (UANs) that push the limits of agile whole-body control without the need for reward shaping! ⚡️ UANs reduce the sim2real gap in robot motors, removing the need for reward design to bridge that gap. ⚡️ A pre-trained whole-body
2
19
152
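A hedged sketch of the actuator-net idea behind UANs: replace the simulator's idealized motor with a small network that maps a history of commands and measured joint positions to realized torque, so simulated motors behave like the real ones. The shapes, toy 1-DoF dynamics, and untrained random weights below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
H, HIDDEN = 3, 16                        # history length, hidden width (assumed)

W1 = rng.normal(0, 0.1, size=(HIDDEN, 2 * H))
W2 = rng.normal(0, 0.1, size=(1, HIDDEN))

def actuator_net(cmd_hist, pos_hist):
    # Input: last H position commands and measured positions; output: torque.
    x = np.concatenate([cmd_hist, pos_hist])
    return (W2 @ np.tanh(W1 @ x)).item()

def sim_step(pos, vel, torque, dt=0.01):
    # Toy 1-DoF joint dynamics driven by the learned torque, not an ideal motor.
    vel = vel + dt * torque
    return pos + dt * vel, vel

cmd_hist = np.zeros(H)
pos_hist = np.zeros(H)
pos, vel = 0.0, 0.0
for t in range(5):
    cmd = 0.5                            # commanded joint position
    tau = actuator_net(cmd_hist, pos_hist)
    pos, vel = sim_step(pos, vel, tau)
    cmd_hist = np.roll(cmd_hist, 1); cmd_hist[0] = cmd
    pos_hist = np.roll(pos_hist, 1); pos_hist[0] = pos
print(f"joint position after 5 steps: {pos:.4f}")
```

Once such a network is fit to real motor behavior, policies trained against it in simulation face hardware-like actuation, which is how the sim2real gap shrinks without reward shaping.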
@pulkitology
Pulkit Agrawal
5 months
Auditing and exposing the fragility of language-conditioned robot models with Embodied Red Teaming (ERT)! 🤯 Simple re-phrasing of task instructions, e.g., from "Please bring me a can of coke" to "Give me a coke," is the difference between the robot succeeding or failing.
1
12
78
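A sketch of the audit pattern ERT describes: run one language-conditioned policy across paraphrases of the same instruction and compare success rates; a large spread across semantically equivalent phrasings flags fragility. `run_policy` and the paraphrase list are hypothetical stand-ins, not the ERT codebase.

```python
PARAPHRASES = [
    "Please bring me a can of coke",
    "Give me a coke",
    "Fetch the coke can",
    "Hand me that soda",
]

def run_policy(instruction: str, trials: int = 20) -> float:
    # Placeholder: a real audit would roll out the robot policy `trials`
    # times per instruction and return the empirical success rate.
    return 0.9 if "can of coke" in instruction else 0.3  # pretend fragility

rates = {p: run_policy(p) for p in PARAPHRASES}
for phrase, rate in rates.items():
    print(f"{rate:5.0%}  {phrase!r}")
# Equivalent instructions, very different outcomes: the signature of fragility.
print("spread:", max(rates.values()) - min(rates.values()))
```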