Pulkit Agrawal Profile
Pulkit Agrawal

@pulkitology

Followers
12K
Following
446
Media
110
Statuses
358

Faculty @ MIT

Cambridge, MA
Joined June 2009
@pulkitology
Pulkit Agrawal
3 years
Presenting Visual Dexterity: Object re-orientation in its full generality! Single camera. Novel objects. Any orientation. Downward-facing hand that fights gravity. Real-time dynamic control. Open source setup. Learn more: Led by @taochenshh #robots #rl
5
58
313
@pulkitology
Pulkit Agrawal
4 days
DexWrist is built for compliance and force control, and is easy to simulate. It uses quasi-direct-drive actuators, is backdrivable, and has high control bandwidth. Work led by @martinpeticco, @JohnMarangola, and Gabriella. @MIT_CSAIL @csail_alliances @MIT
1
0
1
@pulkitology
Pulkit Agrawal
4 days
More intuitive data leads to better performance!
1
0
2
@pulkitology
Pulkit Agrawal
4 days
DexWrist makes teleoperation more intuitive and trajectories shorter, allowing faster and cheaper data collection.
1
0
1
@pulkitology
Pulkit Agrawal
4 days
What if robots had wrists? Introducing DexWrist: a drop-in addition to your robot arm for operating in messy environments, moving more dynamically, and making teleoperation more intuitive, with better overall performance. Want to learn more or get a DexWrist?
1
6
48
@pulkitology
Pulkit Agrawal
6 days
What if robots had wrists? 👇
@martinpeticco
Martin Peticco
6 days
What’s keeping robot arms from working like human arms? They're big, slow, have the wrong joints, and can't conform to their environment. DexWrist solves all of these issues and simplifies learning constrained, dynamic manipulation 👉
2
0
42
@pulkitology
Pulkit Agrawal
6 days
RT @martinpeticco: What’s keeping robot arms from working like human arms? They're big, slow, have the wrong joints, and can't conform to…
0
51
0
@pulkitology
Pulkit Agrawal
22 days
What if an LLM could decide what data to use, potentially generate its own data, and decide how to update itself? 👇
@jyo_pari
Jyo Pari
22 days
What if an LLM could update its own weights? Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model’s downstream performance as reward.
2
3
18
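To make the SEAL loop above concrete, here is a minimal toy sketch of the described cycle: the model proposes its own training data (a self-edit), a copy of the weights is updated on it, and the edit is kept only if the updated model's downstream performance improves, standing in for the RL reward. `Model`, `apply_edit`, and `downstream_score` are hypothetical stand-ins, not SEAL's actual API.

```python
import copy
import random

class Model:
    # Stand-in for an LLM: one scalar "weight" and a self-edit proposal policy.
    def __init__(self):
        self.weights = {"w": 0.0}
        self.edit_scale = 0.5

    def generate_self_edit(self):
        # The model proposes its own training signal (a "self-edit"),
        # reduced here to a single scalar update direction.
        return random.gauss(0.0, self.edit_scale)

def apply_edit(model, edit):
    # Inner loop: update a copy of the weights on the self-generated data.
    updated = copy.deepcopy(model)
    updated.weights["w"] += edit
    return updated

def downstream_score(model):
    # Hypothetical downstream task: performance peaks at w = 1.0.
    return -abs(model.weights["w"] - 1.0)

random.seed(0)
model = Model()
best = downstream_score(model)
for step in range(200):
    edit = model.generate_self_edit()
    updated = apply_edit(model, edit)
    # Outer loop: downstream performance of the *updated* model is the reward.
    if downstream_score(updated) > best:   # hill-climbing stand-in for RL
        model, best = updated, downstream_score(updated)

print(f"final w = {model.weights['w']:.3f}, downstream score = {best:.3f}")
```

A real implementation would generate token-level training data and learn the self-editing policy with policy gradients; the hill climb here only mirrors the reward structure.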
@pulkitology
Pulkit Agrawal
3 months
Llama 4 (@Meta) results are consistent with what we hypothesized will unleash the next generation of AI reasoning. A new paradigm for pre-training is around the corner.
@jyo_pari
Jyo Pari
3 months
Llama 4 (@Meta) shows that too much SFT limits RL exploration, something we also found in our recent work! A new and superior pretraining paradigm is around the corner to unleash a new era of reasoning. Check out our paper: Thread:
1
3
22
@pulkitology
Pulkit Agrawal
4 months
Initial tests, even on pre-trained language models, suggest that directly doing reward-based fine-tuning and skipping supervised fine-tuning works better! Joint work with @seungwookh, @jyo_pari, and @gershbrain.
2
0
18
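A hedged toy illustration of the claim above, building on the SFT-limits-exploration finding: if SFT collapses the policy's entropy, the subsequent RL stage explores with tiny steps, while reward-based fine-tuning started directly from the pretrained model keeps its exploration noise. The 1-D "policy", hill-climbing update, and all numbers are illustrative assumptions, not the authors' setup.

```python
import random

def reward(theta):
    # Hypothetical task reward, maximized at theta = 2.
    return -(theta - 2.0) ** 2

def rl_finetune(theta, explore_std, steps=50):
    # Simple hill-climbing stand-in for reward-based fine-tuning (RL).
    for _ in range(steps):
        candidate = theta + random.gauss(0.0, explore_std)
        if reward(candidate) > reward(theta):
            theta = candidate
    return theta

random.seed(0)
# Pipeline A: SFT first collapses the policy onto demonstrations (theta = -1)
# and shrinks its entropy, so the RL stage explores with a tiny step size.
print("SFT -> RL :", round(reward(rl_finetune(-1.0, explore_std=0.02)), 3))
# Pipeline B: skip SFT and run reward-based fine-tuning directly on the
# pretrained model (theta = 0) with its original exploration noise.
print("direct RL :", round(reward(rl_finetune(0.0, explore_std=0.5)), 3))
```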
@pulkitology
Pulkit Agrawal
4 months
We tested the reasoning of current large models on simple tasks like sorting and printing numbers, but in new languages not common on the internet -- these models mostly fail!
1
0
21
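A sketch of the kind of evaluation harness this describes: express numbers in an invented numeral system, ask the model to sort them, and score the decoded answer. `query_model` is a hypothetical stub to be replaced by a real LLM call, and the numeral words are made up.

```python
# Invented numerals, standing in for a language not common on the internet.
WORDS = {0: "za", 1: "mek", 2: "tol", 3: "rin", 4: "shu"}
INV = {w: n for n, w in WORDS.items()}

def query_model(prompt: str) -> str:
    # Placeholder: a real test would send `prompt` to the model under study.
    return "za mek tol"  # pretend the model answered

def score(nums):
    prompt = ("The words " + ", ".join(f"{w}={n}" for n, w in WORDS.items())
              + " are numerals. Sort ascending: "
              + " ".join(WORDS[n] for n in nums))
    answer = query_model(prompt).split()
    try:
        return [INV[w] for w in answer] == sorted(nums)
    except KeyError:
        return False  # model emitted tokens outside the numeral system

print(score([2, 0, 1]))  # True only if the model sorted correctly
```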
@pulkitology
Pulkit Agrawal
4 months
Overturning next-token prediction is required for achieving general reasoning! We predict that RPT (Reward Pre-Training) will overtake GPT in the future -- similar to how AlphaZero overtook AlphaGo. Learn more: 🚨Our whitepaper, “General Reasoning
5
60
429
@pulkitology
Pulkit Agrawal
4 months
ORSO outperforms other methods of choosing reward functions and vastly outperforms Eureka, which uses a naive reward selection mechanism.
0
0
5
@pulkitology
Pulkit Agrawal
4 months
With 1 GPU, do what 8 GPUs can do!
1
0
5
@pulkitology
Pulkit Agrawal
4 months
Casting reward selection as model selection leads to up to 8x faster learning and 50% better performance! ⚡ Provable regret guarantees. 🌟 Easy to implement. ⚔️ 1 GPU can do the work of up to 8 GPUs! Presenting ORSO:
3
23
144
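A minimal sketch of the "reward selection as model selection" idea in the announcement above: treat each candidate reward function as an arm, spend training steps where measured task progress is highest, and let the bandit rule starve misleading shaping early. The UCB rule, toy 1-D policies, and reward candidates are assumptions for illustration, not ORSO's exact algorithm.

```python
import math
import random

def true_score(theta):
    # Ground-truth task performance (what we actually care about).
    return -(theta - 3.0) ** 2

# Candidate shaped rewards (hypothetical): one good, one flat, one misleading.
candidate_rewards = [
    lambda th: -(th - 3.0) ** 2,
    lambda th: -(th - 3.0) ** 4,
    lambda th: -abs(th + 2.0),
]

policies = [0.0] * len(candidate_rewards)  # one toy policy per candidate reward
pulls = [0] * len(candidate_rewards)
totals = [0.0] * len(candidate_rewards)

def train_step(i):
    # One cheap improvement step on policy i under its own shaped reward.
    cand = policies[i] + random.gauss(0.0, 0.3)
    if candidate_rewards[i](cand) > candidate_rewards[i](policies[i]):
        policies[i] = cand
    return true_score(policies[i])  # but evaluate on the real task

random.seed(0)
for t in range(1, 301):
    # UCB over reward functions: allocate compute where true progress is best.
    i = max(range(len(policies)),
            key=lambda k: (float("inf") if pulls[k] == 0
                           else totals[k] / pulls[k]
                           + math.sqrt(2 * math.log(t) / pulls[k])))
    totals[i] += train_step(i)
    pulls[i] += 1

best = max(range(len(policies)), key=lambda k: true_score(policies[k]))
print("selected reward:", best, "budget spent per arm:", pulls)
```

Because every policy is scored on the same true task metric, the selection problem inherits standard bandit regret guarantees, which is the intuition behind "provable regret" in the tweet.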
@pulkitology
Pulkit Agrawal
4 months
Agents performing "curious" exploration by setting random abstract goals! We present a simple but effective method for deep exploration in reinforcement learning (RL) that we call random latent exploration (RLE). Typical RL agents explore by: ⚔️ Noise-based exploration,
5
29
186
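A minimal sketch of the RLE recipe as described above: each episode the agent samples a random latent vector, an "abstract goal", and receives an intrinsic bonus for visiting states whose features align with it. The random feature map, toy dynamics, and bonus form are illustrative assumptions, not the paper's exact instantiation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                    # latent ("abstract goal") dimension
PHI = rng.normal(size=(D, 2))            # fixed random feature map, 2-D states

def features(state):
    # Map a toy 2-D state into the latent space via a random projection.
    return np.tanh(PHI @ state)

def intrinsic_reward(state, z):
    # Bonus for reaching states whose features align with the sampled latent.
    return float(features(state) @ z)

state = np.zeros(2)
for episode in range(3):
    z = rng.normal(size=D)
    z /= np.linalg.norm(z)               # fresh random abstract goal per episode
    ret = 0.0
    for step in range(5):
        action = rng.normal(size=2) * 0.1      # stand-in for the policy's action
        state = state + action                 # toy dynamics
        ret += 0.0 + intrinsic_reward(state, z)  # task reward + RLE bonus
    print(f"episode {episode}: return under latent goal = {ret:.3f}")
```

Resampling the latent each episode is what drives "deep" exploration: the agent commits to a different abstract goal for a whole episode instead of dithering with per-step noise.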
@pulkitology
Pulkit Agrawal
5 months
You aren't doing robotics if you are not breaking some robots!
12
25
370
@pulkitology
Pulkit Agrawal
5 months
Pushing the limits requires brave souls!
0
0
9
@pulkitology
Pulkit Agrawal
5 months
Presenting Unsupervised Actuator Nets (UANs) that push the limits of agile whole-body control without the need for reward shaping! ⚡️ UANs reduce the sim2real gap in robot motors, removing the need for reward design to bridge that gap. ⚡️ A pre-trained whole-body
2
19
152
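A hedged sketch of the actuator-net idea behind UANs: replace the simulator's idealized motor with a small network that maps a history of commands and measured joint positions to realized torque, so simulated motors behave like the real ones. The shapes, toy 1-DoF dynamics, and untrained random weights below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
H, HIDDEN = 3, 16                        # history length, hidden width (assumed)

W1 = rng.normal(0, 0.1, size=(HIDDEN, 2 * H))
W2 = rng.normal(0, 0.1, size=(1, HIDDEN))

def actuator_net(cmd_hist, pos_hist):
    # Input: last H position commands and measured positions; output: torque.
    x = np.concatenate([cmd_hist, pos_hist])
    return (W2 @ np.tanh(W1 @ x)).item()

def sim_step(pos, vel, torque, dt=0.01):
    # Toy 1-DoF joint dynamics driven by the learned torque, not an ideal motor.
    vel = vel + dt * torque
    return pos + dt * vel, vel

cmd_hist = np.zeros(H)
pos_hist = np.zeros(H)
pos, vel = 0.0, 0.0
for t in range(5):
    cmd = 0.5                            # commanded joint position
    tau = actuator_net(cmd_hist, pos_hist)
    pos, vel = sim_step(pos, vel, tau)
    cmd_hist = np.roll(cmd_hist, 1); cmd_hist[0] = cmd
    pos_hist = np.roll(pos_hist, 1); pos_hist[0] = pos
print(f"joint position after 5 steps: {pos:.4f}")
```

Once such a network is fit to real motor behavior, policies trained against it in simulation face hardware-like actuation, which is how the sim2real gap shrinks without reward shaping.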
@pulkitology
Pulkit Agrawal
5 months
Auditing and exposing the fragility of language-conditioned robot models with Embodied Red Teaming (ERT)! 🤯 Simple re-phrasing of task instructions, e.g., from "Please bring me a can of coke" to "Give me a coke," is the difference between the robot succeeding or failing.
1
12
78
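A sketch of the audit pattern ERT describes: run one language-conditioned policy across paraphrases of the same instruction and compare success rates; a large spread across semantically equivalent phrasings flags fragility. `run_policy` and the paraphrase list are hypothetical stand-ins, not the ERT codebase.

```python
PARAPHRASES = [
    "Please bring me a can of coke",
    "Give me a coke",
    "Fetch the coke can",
    "Hand me that soda",
]

def run_policy(instruction: str, trials: int = 20) -> float:
    # Placeholder: a real audit would roll out the robot policy `trials`
    # times per instruction and return the empirical success rate.
    return 0.9 if "can of coke" in instruction else 0.3  # pretend fragility

rates = {p: run_policy(p) for p in PARAPHRASES}
for phrase, rate in rates.items():
    print(f"{rate:5.0%}  {phrase!r}")
# Equivalent instructions, very different outcomes: the signature of fragility.
print("spread:", max(rates.values()) - min(rates.values()))
```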