Jacky Kwok
@jackyk02
Stanford CS PhD | Berkeley EECS
Palo Alto, CA · Joined June 2025
Followers: 123 · Following: 46 · Media: 7 · Statuses: 16
✨ Test-Time Scaling for Robotics ✨ Excited to release 🤖 RoboMonkey, which characterizes test-time scaling laws for Vision-Language-Action (VLA) models and introduces a framework that significantly improves the generalization and robustness of VLAs! 🧵(1 / N) 🌐 Website:
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
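The post is truncated before IPW is defined, so the exact formulation is not shown here. A minimal sketch of how such a metric could be computed, assuming IPW is a task-level capability score divided by average power draw (the function name and inputs are illustrative, not from the linked work):

```python
# Hypothetical sketch of an "intelligence per watt" (IPW) style metric.
# Assumption: IPW = mean capability score / average power in watts.

def intelligence_per_watt(task_scores: list[float], energy_joules: float, duration_s: float) -> float:
    """Capability per watt, assuming IPW = mean task score / average power (W)."""
    avg_power_watts = energy_joules / duration_s        # average power over the evaluation window
    capability = sum(task_scores) / len(task_scores)    # e.g., mean benchmark accuracy
    return capability / avg_power_watts

# Example: a local model averaging 0.62 while drawing 1,800 J over 60 s (30 W).
print(intelligence_per_watt([0.70, 0.54, 0.62], energy_joules=1800.0, duration_s=60.0))
```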
Excited to unveil @nvidia's latest work on #Reasoning Vision–Language–Action (#VLA) models — Alpamayo-R1! Alpamayo-R1 is a new #reasoning VLA architecture featuring a diffusion-based action expert built on top of the #Cosmos-#Reason backbone. It represents one of the core
nvidianews.nvidia.com
NVIDIA today announced it is partnering with Uber to scale the world’s largest level 4-ready mobility network, using the company’s next-generation robotaxi and autonomous delivery fleets, the new...
Thrilled to share that 🤖🐒 RoboMonkey is accepted to #CoRL2025!! See you in Seoul 🇰🇷
Happy to share RoboMonkey, a framework for synthetic data generation + scaling test-time compute for VLAs: Turns out generation (via repeated sampling) and verification (via training a verifier on synthetic data) work well for robotics too! Training the verifier: we sample N
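As a rough illustration of the generate-then-verify recipe described here, a sketch of best-of-N action selection. The `policy.sample` and `verifier.score` interfaces below are hypothetical stand-ins, not the released RoboMonkey API:

```python
import numpy as np

# Minimal sketch: sample N candidate actions from a VLA policy and keep the one
# the learned verifier scores highest. Interfaces are assumed for illustration.

def best_of_n(policy, verifier, observation, instruction, n: int = 16) -> np.ndarray:
    """Sample N candidate actions and return the verifier's top-scoring one."""
    candidates = [policy.sample(observation, instruction) for _ in range(n)]
    scores = [verifier.score(observation, instruction, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```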
This work was an awesome collaboration between Stanford, UC Berkeley, and NVIDIA. It was made possible by an incredible team: @agiachris @RohanSinhaSU @MatthewFoutter @depetrol1 and amazing advisors: @drmapavone @Azaliamirh @istoica05
📋 Takeaways Rather than framing robot control as a generation problem, we suggest that viewing it through the lens of sampling and verification—generating diverse action candidates and verifying them—can be an effective path towards general-purpose robotics foundation models.
🧵(9 / N) To enable practical deployment for test-time scaling, we implemented a VLA serving engine on top of SGLang to speed up 🚀 repeated sampling of initial action candidates, and we employ Gaussian perturbation to efficiently construct an action proposal distribution.
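A sketch of the Gaussian-perturbation step described here, i.e., cheaply expanding a few VLA samples into a larger pool of action candidates. The noise scale and array shapes are illustrative assumptions; the SGLang-based serving engine itself is not shown:

```python
import numpy as np

def expand_with_gaussian_noise(initial_actions: np.ndarray,
                               num_candidates: int,
                               sigma: float = 0.01,
                               rng: np.random.Generator | None = None) -> np.ndarray:
    """Expand a small batch of sampled actions into a larger candidate pool.

    initial_actions: (k, action_dim) actions sampled from the VLA.
    Returns (num_candidates, action_dim) candidates obtained by perturbing the
    initial samples with isotropic Gaussian noise (sigma is an illustrative value).
    """
    rng = rng or np.random.default_rng()
    k, _ = initial_actions.shape
    base = initial_actions[rng.integers(0, k, size=num_candidates)]   # resample with replacement
    noise = rng.normal(0.0, sigma, size=base.shape)                   # cheap local perturbation
    return base + noise
```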
🧵(8 / N) Scaling the synthetic dataset size (number of action comparisons) 📈 consistently improves the performance of the RoboMonkey verifier, leading to higher closed-loop success rates on SIMPLER.
🧵(7 / N) We find that RoboMonkey effectively mitigates issues of imprecise grasping, task progression failures, and collisions at deployment. Detailed task breakdowns and failure analysis are provided on our project website: https://t.co/xFKjDxRcRD.
🧵(6 / N) Eval: We show that pairing existing VLAs with RoboMonkey yields significant performance gains 🦾 achieving a 25% absolute improvement on real-world out-of-distribution tasks, 9% on in-distribution SIMPLER environments, and 7% on the LIBERO-Long benchmark.
🧵(5 / N) Scaling: At deployment, we sample a small batch of actions from a policy. We use Gaussian perturbation and majority voting to efficiently generate more action candidates based on the initial samples. Finally, the VLM-based verifier is used to select the optimal action.
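Putting the deployment-time steps of this post together in a single sketch, under assumed interfaces. Where majority voting applies is an assumption here (shown only for a binary gripper dimension); the released implementation may handle it differently:

```python
import numpy as np

def propose_and_select(policy, verifier, obs, instruction,
                       k: int = 5, num_candidates: int = 32, sigma: float = 0.01) -> np.ndarray:
    """Sketch of the deployment loop: sample, expand, vote, verify (interfaces assumed).

    1. Sample a small batch of k actions from the VLA policy.
    2. Expand them into a larger candidate pool via Gaussian perturbation.
    3. Majority-vote the (assumed) binary gripper dimension, taken as the last index.
    4. Let the VLM-based verifier select the highest-scoring candidate.
    """
    rng = np.random.default_rng()
    initial = np.stack([policy.sample(obs, instruction) for _ in range(k)])  # (k, action_dim)

    # Expand: perturb randomly chosen initial samples with Gaussian noise.
    base = initial[rng.integers(0, k, size=num_candidates)]
    candidates = base + rng.normal(0.0, sigma, size=base.shape)

    # Majority vote on the assumed binary gripper dimension.
    gripper_open = np.round(initial[:, -1]).mean() >= 0.5
    candidates[:, -1] = float(gripper_open)

    # Verifier selects the best candidate.
    scores = [verifier.score(obs, instruction, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```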
🧵(4 / N) Training: Given a robotics dataset, we sample N actions per state from a policy. We construct synthetic action preferences based on the RMSE between each sampled action and the ground-truth action. This dataset is then used to fine-tune a VLM-based action verifier.
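A sketch of the synthetic-preference construction this post describes: sample N actions per state, rank them by RMSE to the ground-truth action, and emit (chosen, rejected) pairs for fine-tuning the verifier. The all-pairs strategy and function names are illustrative assumptions:

```python
import itertools
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square error between two action vectors."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def synthetic_preferences(sampled_actions: np.ndarray, gt_action: np.ndarray):
    """Build (chosen, rejected) action pairs from N sampled actions for one state.

    The action closer to the ground truth (lower RMSE) is preferred. Pairing every
    combination of samples is an illustrative choice; the actual dataset
    construction may subsample pairs.
    """
    errors = [rmse(a, gt_action) for a in sampled_actions]
    pairs = []
    for i, j in itertools.combinations(range(len(sampled_actions)), 2):
        if errors[i] == errors[j]:
            continue  # skip ties: no preference signal
        chosen, rejected = (i, j) if errors[i] < errors[j] else (j, i)
        pairs.append((sampled_actions[chosen], sampled_actions[rejected]))
    return pairs
```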
🧵(3 / N) Core Questions:
1️⃣ Can we capitalize on these scaling laws with a learned action verifier to improve policy robustness?
2️⃣ Can we scale synthetic data to improve verification and downstream tasks?
3️⃣ How do we enable practical deployment for test-time scaling?
🧵(2 / N) Test-time scaling law for VLAs: We observe that action error consistently decreases 📉 as we scale the number of generated actions. Repeatedly sampling actions from robot policies, applying Gaussian perturbation to a few sampled actions, and even random sampling of
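One way to reproduce this kind of scaling curve, assuming access to states with ground-truth actions: for each N, measure the best-achievable action RMSE among N sampled candidates. This is a sketch; the paper's exact evaluation protocol may differ:

```python
import numpy as np

def best_of_n_error(policy, states, gt_actions, ns=(1, 2, 4, 8, 16, 32)) -> dict[int, float]:
    """Estimate how the best-candidate action RMSE shrinks as N grows.

    For each state, sample max(ns) actions once and reuse prefixes of that sample
    set, so the whole curve comes from a single pass over the data.
    """
    max_n = max(ns)
    curve = {n: [] for n in ns}
    for state, gt in zip(states, gt_actions):
        samples = np.stack([policy.sample(state) for _ in range(max_n)])  # (max_n, action_dim)
        errors = np.sqrt(np.mean((samples - gt) ** 2, axis=1))            # RMSE per candidate
        for n in ns:
            curve[n].append(errors[:n].min())                             # best of the first n
    return {n: float(np.mean(v)) for n, v in curve.items()}
```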