Jacky Kwok Profile
Jacky Kwok

@jackyk02

Followers: 123
Following: 46
Media: 7
Statuses: 16

Stanford CS PhD | Berkeley EECS

Palo Alto, CA
Joined June 2025
@jackyk02
Jacky Kwok
5 months
✨ Test-Time Scaling for Robotics ✨ Excited to release 🤖 RoboMonkey, which characterizes test-time scaling laws for Vision-Language-Action (VLA) models and introduces a framework that significantly improves the generalization and robustness of VLAs! 🧵(1 / N) 🌐 Website:
2
17
72
@JonSaadFalcon
Jon Saad-Falcon
10 days
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
46
135
418
@drmapavone
Marco Pavone
19 days
Excited to unveil @nvidia's latest work on #Reasoning Vision–Language–Action (#VLA) models — Alpamayo-R1! Alpamayo-R1 is a new #reasoning VLA architecture featuring a diffusion-based action expert built on top of the #Cosmos-#Reason backbone. It represents one of the core
nvidianews.nvidia.com
NVIDIA today announced it is partnering with Uber to scale the world’s largest level 4-ready mobility network, using the company’s next-generation robotaxi and autonomous delivery fleets, the new...
10
39
234
@jackyk02
Jacky Kwok
4 months
Thrilled to share that 🤖🐒 RoboMonkey has been accepted to #CoRL2025!! See you in Seoul 🇰🇷
0
1
21
@Azaliamirh
Azalia Mirhoseini
4 months
Happy to share RoboMonkey, a framework for synthetic data generation + scaling test-time compute for VLAs: Turns out generation (via repeated sampling) and verification (via training a verifier on synthetic data) work well for robotics too! Training the verifier: we sample N
5
32
164
@jackyk02
Jacky Kwok
5 months
This work was an awesome collaboration between Stanford, UC Berkeley, and NVIDIA. It was made possible by an incredible team: @agiachris @RohanSinhaSU @MatthewFoutter @depetrol1 and amazing advisors: @drmapavone @Azaliamirh @istoica05
0
0
7
@jackyk02
Jacky Kwok
5 months
📋 Takeaways Rather than framing robot control as a generation problem, we suggest that viewing it through the lens of sampling and verification—generating diverse action candidates and verifying them—can be an effective path towards general-purpose robotics foundation models.
1
0
6
@jackyk02
Jacky Kwok
5 months
🧵(9 / N) To enable practical deployment for test-time scaling, we implemented a VLA serving engine on top of SGLang to speed up 🚀 repeated sampling of initial action candidates, and we employ Gaussian perturbation to efficiently construct an action proposal distribution.
1
0
5
@jackyk02
Jacky Kwok
5 months
🧵(8 / N) Scaling the synthetic dataset size (number of action comparisons) 📈 consistently improves the performance of the RoboMonkey verifier, leading to higher closed-loop success rates on SIMPLER.
1
0
6
@jackyk02
Jacky Kwok
5 months
🧵(7 / N) We find that RoboMonkey effectively mitigates issues of imprecise grasping, task progression failures, and collisions at deployment. Detailed task breakdowns and failure analysis are provided on our project website: https://t.co/xFKjDxRcRD.
1
2
9
@jackyk02
Jacky Kwok
5 months
🧵(6 / N) Eval: We show that pairing existing VLAs with RoboMonkey yields significant performance gains 🦾, achieving a 25% absolute improvement on real-world out-of-distribution tasks, 9% on in-distribution SIMPLER environments, and 7% on the LIBERO-Long benchmark.
1
0
6
@jackyk02
Jacky Kwok
5 months
🧵(5 / N) Scaling: At deployment, we sample a small batch of actions from a policy. We use Gaussian perturbation and majority voting to efficiently generate more action candidates based on the initial samples. Finally, the VLM-based verifier is used to select the optimal action.
1
0
5
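The deployment step in (5 / N) can be sketched roughly as below. This is a minimal illustration, not the RoboMonkey implementation. It assumes each action is a vector whose last entry is a binary gripper state and whose other entries are continuous. It also assumes the Gaussian is fit per dimension to the initial samples and that majority voting applies only to the gripper entry. The names `propose_actions`, `select_action`, and `score_fn` are hypothetical, and `score_fn` stands in for the VLM-based verifier.

```python
import numpy as np


def propose_actions(initial_actions, num_candidates, rng):
    """Expand a small batch of policy samples into more candidates.

    Assumption: last column is a binary gripper state, the rest are continuous.
    """
    initial = np.asarray(initial_actions, dtype=float)
    continuous, gripper = initial[:, :-1], initial[:, -1]

    # Gaussian perturbation: fit a per-dimension Gaussian to the initial
    # samples and draw new continuous components from it.
    mean = continuous.mean(axis=0)
    std = continuous.std(axis=0)
    samples = rng.normal(mean, std, size=(num_candidates, continuous.shape[1]))

    # Majority voting: every candidate shares the most common gripper state.
    open_votes = int((gripper > 0.5).sum())
    gripper_vote = 1.0 if open_votes * 2 >= len(gripper) else 0.0

    return np.column_stack([samples, np.full(num_candidates, gripper_vote)])


def select_action(candidates, score_fn):
    """Return the candidate with the highest verifier score.

    score_fn is a hypothetical stand-in for the learned action verifier.
    """
    scores = [score_fn(c) for c in candidates]
    return candidates[int(np.argmax(scores))]
```

Sampling from a fitted Gaussian is cheap compared with another forward pass through the policy, which is presumably why the thread describes it as an efficient way to get more candidates from a few initial samples.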
@jackyk02
Jacky Kwok
5 months
🧵(4 / N) Training: Given a robotics dataset, we sample N actions per state from a policy. We construct synthetic action preferences based on the RMSE between each sampled action and the ground-truth action. This dataset is then used to fine-tune a VLM-based action verifier.
1
0
7
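One way to picture the preference construction in (4 / N) is the sketch below. It assumes that, for a single state, every pair of sampled actions becomes one comparison and that the action with the lower RMSE to the ground truth is the preferred one. The function names and the tie-skipping rule are illustrative assumptions, not details taken from the thread.

```python
import itertools

import numpy as np


def rmse(action, ground_truth):
    """Root-mean-square error between one action and the ground-truth action."""
    diff = np.asarray(action, dtype=float) - np.asarray(ground_truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def build_preference_pairs(sampled_actions, ground_truth):
    """Return (chosen_index, rejected_index) pairs for one state.

    The action closer to the ground truth (lower RMSE) is treated as chosen.
    Ties are skipped (an assumption made for this sketch).
    """
    errors = [rmse(a, ground_truth) for a in sampled_actions]
    pairs = []
    for i, j in itertools.combinations(range(len(sampled_actions)), 2):
        if errors[i] == errors[j]:
            continue
        pairs.append((i, j) if errors[i] < errors[j] else (j, i))
    return pairs
```

With N sampled actions per state this yields up to N(N-1)/2 comparisons, so the number of action comparisons grows quickly with N.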
@jackyk02
Jacky Kwok
5 months
🧵(3 / N) Core Questions:
- 1️⃣ Can we capitalize on these scaling laws with a learned action verifier to improve policy robustness?
- 2️⃣ Can we scale synthetic data to improve verification and downstream tasks?
- 3️⃣ How do we enable practical deployment for test-time scaling?
1
0
6
@jackyk02
Jacky Kwok
5 months
🧵(2 / N) Test-time scaling law for VLAs: We observe that action error consistently decreases 📉 as we scale the number of generated actions. Repeatedly sampling actions from robot policies, applying Gaussian perturbation to a few sampled actions, and even random sampling of
1
0
7
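The measurement behind (2 / N) can be illustrated with a toy sketch, assuming "action error" means the RMSE of the best candidate among the first k samples when the ground-truth action is known (an oracle setting). The function name `oracle_error_curve` is hypothetical, and this does not reproduce the reported results. It only shows how such a curve could be computed from a set of sampled actions.

```python
import numpy as np


def oracle_error_curve(sampled_actions, ground_truth):
    """Best-so-far RMSE to the ground truth after 1, 2, ..., N samples."""
    actions = np.asarray(sampled_actions, dtype=float)
    target = np.asarray(ground_truth, dtype=float)
    # RMSE of each sampled action against the ground-truth action.
    errors = np.sqrt(np.mean((actions - target) ** 2, axis=1))
    # Running minimum: the error of the best action among the first k samples.
    return np.minimum.accumulate(errors)
```

By construction this curve never increases with k. How quickly it falls depends on the sampling distribution, and that rate is what a scaling-law fit would characterize.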