
Haruki Nishimura
@imp_aa
Followers: 654 · Following: 1K · Media: 38 · Statuses: 480
Learning and planning for safe, embodied autonomous systems under uncertainty. Senior Research Scientist @ToyotaResearch. PhD from @StanfordMSL. Japanese & English
California, USA
Joined March 2018
We all love to see our robot policy outperforming baselines. But how do we make sure that the claim is statistically sound, from as few policy rollouts as possible? @das_princeton proposes an effective solution to this fundamental question, which has been accepted at RSS2025!
(1/13) How should we rigorously compare robot policies? Comparison is central to robotics research, but is inherently expensive. We introduce STEP, a flexible and data-efficient method for statistically rigorous policy comparison. Accepted at RSS 2025: https://t.co/MtAMIwlbAn
0
2
10
Highly recommended — a tremendous amount of effort to rigorously test an assumption often taken for granted by the community: "Does a multi-task pretrained vision-language policy actually outperform single-task policies?"
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: https://t.co/n0qmDRivRH One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the
0
2
45
Have been waiting for this release! Robotics needs rigorous and careful evaluation now more than ever 🦾
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: https://t.co/n0qmDRivRH One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the
1
5
67
Learn more and see this research in action in our latest video: https://t.co/c245bSCC3o Project Page: https://t.co/D0CXeInq1j
#Robotics #AI #LBMs #MachineLearning #TRIresearch #RobotLearning
0
0
1
At @ToyotaResearch, we've been studying how LBMs can help robots learn faster and better. We built a rigorous evaluation pipeline to benchmark LBM performance with statistical confidence. Results suggest that pre-training on hundreds of tasks yields 80% data savings on new tasks.
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: https://t.co/n0qmDRivRH One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the
1
1
24
Happening now at Ronald Tutor Hall, Room 211!
It is TOMORROW! See you at the 1st Workshop on Robot Evaluation for the Real World! We will be hosting a series of invited, spotlight, and lightning talks with diverse perspectives and applications. We will also have a panel and a debate. https://t.co/AEcphiaP6F
0
0
1
It is TOMORROW! See you at the 1st Workshop on Robot Evaluation for the Real World! We will be hosting a series of invited, spotlight, and lightning talks with diverse perspectives and applications. We will also have a panel and a debate. https://t.co/AEcphiaP6F
0
0
5
What makes data “good” for robot learning? We argue: it’s the data that drives closed-loop policy success! Introducing CUPID 💘, a method that curates demonstrations not by "quality" or appearance, but by how they influence policy behavior, using influence functions. (1/6)
6
20
125
We evaluated more than 1000 reasoning LLMs on 12 reasoning-focused benchmarks and made fascinating observations about cross-benchmark comparisons. You can explore all that data yourself on our HuggingFace spaces page. (1/4)
2
19
96
🚀 Excited to introduce SAFE, our work on multitask failure detection for Vision-Language-Action (VLA) models! 🔍 SAFE is a simple yet powerful detector that learns from VLAs’ semantically rich internal feature space and outputs a scalar score indicating the likelihood of task failure
2
25
125
“As a PhD student, your job is not publishing a paper every quarter. Focus on understanding a problem deeply and solve it over years under the protection of your adviser” from @RussTedrake #RSS2025
20
81
917
It was such a pleasure to give an invited talk in the RSS Workshop on Reliable Robotics: Safety and Security in the Face of GenAI. I learned diverse perspectives on safety and security (and beyond!), and the panel discussion was very thought-provoking too 🤖🤔
0
5
100
We are presenting two papers in the Imitation Learning I session at #RSS25 this evening! Check out the RSS Website for previews! (Talk 3 and 7) https://t.co/3iZoYmXEZ5
0
1
10
Such cool work and extensive results by @ChenXu26892388 on run-time monitoring and failure detection of pre-trained vision-based policies, without relying on observing countless failure modes a priori.
Introducing FAIL-Detect 🚨: a method to detect policy failures within a rollout without failure data or a priori knowledge of potential failures. Detections are indicated with a red border. 🧵 1/8
0
0
7
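Introducing our company's internship program for next summer. For my team's opening, the location is Los Altos, California. Communication is expected to be 100% in English, but this time applications are also being accepted from schools outside the U.S., so if you are interested in the research topic, please consider applying.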
Currently pursuing a Ph.D. in robotics, ML, or related fields, and interested in making black-box policies robust and reliable? Our team at TRI is hiring 2025 summer interns who will work with me and @MashaItkina on trustworthy learning for robots.
0
0
4
Currently pursuing a Ph.D. in robotics, ML, or related fields, and interested in making black-box policies robust and reliable? Our team at TRI is hiring 2025 summer interns who will work with me and @MashaItkina on trustworthy learning for robots.
1
3
10
Our team at TRI is hiring a research intern for the summer of 2025! An exciting opportunity to pursue research at the intersection of perception and control, and to deploy models and algorithms on high-performance cars https://t.co/rcPpHvPYHG
0
3
14
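Because I received the JASSO overseas study support scholarship while I was a student, I still periodically get a "status survey." It included a field for "things you would like to tell the organization," so I submitted this opinion: "I think expanding the support is a good thing. But I would very much like it to be institutionalized in a form that is not bound by the humanities/sciences divide."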
0
0
3
Check out our open-source STATS package https://t.co/alpkMQtJER if you are a roboticist tasked with quantifying policy performance with success/failure labels, and are wondering how to get the tightest confidence interval estimates out of a small set of policy rollouts.
github.com
Computation of binomial confidence intervals that achieve exact coverage. - TRI-ML/binomial_cis
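For readers unfamiliar with binomial confidence intervals on success/failure rollouts, here is a minimal sketch of the classical Clopper-Pearson interval. It is not the method or API of the linked package: Clopper-Pearson guarantees at least the nominal coverage but is generally conservative, which is the gap that tighter interval constructions aim to close. The function names `binom_cdf` and `clopper_pearson` are hypothetical and chosen only for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iters=60):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for k successes in n rollouts.

    Illustrative only; this is the textbook interval, not the linked
    package's construction.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Lower bound: smallest p with P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: largest p with P(X <= k | p) = alpha / 2,
    # i.e. P(X > k | p) = 1 - alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    # Example: 18 successes out of 20 policy rollouts.
    low, high = clopper_pearson(18, 20)
    print(f"95% interval for success rate: [{low:.3f}, {high:.3f}]")
```

With only 20 rollouts the interval stays wide, which is why squeezing the tightest valid interval out of a small sample matters for policy evaluation.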
0
0
1