Haruki Nishimura

@imp_aa

Followers: 654
Following: 1K
Media: 38
Statuses: 480

Learning and planning for safe, embodied autonomous systems under uncertainty. Senior Research Scientist @ToyotaResearch. PhD from @StanfordMSL. Japanese & English

California, USA
Joined March 2018
@imp_aa
Haruki Nishimura
4 months
We all love to see our robot policy outperforming baselines. But how do we make sure that the claim is statistically sound, from as few policy rollouts as possible? @das_princeton proposes an effective solution to this fundamental question, which has been accepted at RSS2025!
@das_princeton
David Snyder
4 months
(1/13) How should we rigorously compare robot policies? Comparison is central to robotics research, but is inherently expensive. We introduce STEP, a flexible and data-efficient method for statistically rigorous policy comparison. Accepted at RSS 2025: https://t.co/MtAMIwlbAn
0
2
10
@TairanHe99
Tairan He
2 months
Highly recommended — a tremendous amount of effort to rigorously test an assumption often taken for granted by the community: "Does a multi-task pretrained vision-language policy actually outperform single-task policies?"
@RussTedrake
Russ Tedrake
2 months
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: https://t.co/n0qmDRivRH One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the
0
2
45
@SongShuran
Shuran Song
2 months
have been waiting for this release! Robotics needs rigorous and careful evaluation now more than ever 🦾
@RussTedrake
Russ Tedrake
2 months
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: https://t.co/n0qmDRivRH One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the
1
5
67
@imp_aa
Haruki Nishimura
2 months
Learn more and see this research in action in our latest video: https://t.co/c245bSCC3o Project Page: https://t.co/D0CXeInq1j #Robotics #AI #LBMs #MachineLearning #TRIresearch #RobotLearning
0
0
1
@imp_aa
Haruki Nishimura
2 months
At @ToyotaResearch, we've been studying how LBMs can help robots learn faster and better. We built a rigorous evaluation pipeline to benchmark LBM performance with statistical confidence. Results suggest that pre-training on hundreds of tasks yields 80% data savings on new tasks.
@RussTedrake
Russ Tedrake
2 months
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: https://t.co/n0qmDRivRH One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the
1
1
24
@imp_aa
Haruki Nishimura
3 months
Happening now at Ronald Tutor Hall, Room 211!
@imp_aa
Haruki Nishimura
3 months
It is TOMORROW! See you at the 1st Workshop on Robot Evaluation for the Real World! We will be hosting a series of invited, spotlight, and lightning talks with diverse perspectives and applications. We will also have a panel and a debate. https://t.co/AEcphiaP6F
0
0
1
@imp_aa
Haruki Nishimura
3 months
It is TOMORROW! See you at the 1st Workshop on Robot Evaluation for the Real World! We will be hosting a series of invited, spotlight, and lightning talks with diverse perspectives and applications. We will also have a panel and a debate. https://t.co/AEcphiaP6F
0
0
5
@agiachris
Christopher Agia
3 months
What makes data “good” for robot learning? We argue: it’s the data that drives closed-loop policy success! Introducing CUPID 💘, a method that curates demonstrations not by "quality" or appearance, but by how they influence policy behavior, using influence functions. (1/6)
6
20
125
@MercatJean
Jean Mercat
3 months
We evaluated more than 1000 reasoning LLMs on 12 reasoning-focused benchmarks and made fascinating observations about cross-benchmark comparisons. You can explore all that data yourself on our HuggingFace spaces page. (1/4)
2
19
96
@qiaogu1997
Qiao Gu
3 months
🚀 Excited to introduce SAFE, our work on multitask failure detection for Vision-Language-Action (VLA) models! 🔍 SAFE is a simple yet powerful detector that learns from VLAs’ semantic-rich internal feature space and outputs a scalar score indicating the likelihood of task failure
2
25
125
@YuXiang_IRVL
Yu Xiang
3 months
“As a PhD student, your job is not to publish a paper every quarter. Focus on understanding one problem deeply and solve it over years under the protection of your adviser” from @RussTedrake #RSS2025
20
81
917
@imp_aa
Haruki Nishimura
3 months
It was such a pleasure to give an invited talk in the RSS Workshop on Reliable Robotics: Safety and Security in the Face of GenAI. I learned about diverse perspectives on safety and security (and beyond!), and the panel discussion was very thought-provoking too 🤖🤔
0
5
100
@imp_aa
Haruki Nishimura
3 months
We are presenting two papers in the Imitation Learning I session at #RSS25 this evening! Check out the RSS Website for previews! (Talks 3 and 7) https://t.co/3iZoYmXEZ5
0
1
10
@imp_aa
Haruki Nishimura
6 months
Such cool work with extensive results by @ChenXu26892388 on run-time monitoring and failure detection of pre-trained vision-based policies, without relying on observing countless failure modes a priori.
@ChenXu26892388
Chen Xu
6 months
Introducing FAIL-Detect 🚨: a method to detect policy failures within a rollout without failure data or a priori knowledge of potential failures. Detections are indicated with a red border. 🧵 1/8
0
0
7
@imp_aa
Haruki Nishimura
10 months
This is a research internship role.
0
0
0
@imp_aa
Haruki Nishimura
10 months
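Introducing our company's internship program for next summer. For my team's opening, the location is Los Altos, California. Communication is expected to be 100% in English, but I hear that this time applications are also being accepted from schools outside the US, so if you are interested in the research topic, please consider applying.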
@imp_aa
Haruki Nishimura
10 months
Currently pursuing a Ph.D. in robotics, ML, or related fields, and interested in making black-box policies robust and reliable? Our team at TRI is hiring 2025 summer interns who will work with me and @MashaItkina on trustworthy learning for robots.
0
0
4
@imp_aa
Haruki Nishimura
10 months
Currently pursuing a Ph.D. in robotics, ML, or related fields, and interested in making black-box policies robust and reliable? Our team at TRI is hiring 2025 summer interns who will work with me and @MashaItkina on trustworthy learning for robots.
1
3
10
@thomas__lew
Thomas Lew
10 months
Our team at TRI is hiring a research intern for the summer of 2025! An exciting opportunity to pursue research at the intersection of perception and control, and to deploy models and algorithms on high-performance cars https://t.co/rcPpHvPYHG
0
3
14
@imp_aa
Haruki Nishimura
1 year
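Because I received support from the JASSO Study Abroad Support Program while I was a student, I still get a periodic "status survey." It included a field for "things you want to tell the organization," so I wrote: "I think expanding the support is a good thing. However, I would very much like it to be institutionalized in a way that is not bound by the humanities/sciences divide."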
0
0
3
@imp_aa
Haruki Nishimura
1 year
Check out our open-source STATS package https://t.co/alpkMQtJER if you are a roboticist tasked with quantifying policy performance with success/failure labels, and are wondering how to get the tightest confidence interval estimates out of a small set of policy rollouts.
github.com
Computation of binomial confidence intervals that achieve exact coverage. - TRI-ML/binomial_cis
0
0
1
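For readers wondering what a binomial confidence interval from success/failure rollouts looks like in practice, here is a minimal sketch using the classical Clopper-Pearson interval. This is not the method or API of the linked package: Clopper-Pearson is a standard, conservative construction (coverage is at least the nominal level), and the function names `binom_cdf` and `clopper_pearson` are hypothetical names chosen for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=100):
    """Bisection on [0, 1] for p with f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, trials, alpha=0.05):
    """Two-sided Clopper-Pearson interval for a binomial success rate.

    Illustrative helper (hypothetical name), not the binomial_cis API.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    # Lower bound: P(X >= successes | p) = alpha / 2,
    # i.e. P(X <= successes - 1 | p) = 1 - alpha / 2.
    if successes == 0:
        lower = 0.0
    else:
        lower = _solve_decreasing(
            lambda p: binom_cdf(successes - 1, trials, p), 1 - alpha / 2
        )
    # Upper bound: P(X <= successes | p) = alpha / 2.
    if successes == trials:
        upper = 1.0
    else:
        upper = _solve_decreasing(
            lambda p: binom_cdf(successes, trials, p), alpha / 2
        )
    return lower, upper


if __name__ == "__main__":
    # Example: a policy succeeded in 8 of 10 rollouts.
    print(clopper_pearson(8, 10))
```

With only 10 rollouts the interval is wide, which is the motivation for tighter constructions when rollouts are expensive.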