Hanjung Kim
@KimD0ing
Followers
157
Following
559
Media
6
Statuses
50
Visiting Scholar @nyuniversity | Ph.D. student @ Yonsei University
New York, NY
Joined February 2023
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a universal skill representation: a scalable method for learning cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
4
32
190
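The UniSkill thread above pitches skill representations learned from pairs of video frames, which a robot policy can then be conditioned on. Purely as a rough illustration (not the UniSkill architecture; the module names, network sizes, and toy tensors below are invented for the sketch), here is what that interface could look like:

```python
# A minimal conceptual sketch (not the UniSkill architecture): a skill encoder maps a
# pair of video frames (current, future) to a compact latent, and a policy head
# conditions on that latent. All names, sizes, and shapes here are illustrative.
import torch
import torch.nn as nn

class SkillEncoder(nn.Module):
    """Encodes (frame_t, frame_t+k) into an embodiment-agnostic skill latent."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(            # tiny CNN over the stacked frame pair
            nn.Conv2d(6, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, latent_dim)

    def forward(self, frame_t, frame_tk):
        x = torch.cat([frame_t, frame_tk], dim=1)  # (B, 6, H, W)
        return self.head(self.backbone(x))

class SkillConditionedPolicy(nn.Module):
    """Predicts a robot action from the current observation plus a skill latent."""
    def __init__(self, obs_dim: int = 64, latent_dim: int = 32, act_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, skill):
        return self.net(torch.cat([obs, skill], dim=-1))

# Usage: extract a "skill" from two frames of a human clip, then condition the policy on it.
enc, pi = SkillEncoder(), SkillConditionedPolicy()
f_t, f_tk = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
action = pi(torch.rand(1, 64), enc(f_t, f_tk))
print(action.shape)  # torch.Size([1, 7])
```

The point of the sketch is the interface: the encoder only ever sees video frames, never robot actions, so the same latent could in principle be computed from human or robot footage.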
When @anyazorin and @irmakkguzey open-sourced the RUKA Hand (a low-cost robotic hand) earlier this year, people kept asking us how to get one. Open hardware isn't as easy to share as code. So we're releasing an off-the-shelf RUKA, in collaboration with @WowRobo and @zhazhali01.
15
40
238
MLLMs show strong video QA ability, but how can we extend this to pixel-level understanding without any training? We introduce Decomposed Attention Fusion (DecAF) in MLLMs for Training-Free Video Reasoning Segmentation.
1
4
11
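For intuition about what training-free segmentation from attention can look like, here is an illustrative toy (not the DecAF method itself; the function name, the alpha weight, and the threshold are assumptions): fuse a query-focused attention map with a contrastive background map and threshold the result into a mask.

```python
# Illustrative sketch only: combine a "positive" attention map with a contrastive
# "negative" map, normalize, and threshold into a binary segmentation mask.
import torch

def fuse_attention_maps(pos_attn: torch.Tensor,
                        neg_attn: torch.Tensor,
                        alpha: float = 0.5,
                        threshold: float = 0.5) -> torch.Tensor:
    """pos_attn / neg_attn: (H, W) attention maps pooled from a vision-language model."""
    fused = pos_attn - alpha * neg_attn                                  # suppress background evidence
    fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)   # normalize to [0, 1]
    return (fused > threshold).float()                                   # binary mask, no training involved

# Toy usage with random maps standing in for real attention.
pos, neg = torch.rand(24, 24), torch.rand(24, 24)
mask = fuse_attention_maps(pos, neg)
print(mask.shape)  # torch.Size([24, 24])
```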
Excited for CoRL! Heading from NYC back to Korea 🇰🇷 I'll present UniSkill at the H2R workshop (Sat) and Poster Session (Mon) 🤖 If you're into Learning from Human Video or Latent Action Modeling, I'd be happy to chat!
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a universal skill representation: a scalable method for learning cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
1
0
11
Unitree G1 has mastered more quirky skills 🤩 Unitree G1 has learned the "Anti-Gravity" mode: stability is greatly improved under any action sequence, and even if it falls, it can quickly get back up.
2K
3K
17K
Coming from a computer vision background and now in sequence modeling, I'm often struck by how disconnected LLMs and vision feel. Our work, AUSM, treats video as language -- and it reveals a few blind spots we've overlooked.
We connect the autoregressive pipeline of LLMs with streaming video perception. Introducing AUSM: Autoregressive Universal Video Segmentation Model. A step toward unified, scalable video perception, inspired by how LLMs unified NLP.
4
8
135
We connect the autoregressive pipeline of LLMs with streaming video perception. Introducing AUSM: Autoregressive Universal Video Segmentation Model. A step toward unified, scalable video perception, inspired by how LLMs unified NLP.
arxiv.org
Recent video foundation models such as SAM2 excel at prompted video segmentation by treating masks as a general-purpose primitive. However, many real-world settings require unprompted segmentation...
2
28
142
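As a loose analogy for the streaming, autoregressive framing described above (not the AUSM model; the GRU memory, layer sizes, and toy shapes are assumptions for the sketch): process frames strictly one at a time, carry a recurrent state forward, and decode a mask per frame, the way an LLM decodes tokens.

```python
# Rough sketch of streaming, causal video segmentation with a recurrent memory state.
# Not the AUSM architecture; every module here is a stand-in.
import torch
import torch.nn as nn

class StreamingSegmenter(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.feat_dim = feat_dim
        self.encoder = nn.Sequential(                 # per-frame feature extractor
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.memory = nn.GRUCell(feat_dim, feat_dim)  # carries history across frames
        self.decoder = nn.Conv2d(feat_dim, 1, 1)      # per-pixel mask logits

    def forward(self, frames):                        # frames: (T, 3, H, W)
        state = torch.zeros(1, self.feat_dim)
        masks = []
        for t in range(frames.shape[0]):              # strictly causal, frame by frame
            feat = self.encoder(frames[t:t + 1])      # (1, feat_dim, H, W)
            pooled = feat.mean(dim=(2, 3))            # (1, feat_dim)
            state = self.memory(pooled, state)        # update temporal memory
            fused = feat + state[:, :, None, None]    # broadcast memory over pixels
            masks.append(torch.sigmoid(self.decoder(fused)))
        return torch.cat(masks)                       # (T, 1, H, W)

print(StreamingSegmenter()(torch.rand(4, 3, 32, 32)).shape)  # torch.Size([4, 1, 32, 32])
```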
Learn robot skills from human YouTube videos, without… labels, poses, or human-robot alignment: Bookmarked by researchers pushing the limits of cross-embodiment learning. UniSkill is a new framework for learning embodiment-agnostic skill representations from large-scale,
0
5
42
Happy to announce our paper, UniSkill, is accepted to CoRL 2025! 🤖 Learning from human videos is the future of robot learning, but the cross-embodiment gap has been a major barrier. We introduce a simple yet powerful way to bridge this gap. Looking forward to seeing you in Korea! 🇰🇷
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a universal skill representation: a scalable method for learning cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
5
7
101
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
98
754
5K
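To make the dynamic-chunking idea concrete, here is a toy sketch under my own assumptions (not the H-Net design; the boundary scorer, threshold, and mean-pooling are illustrative): a small model predicts chunk boundaries over raw bytes, and each predicted chunk is pooled into one vector that a higher-level model would operate on.

```python
# Toy dynamic chunking: score byte positions as chunk boundaries, group bytes into
# chunks, and mean-pool each chunk. Purely illustrative, not the H-Net implementation.
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    def __init__(self, dim: int = 32, threshold: float = 0.5):
        super().__init__()
        self.embed = nn.Embedding(256, dim)        # byte-level embeddings
        self.boundary = nn.Linear(dim, 1)          # predicts "start a new chunk here"
        self.threshold = threshold

    def forward(self, byte_ids: torch.Tensor):     # (L,) raw byte values
        x = self.embed(byte_ids)                   # (L, dim)
        is_boundary = torch.sigmoid(self.boundary(x)).squeeze(-1) > self.threshold
        is_boundary[0] = True                      # the first byte always opens a chunk
        chunk_ids = torch.cumsum(is_boundary.long(), dim=0) - 1   # chunk index per byte
        num_chunks = int(chunk_ids.max()) + 1
        chunks = [x[chunk_ids == c].mean(dim=0) for c in range(num_chunks)]  # mean-pool each chunk
        return torch.stack(chunks)                 # (num_chunks, dim)

data = torch.tensor(list(b"tokenization-free sequence modeling"), dtype=torch.long)
print(DynamicChunker()(data).shape)
```

With untrained weights the boundaries are essentially arbitrary; the interesting part is that the chunk count is data-dependent rather than fixed by a tokenizer.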
Double the Speed, Zero Training: The Free Lunch for Video LLMs! ⚡️ 🚨 I am excited to share that our paper is accepted #ICCV2025 @ICCVConference ArXiv paper: https://t.co/8dnqmHoAsm Project page:
jshyun.me
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs. Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim...
1
6
16
Generalization needs data. But data collection is hard for precise tasks like plugging USBs, swiping cards, inserting plugs, and keying locks. Introducing robust, precise VisuoTactile Local (ViTaL) policies: >90% success rates from just 30 demos and 45 min of real-world RL. 🧶⬇️
5
29
228
Tactile sensing is gaining traction, but slowly. Why? Because integration remains difficult. But what if adding touch sensors to your robot was as easy as hitting 'print'? Introducing eFlesh: a 3D-printable, customizable tactile sensor. Shape it. Size it. Print it. 🧶
21
102
831
It was nice engaging with the CV community on ways to stand out in the crowd. My answer was simple: work on robotics. There are so many unanswered problems and open pastures for research if you are a new researcher. Below are the 6 problems I focused on in my talk.
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209. Schedule: https://t.co/1fKzplQrU5 We have an exciting lineup of invited talks and candid
2
26
166
We just open-sourced EgoZero! It includes the full preprocessing to turn long-form recordings into individual demonstrations as 3D states + actions. We engineered this for scalability to big datasets (streaming, parallel workers, CPU/GPU utilization)
1
3
15
Everyday human data is robotics' answer to internet-scale tokens. But how can robots learn to feel, just from videos? 📹 Introducing FeelTheForce (FTF): force-sensitive manipulation policies learned from natural human interactions 🤖 https://t.co/CZcG87xYn5 1/n
11
39
220
Teaching robots to learn only from RGB human videos is hard! In Feel The Force (FTF), we teach robots to mimic the tactile feedback humans experience when handling objects. This allows for delicate, touch-sensitive tasks, like picking up a raw egg without breaking it. 🧵
18
86
540
UniSkill is accepted at the @CVPR Agents in Interactions, from Humans to Robots workshop. I'll be attending CVPR and would be happy to connect and chat with folks in robotics. Feel free to ping me!
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a universal skill representation: a scalable method for learning cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
0
2
32