Hanjung Kim Profile
Hanjung Kim

@KimD0ing

Followers
157
Following
559
Media
6
Statuses
50

Visiting Scholar @nyuniversity | Ph.D. student @ Yonsei University

New York, NY
Joined February 2023
@KimD0ing
Hanjung Kim
6 months
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a universal skill representation, a scalable method for learning cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
4
32
190
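
A rough, hypothetical sketch of the general idea in the pinned UniSkill tweet above (not the paper's actual architecture): encode a pair of video frames into a small latent and train a decoder to predict the later frame from the earlier frame plus that latent, so the latent is pushed to capture the change in the scene (the "skill") rather than the embodiment performing it. All layer sizes below are invented for illustration.

```python
import torch
import torch.nn as nn

class SkillEncoder(nn.Module):
    """Compress (frame_t, frame_t+k) into a small latent z describing what changed."""
    def __init__(self, z_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, z_dim),
        )

    def forward(self, f_t, f_tk):
        return self.net(torch.cat([f_t, f_tk], dim=1))

class FutureDecoder(nn.Module):
    """Predict frame_t+k from frame_t and z; the bottleneck on z does the work."""
    def __init__(self, z_dim=16):
        super().__init__()
        self.context = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32 + z_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, f_t, z):
        ctx = self.context(f_t)                                    # (B, 32, 16, 16)
        z_map = z[:, :, None, None].expand(-1, -1, 16, 16)         # tile z spatially
        return self.up(torch.cat([ctx, z_map], dim=1))             # (B, 3, 64, 64)

# toy training step on random 64x64 "frames"
enc, dec = SkillEncoder(), FutureDecoder()
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
f_t, f_tk = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
loss = nn.functional.mse_loss(dec(f_t, enc(f_t, f_tk)), f_tk)
opt.zero_grad(); loss.backward(); opt.step()
```

The downstream step the thread implies, conditioning a robot policy on z, is omitted here.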
@Raunaqmb
Raunaq Bhirangi
3 days
When @anyazorin and @irmakkguzey open-sourced the RUKA Hand (a low-cost robotic hand) earlier this year, people kept asking us how to get one. Open hardware isn't as easy to share as code. So we're releasing an off-the-shelf RUKA, in collaboration with @WowRobo and @zhazhali01.
15
40
238
@1x_tech
1X
17 days
NEO, the Home Robot. Order today.
7K
10K
70K
@Jeongseok_hyun
Jeongseok Hyun
22 days
MLLMs show strong video QA ability, but how can we extend this to pixel-level understanding without any training? We introduce Decomposed Attention Fusion (DecAF) in MLLMs for Training-Free Video Reasoning Segmentation.
1
4
11
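
"Training-free" in the DecAF tweet above presumably means reading segmentation out of attention the MLLM already computes. Below is only a generic illustration of fusing attention maps into a mask, with made-up shapes and a plain mean over layers and heads; DecAF's actual fusion rules are in the paper.

```python
import numpy as np

def attention_to_mask(attn, grid_hw, image_hw, threshold=0.6):
    """attn: (layers, heads, patches) attention from an answer token to visual tokens."""
    fused = attn.mean(axis=(0, 1))                          # fuse by simple averaging
    fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)
    gh, gw = grid_hw
    heat = fused.reshape(gh, gw)
    H, W = image_hw                                         # nearest-neighbor upsample
    heat = heat[np.repeat(np.arange(gh), H // gh)][:, np.repeat(np.arange(gw), W // gw)]
    return heat > threshold                                 # binary mask at image size

# toy usage: 24x24 patch grid, 4 layers, 8 heads, 336x336 frame
mask = attention_to_mask(np.random.rand(4, 8, 576), (24, 24), (336, 336))
```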
@adcock_brett
Brett Adcock
1 month
Introducing Figure 03 https://t.co/cgJtQkYb3p
648
1K
7K
@KimD0ing
Hanjung Kim
2 months
Excited for CoRL! Heading from NYC back to Korea 🇰🇷 I'll present UniSkill at the H2R workshop (Sat) and Poster Session (Mon) 🤖 If you're into Learning from Human Video or Latent Action Modeling, I'd be happy to chat!
@KimD0ing
Hanjung Kim
6 months
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a universal skill representation, a scalable method for learning cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
1
0
11
@UnitreeRobotics
Unitree
2 months
Unitree G1 has mastered more quirky skills 🤩 G1 has learned an "Anti-Gravity" mode: stability is greatly improved under any action sequence, and even if it falls, it can quickly get back up.
2K
3K
17K
@sukjun_hwang
Sukjun (June) Hwang
2 months
Coming from a computer vision background and now in sequence modeling, I'm often struck by how disconnected LLMs and vision feel. Our work, AUSM, treats video as language -- and it reveals a few blind spots we've overlooked.
@miran_heo
Miran Heo
2 months
We connect the autoregressive pipeline of LLMs with streaming video perception. Introducing AUSM: Autoregressive Universal Video Segmentation Model. A step toward unified, scalable video perception, inspired by how LLMs unified NLP.
4
8
135
@miran_heo
Miran Heo
2 months
We connect the autoregressive pipeline of LLMs with streaming video perception. Introducing AUSM: Autoregressive Universal Video Segmentation Model. A step toward unified, scalable video perception, inspired by how LLMs unified NLP.
arxiv.org
Recent video foundation models such as SAM2 excel at prompted video segmentation by treating masks as a general-purpose primitive. However, many real-world settings require unprompted segmentation...
2
28
142
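
The unprompted, streaming setting the card above describes suggests a frame-by-frame loop that carries state across time, in the spirit of an autoregressive decoder. The module below is a toy stand-in with invented layer sizes, not the AUSM architecture.

```python
import torch
import torch.nn as nn

class StreamingSegmenter(nn.Module):
    def __init__(self, feat_dim=64, num_classes=5):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, 3, padding=1)                 # per-frame features
        self.state_update = nn.Conv2d(2 * feat_dim, feat_dim, 3, padding=1)  # fuse state + frame
        self.mask_head = nn.Conv2d(feat_dim, num_classes, 1)                 # per-pixel logits

    def forward(self, frames):
        """frames: (T, 3, H, W) -> list of (num_classes, H, W) logits, one per frame."""
        state, outputs = None, []
        for frame in frames:
            feat = torch.relu(self.backbone(frame[None]))
            if state is None:
                state = torch.zeros_like(feat)
            # condition on the running history, like an autoregressive decoder step
            state = torch.tanh(self.state_update(torch.cat([state, feat], dim=1)))
            outputs.append(self.mask_head(state)[0])
        return outputs

logits_per_frame = StreamingSegmenter()(torch.rand(8, 3, 64, 64))
```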
@IlirAliu_
Ilir Aliu - eu/acc
3 months
Learn robot skills from human YouTube videos, without… labels, poses, or human-robot alignment: 📌 Bookmarked by researchers pushing the limits of cross-embodiment learning. UniSkill is a new framework for learning embodiment-agnostic skill representations from large-scale,
0
5
42
@KimD0ing
Hanjung Kim
4 months
Happy to announce our paper, UniSkill, is accepted to CoRL 2025! 🤖 Learning from human videos is the future of robot learning, but the cross-embodiment gap has been a major barrier. We introduce a simple yet powerful way to bridge this gap. Looking forward to seeing you in Korea! 🇰🇷
@KimD0ing
Hanjung Kim
6 months
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a universal skill representation, a scalable method for learning cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
5
7
101
@sukjun_hwang
Sukjun (June) Hwang
4 months
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
98
754
5K
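
For intuition about "dynamic chunking inside the model": a scorer could mark chunk boundaries over raw byte embeddings and pool each span into one vector. The sketch below is a naive, non-differentiable stand-in (the hard threshold breaks gradients); it illustrates only the interface, not how H-Net actually learns boundaries end-to-end.

```python
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    def __init__(self, dim=32, vocab=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.boundary_scorer = nn.Linear(dim, 1)   # score: does this position end a chunk?

    def forward(self, byte_ids):
        x = self.embed(byte_ids)                                   # (L, dim)
        is_boundary = torch.sigmoid(self.boundary_scorer(x)).squeeze(-1) > 0.5
        chunks, start = [], 0
        for i, b in enumerate(is_boundary.tolist()):
            if b or i == len(byte_ids) - 1:
                chunks.append(x[start:i + 1].mean(dim=0))          # pool one variable-length chunk
                start = i + 1
        return torch.stack(chunks)                                 # (num_chunks, dim)

chunk_vectors = DynamicChunker()(torch.randint(0, 256, (128,)))
```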
@Jeongseok_hyun
Jeongseok Hyun
4 months
๐ŸŽž๏ธ ๐ƒ๐จ๐ฎ๐›๐ฅ๐ž ๐ญ๐ก๐ž ๐’๐ฉ๐ž๐ž๐, ๐™๐ž๐ซ๐จ ๐“๐ซ๐š๐ข๐ง๐ข๐ง๐ : ๐“๐ก๐ž ๐…๐ซ๐ž๐ž ๐‹๐ฎ๐ง๐œ๐ก ๐Ÿ๐จ๐ซ ๐•๐ข๐๐ž๐จ ๐‹๐‹๐Œ๐ฌ! โšก๏ธ ๐ŸšจI am excited to share that our paper is accepted #ICCV2025 @ICCVConference ArXiv paper: https://t.co/8dnqmHoAsm Project page:
jshyun.me
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs. Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim...
1
6
16
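
As a generic picture of training-free token reduction (the paper's multi-granular spatio-temporal merging is more sophisticated than this), one can repeatedly average the most similar pair of visual tokens until a budget is met.

```python
import torch

def merge_tokens(tokens, keep):
    """tokens: (N, D) -> (keep, D), merging the nearest pair by cosine similarity each step."""
    tokens = tokens.clone()
    while tokens.shape[0] > keep:
        normed = torch.nn.functional.normalize(tokens, dim=-1)
        sim = normed @ normed.T
        sim.fill_diagonal_(-1.0)                        # ignore self-similarity
        i, j = divmod(int(sim.argmax()), sim.shape[1])  # most similar pair
        merged = (tokens[i] + tokens[j]) / 2
        keep_rows = [k for k in range(tokens.shape[0]) if k not in (i, j)]
        tokens = torch.cat([tokens[keep_rows], merged[None]], dim=0)
    return tokens

compressed = merge_tokens(torch.randn(256, 64), keep=64)   # 4x fewer tokens, no training
```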
@Raunaqmb
Raunaq Bhirangi
5 months
Generalization needs data. But data collection is hard for precise tasks like plugging USBs, swiping cards, inserting plugs, and keying locks. Introducing robust, precise VisuoTactile Local (ViTaL) policies: >90% success rates from just 30 demos and 45 min of real-world RL. 🧶⬇️
5
29
228
@Raunaqmb
Raunaq Bhirangi
5 months
Tactile sensing is gaining traction, but slowly. Why? Because integration remains difficult. But what if adding touch sensors to your robot was as easy as hitting "print"? Introducing eFlesh: a 3D-printable, customizable tactile sensor. Shape it. Size it. Print it. 🧶👇
21
102
831
@LerrelPinto
Lerrel Pinto
5 months
It was nice engaging with the CV community on ways to stand out in the crowd. My answer was simple: work on robotics. There are so many unanswered problems and open pastures for research if you are a new researcher. Below are 6 problems I focused on in my talk.
@anand_bhattad
Anand Bhattad
5 months
In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209. Schedule: https://t.co/1fKzplQrU5 We have an exciting lineup of invited talks and candid
2
26
166
@notmahi
Mahi Shafiullah 🏠🤖
5 months
Live demo-ing RUMs at @CVPR this afternoon next to the expo sessions – stop by with something small and let's see if the robot can pick it up zero shot! #CVPR2025
@LerrelPinto
Lerrel Pinto
8 months
When life gives you lemons, you pick them up.
0
6
27
@vincentjliu
Vincent Liu
5 months
We just open-sourced EgoZero! It includes the full preprocessing to turn long-form recordings into individual demonstrations as 3D states + actions. We engineered this for scalability to big datasets (streaming, parallel workers, CPU/GPU utilization).
1
3
15
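
The streaming/parallel-workers point above maps onto a fairly standard multiprocessing pattern. Everything below (the directory layout and the split_recording helper) is hypothetical scaffolding for illustration, not EgoZero's actual API.

```python
from multiprocessing import Pool
from pathlib import Path

def split_recording(path):
    """Turn one long recording into a list of demonstration segments (placeholder)."""
    # a real pipeline would extract 3D states + actions here and cut at task boundaries
    return [{"source": str(path), "segment": i} for i in range(3)]

def preprocess_all(recording_dir, workers=8):
    paths = sorted(Path(recording_dir).glob("*.mp4"))
    with Pool(workers) as pool:
        # imap_unordered streams results back as each worker finishes a file
        per_file = list(pool.imap_unordered(split_recording, paths))
    return [demo for demos in per_file for demo in demos]

if __name__ == "__main__":
    print(f"extracted {len(preprocess_all('recordings/'))} demonstrations")
```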
@AdemiAdeniji
Ademi Adeniji
5 months
Everyday human data is robotics' answer to internet-scale tokens. But how can robots learn to feel, just from videos? 📹 Introducing FeelTheForce (FTF): force-sensitive manipulation policies learned from natural human interactions 🖐️🤖 👉 https://t.co/CZcG87xYn5 1/n
11
39
220
@LerrelPinto
Lerrel Pinto
5 months
Teaching robots to learn only from RGB human videos is hard! In Feel The Force (FTF), we teach robots to mimic the tactile feedback humans experience when handling objects. This allows for delicate, touch-sensitive tasks, like picking up a raw egg without breaking it. 🧵👇
18
86
540
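
A policy that outputs a target contact force still needs a low-level loop to track it. The toy controller below is an assumed illustration of that closed loop (invented gains and a crude contact model), not the FTF implementation.

```python
def gripper_force_step(width, measured_force, target_force, gain=0.002):
    """Return the next gripper width command (meters) from a simple proportional law."""
    error = target_force - measured_force       # positive error -> squeeze a little harder
    return max(0.0, width - gain * error)

# toy rollout: converge toward a 2.0 N grasp without crushing the object
width, force = 0.08, 0.0
for _ in range(20):
    width = gripper_force_step(width, force, target_force=2.0)
    force = min(3.0, 40.0 * (0.08 - width))     # crude stand-in for a contact/stiffness model
```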
@KimD0ing
Hanjung Kim
5 months
UniSkill is accepted at the @CVPR Agents in Interactions, from Humans to Robots workshop. I'll be attending CVPR and would like to connect and chat with folks in robotics. Feel free to ping me!
@KimD0ing
Hanjung Kim
6 months
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a universal skill representation, a scalable method for learning cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
0
2
32