
Junyi Zhang
@junyi42
Followers 1K · Following 2K · Media 32 · Statuses 150
CS Ph.D. Student @Berkeley_AI. B.Eng. in CS @SJTU1896. Working with @GoogleDeepMind, previously @MSFTResearch. Vision, generative models, representation learning.
Joined July 2022
RT @seohong_park: Q-learning is not yet scalable. I wrote a blog post about my thoughts on scalable RL algorithms…
0 · 186 · 0
RT @YutongBAI1002: What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1)…
0 · 117 · 0
RT @KyleSargentAI: FlowMo, our paper on diffusion autoencoders for image tokenization, has been accepted to #ICCV2025! See you in Hawaii!…
0 · 13 · 0
I will also be presenting VideoMimic at the Agents in Interaction workshop: Poster #182–#201 | June 12 (Thu), 11:45–12:15 | ExHall D. @redstone_hong will also give a spotlight talk on VideoMimic on Thu — come check it out! More details ⬇️
0 · 0 · 11
Just arrived in Nashville for #CVPR25! 🥰 I'll present St4RTrack tomorrow morning (10:30–12:30) at the 4D Vision Workshop, poster #137 in Hall 104 B. Feel free to come and chat!
Introducing St4RTrack! 🖖 Simultaneous 4D Reconstruction and Tracking in world coordinates, fully feed-forward, just by changing the meaning of two pointmaps!
1 · 6 · 101
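A minimal sketch of the two-pointmap reading described in the St4RTrack tweet above. This is only an illustration of the idea, not St4RTrack's actual code or interface: the stub model, the shapes, and the exact head semantics are assumptions (one head read as frame-1 pixels tracked to their 3D positions at time t, the other as the geometry of frame t, both in a shared world frame).

```python
# Hypothetical sketch of the "two pointmaps" reading suggested by the tweet above.
# `dummy_two_head_model` is a stand-in, NOT St4RTrack's real API.
import numpy as np

H, W = 240, 320  # assumed image resolution

def dummy_two_head_model(frame_1: np.ndarray, frame_t: np.ndarray):
    """Stand-in for a DUSt3R-style two-head network returning two HxWx3 pointmaps."""
    track_map = np.random.rand(H, W, 3)   # head A: frame-1 pixels at time t (tracking)
    recon_map = np.random.rand(H, W, 3)   # head B: geometry of frame t (reconstruction)
    return track_map, recon_map

frame_1 = np.zeros((H, W, 3), dtype=np.float32)
frame_t = np.zeros((H, W, 3), dtype=np.float32)
track_map, recon_map = dummy_two_head_model(frame_1, frame_t)

# Reading the outputs (both expressed in the same world coordinate frame):
# - track_map[v, u]: 3D position at time t of the point seen at pixel (u, v) of frame 1,
#   so sweeping t gives per-pixel 3D trajectories (tracking).
# - recon_map[v, u]: 3D position of the point seen at pixel (u, v) of frame t,
#   so stacking over t gives the 4D reconstruction of the scene.
print(track_map.shape, recon_map.shape)  # (240, 320, 3) (240, 320, 3)
```

Under this reading, tracking and reconstruction come from the same feed-forward pass; only the interpretation assigned to each output pointmap differs.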
RT @graceluo_: ✨New preprint: Dual-Process Image Generation! We distill *feedback from a VLM* into *feed-forward image generation*, at infe…
0 · 176 · 0
RT @tianyuanzhang99: Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper…
0 · 75 · 0
RT @GillmanLab: Ever wish you could turn your video generator into a controllable physics simulator? We're thrilled to introduce Force Pr…
0 · 66 · 0
Very impressive! At we already: learn from 3rd-person human videos + RL -- for locomotion. Excited to see where this path goes next!
One of our goals is to have Optimus learn straight from internet videos of humans doing tasks. Those are often 3rd person views captured by random cameras etc. We recently had a significant breakthrough along that journey, and can now transfer a big chunk of the learning.
2 · 22 · 213
RT @letian_fu: Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot…
0 · 76 · 0
RT @ChungMinKim: Excited to introduce PyRoki ("Python Robot Kinematics"): easier IK, trajectory optimization, motion retargeting, with an…
0 · 165 · 0
Humanoids need to perceive the environment in the real world. Using 4D reconstruction techniques, we turn casual human videos into training data for an environment-aware humanoid policy. Super excited to share:
our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy. (w/ @redstone_hong @junyi42 @davidrmcall)
2 · 18 · 134
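A heavily hedged outline of the pipeline the two tweets above describe: reconstruct human motion plus scene geometry from a casual phone video, retarget the motion to a humanoid, and use the resulting (terrain, reference motion) pairs as training data for one environment-aware policy. Every function, class, and shape below is a hypothetical placeholder for illustration, not VideoMimic's actual code.

```python
# Hypothetical outline of a video -> humanoid-training-data pipeline, as described
# in the tweets above. All names and shapes are placeholders, not VideoMimic's API.
from dataclasses import dataclass
import numpy as np

@dataclass
class Clip4D:
    human_joints: np.ndarray    # (T, J, 3) human joint positions in world coordinates
    terrain_points: np.ndarray  # (N, 3) scene geometry (stairs, chairs, ground)

def reconstruct_4d(video_frames: list) -> Clip4D:
    """Placeholder for 4D human + scene reconstruction from a casual phone video."""
    T = len(video_frames)
    return Clip4D(np.zeros((T, 24, 3)), np.zeros((1024, 3)))

def retarget_to_humanoid(clip: Clip4D) -> np.ndarray:
    """Placeholder for retargeting human motion to the robot's joint space."""
    T = clip.human_joints.shape[0]
    return np.zeros((T, 29))  # assumed 29-DoF humanoid

# Each video becomes one (terrain, reference trajectory) training pair; a single
# policy would then be trained against many such pairs (e.g. with RL in simulation).
videos = [[np.zeros((240, 320, 3), dtype=np.uint8)] * 90]  # one dummy 90-frame clip
dataset = []
for frames in videos:
    clip = reconstruct_4d(frames)
    dataset.append((clip.terrain_points, retarget_to_humanoid(clip)))
print(len(dataset), dataset[0][1].shape)  # 1 (90, 29)
```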
RT @HaochengXiUCB: 🚀 Introducing #SparseVideoGen: 2x speedup in video generation with HunyuanVideo with high pixel-level fidelity (PSNR = 2…
0 · 56 · 0
RT @hanwenjiang1: Supervised learning has held 3D Vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D…
0 · 70 · 0
RT @HaoranGeng2: 🚀 RoboVerse has been accepted to RSS 2025 and is now live on arXiv: ✨ Also selected in HuggingF…
0 · 23 · 0
RT @_crockwell: Ever wish YouTube had 3D labels? 🚀 Introducing 🎥DynPose-100K🎥, an Internet-scale collection of diverse videos annotated with…
0 · 38 · 0
@hkz222 @ju_yuanchen @HavenFeng and I are all around the conference. Feel free to talk to us if you are interested in our latest work, St4RTrack!
0 · 0 · 3
I'll also be at the DenseMatcher poster on Friday afternoon, Hall 3+2B #569, with @hkz222! Check out the poster @ju_yuanchen made 👇
#ICLR2025 Thrilled for our ICLR 2025 Spotlight: DenseMatcher 🍌! 📍 Hall 3 + Hall 2B #569, Fri 25 Apr, 3-5:30 AM EDT. Meet my awesome collaborators Junzhe, Junyi @junyi42, Kaizhe @hkz222 & our advisor Huazhe @HarryXu12 to discuss! ☺️
1 · 2 · 5
I'll be presenting MonST3R at ICLR! 🇸🇬 Friday 25th, 10am-12:30pm, Hall 3+2B #97. Come by if you are interested!
MonST3R is accepted to ICLR'25 as a Spotlight! We have also added a fully feed-forward reconstruction mode that runs in real time on video input (samples at: , check more details here: )
2 · 2 · 60