Tyler Zhu

@tyleryzhu

Followers
2K
Following
64K
Media
71
Statuses
1K

PhD student @VisualAILab | SR @GoogleDeepMind | prev @berkeley_ai | @SFGiants @warriors guy

Berkeley, CA
Joined March 2020
@tyleryzhu
Tyler Zhu
11 days
Today seems to be a fitting day for @GoogleDeepMind news, so I'm excited to announce our new preprint! Prior work suggests that text & img repr's are converging, albeit weakly. We found these same models actually have strong alignment; the inputs were too impoverished to see it!
11
25
132
@m__dehghani
Mostafa Dehghani
10 days
Thinking (test-time compute) in pixel space... 🍌 Pro tip: always peek at the thoughts if you use AI Studio. Watching the model think in pictures is really fun!
21
81
697
@tonyzzhao
Tony Zhao
10 days
Today, we present a step-change in robotic AI @sundayrobotics. Introducing ACT-1: A frontier robot foundation model trained on zero robot data. - Ultra long-horizon tasks - Zero-shot generalization - Advanced dexterity 🧵->
425
655
5K
@tyleryzhu
Tyler Zhu
11 days
In moving from static images and text to dynamic videos and text descriptions, we better reflect Plato's vision of perception which is grounded in reality, not merely shadows on the cave wall. This is a step towards that, but there are still many unanswered Qs (e.g., generative models?)
0
0
5
@tyleryzhu
Tyler Zhu
11 days
Finally, we show that this alignment, despite being a "semantic" metric, is promising as a zero-shot video probe of downstream tasks. There is a strong pos. correlation w/ semantic tasks like action class., but also geometric ones like obj tracking and depth estimation!
1
0
3
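The "zero-shot probe" claim above boils down to a rank correlation between per-model alignment scores and per-model downstream performance. A minimal numpy sketch of that check, with made-up illustrative numbers (not values from the paper):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of ranks.

    Assumes no ties in either list (argsort-of-argsort ranking
    is not tie-aware).
    """
    rank = lambda v: np.argsort(np.argsort(np.asarray(v)))
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

# Hypothetical per-model numbers: alignment score vs. downstream
# task accuracy (illustrative only, not from the paper).
alignment = [0.18, 0.22, 0.25, 0.31, 0.40]
task_acc = [0.55, 0.61, 0.60, 0.70, 0.78]
print(f"rank correlation: {spearman(alignment, task_acc):.2f}")
```

A strongly positive value here is what "promising as a zero-shot video probe" means operationally: ranking models by alignment approximately ranks them by task performance, with no task-specific training.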
@tyleryzhu
Tyler Zhu
11 days
We fit Chinchilla-style scaling laws to quantify this scaling behavior. The coefficients indicate the maximum penalty for a poor approximation, and the exponents measure how fast you're able to incorporate new data.
1
0
3
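A Chinchilla-style fit of this kind can be sketched with a saturating power law, alignment(n) ≈ S_max − B·n^(−α), linearized by taking logs of the gap to an estimated ceiling. The functional form and all numbers below are illustrative assumptions, not the paper's actual fit:

```python
import numpy as np

# Hypothetical alignment scores as caption count grows
# (illustrative numbers, not from the paper).
n_captions = np.array([1, 2, 4, 8, 16, 32], dtype=float)
scores = np.array([0.16, 0.22, 0.28, 0.33, 0.36, 0.38])

# Assumed form: score(n) = s_max - b * n**(-alpha).
# Taking logs of the gap linearizes the fit:
#   log(s_max - score) = log(b) - alpha * log(n)
s_max = scores.max() + 0.02          # crude ceiling estimate
log_gap = np.log(s_max - scores)
slope, log_b = np.polyfit(np.log(n_captions), log_gap, 1)
alpha, b = -slope, np.exp(log_b)
print(f"fit: score(n) ~ {s_max:.2f} - {b:.2f} * n^(-{alpha:.2f})")
```

In this parameterization, b plays the role of the "maximum penalty" (the alignment you forfeit at n = 1) and α is the rate at which extra captions close that gap, mirroring the tweet's reading of coefficients vs. exponents.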
@tyleryzhu
Tyler Zhu
11 days
This only gets better as we scale along both axes of frames and captions. While both VideoMAEv2 and DINOv2 get much better with more captions, VideoMAEv2 uses video info better. In total, we nearly double alignment from what a single image & caption can offer by matching reality.
1
0
3
@tyleryzhu
Tyler Zhu
11 days
We benchmark 121 models+variants in total. On the same setting as the original PRH, we reproduce that image models are at best only ~20% aligned w/ SoTA LLMs (Gemma) using a single caption. Native video models instead have both the best repr's (retrieval) and alignment (25%!)
1
0
3
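Alignment percentages like the "~20%" and "25%" above are, in PRH-style studies, typically computed with a mutual k-nearest-neighbor metric over paired samples: how much the neighborhood structure of one model's embeddings overlaps the other's. A minimal numpy sketch of that idea (an assumption about the metric, not the authors' exact implementation):

```python
import numpy as np

def mutual_knn_alignment(feats_a, feats_b, k=10):
    """Mutual k-NN alignment between two paired feature sets.

    feats_a, feats_b: arrays of shape (n, d_a) and (n, d_b) whose
    rows correspond to the same underlying samples. For each sample,
    find its k nearest neighbors (cosine similarity) in each space
    and score the mean overlap of the two neighbor sets in [0, 1].
    """
    def knn_indices(feats):
        normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sims = normed @ normed.T
        np.fill_diagonal(sims, -np.inf)  # exclude self-matches
        return np.argsort(-sims, axis=1)[:, :k]

    nn_a, nn_b = knn_indices(np.asarray(feats_a)), knn_indices(np.asarray(feats_b))
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlap))
```

Under this metric, feeding both models richer inputs (video frames, multiple captions) can raise the score without retraining either model, which is the thread's central claim about "impoverished inputs."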
@tyleryzhu
Tyler Zhu
11 days
The key to this is using videos and multiple captions, which more accurately reflect the true underlying scenes. We use both the VaTeX dataset as well as PVD w/ synthesized captions, and we sample varying amounts of visual/text info to understand their relationship better.
1
0
3
@tyleryzhu
Tyler Zhu
11 days
However, their original study found that alignment b/w image models and LLMs capped at 0.16. Does this mean 0.16 is strong alignment, or that there is still a strong gap b/w models? We found that by moving to dynamic inputs (and video models), we could achieve scores of 0.40!
1
0
6
@tyleryzhu
Tyler Zhu
11 days
The "Platonic Representation Hypothesis," by @phillip_isola and co., posited that different NNs trained on different data and modalities, i.e. large ViTs and LLMs, are converging to a shared model of reality. This makes sense, as all data is a projection of a shared reality!
1
0
8
@tyleryzhu
Tyler Zhu
11 days
arXiv page: https://t.co/NsGdjZW4qb Project overview page: https://t.co/Kk24V0Q3AZ hf page: https://t.co/akV6QkH7kY Code coming soon! Work done at Google DeepMind, and in collaboration with the fantastic team of @TengdaHan, Leo Guibas, Viorica Patraucean, and Maks Ovsjanikov.
1
0
9
@tyleryzhu
Tyler Zhu
18 days
Ritwik is a great mentor and figures to be an even better advisor. You’re missing out if you have shared interests and don’t apply!
@Ritwik_G
Ritwik Gupta 🇺🇦
18 days
I am recruiting Ph.D. students at @umdcs starting Fall 2026! I am looking for students in three broad areas: (1) Physics-integrated computer vision (2) VLMs with constraints (3) Dual-use AI policy We're ranked #3 in AI on @CSrankings! Specific details in 🧵
0
0
4
@YangWilliam_
William Yang
30 days
Text-to-image (T2I) models can generate rich supervision for visual learning, but generating subtle distinctions remains challenging. Fine-tuning helps, but too much tuning → overfitting and loss of diversity. How do we preserve fidelity without sacrificing diversity? (1/8)
2
13
39
@gu_xiangming
Xiangming Gu
1 month
Last Friday, I wrapped up my 24-week Student Researcher role at @GoogleDeepMind in London. I’m deeply thankful to my hosts @PetarV_93 and @re_rayne for their guidance, and to all the brilliant minds at @GoogleDeepMind for their inspiration and collaboration. I’ve also had a lot
5
7
193
@fedzbar
Federico Barbero
1 month
🚨🌶️ Did you realise you can get alignment 'training' data out of open weights models? Oops We show that models will regurgitate alignment data that is (semantically) memorised. This data can come from SFT and RL... and can be used to train your own models! 🧵
10
42
241
@tyleryzhu
Tyler Zhu
1 month
@farhadi @AlisonGopnik @ICCVConference @eunice_yiu_ @phillip_isola Last but not least, @sammtmd is closing out the day by telling us about the very exciting and timely Physics-IQ benchmark track and its contestants
0
0
4
@tyleryzhu
Tyler Zhu
1 month
@farhadi @AlisonGopnik @ICCVConference @eunice_yiu_ @phillip_isola is now giving a talk on symbol grounding in multimodal AI! Floor 4 Ballroom B
1
0
2
@phillip_isola
Phillip Isola
1 month
If you are at ICCV, I'm giving a talk here at 3:30pm in Ballroom B on "Revisiting the symbol grounding problem in the age of multi-modal AI" Will cover recent work on multimodal rep alignment in unpaired and unimodal models.
@shiryginosar
Shiry Ginosar
1 month
Join us TODAY for the 3rd Perception Test Challenge https://t.co/DVHQFjkyuA @ICCV2025! Ballroom B, Full day Amazing lineup of speakers: @farhadi, @AlisonGopnik, Philipp Krähenbühl, @phillip_isola
4
13
82
@tyleryzhu
Tyler Zhu
1 month
@farhadi @AlisonGopnik @ICCVConference @eunice_yiu_ is presenting the Kiva track now, along with the winning team presentations!
1
0
1