Haotian Tang
@haotiant1998
Followers
2K
Following
252
Media
4
Statuses
100
Research Scientist @Meta. Previously Gemini team @GoogleDeepMind, Ph.D. @MITEECS, B.Eng. @sjtu1896.
Joined September 2021
Personal update: I am excited to share that I will join @GoogleDeepMind next week after defending my PhD thesis @MITEECS earlier last month. I will be working on generative models that simulate the physical world. Looking forward to the new journey ahead in 2025!
73
53
2K
Explore Eigen Banana, our post-trained image edit model with lightning-fast speed! ⚡️
🚀 Releasing open-source Eigen-Banana-Qwen-Image-Edit: 4 seconds ⚡ instruction-based image edits trained on Pico-Banana-400K. Super fast with high image editing quality. Open-source LoRA for Diffusers/DiffSynth-Studio + enterprise stack (EigenTrain/Inference/Deploy). Feel free
0
2
14
Very cool research!
Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training), a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch
0
0
3
So excited to see Tong’s amazing work! Let’s gooooo 🚀
Watch Gemini 2.5 Pro Deep Think tackle the challenging "catch a mole" problem from @Codeforces. 🪤 This new mode is based on our research in parallel thinking and considers multiple hypotheses before responding. See it in action ↓
0
0
3
Deep Think in 2.5 Pro has landed. 🤯 It’s a new enhanced reasoning mode using our research in parallel thinking techniques - meaning it explores multiple hypotheses before responding. This enables it to handle incredibly complex math and coding problems more effectively.
72
422
4K
Check out Veo 3 🔥🔥🔥 sound on 🔊
Say goodbye to the silent era of video generation: Introducing Veo 3 — with native audio generation. 🗣️ Quality is up from Veo 2, and now you can add dialogue between characters, sound effects and background noise. Veo 3 is available now in the @GeminiApp for Google AI Ultra
6
7
157
A lot of work went into making Gemini 2.5 SOTA at video understanding, check out this 🧵 for more details! Looking back at where we were a year ago, the progress really feels phenomenal! So many things to unlock and enable from video 🎥 and we are only getting started!
Thrilled to share our latest advances in video understanding 📽️: Gemini 2.5 Pro is a truly magical model to play with, excelling in traditional video analysis and unlocking new use cases I could not imagine a few months ago🪄 More in 🧵 and @Google blog:
5
11
148
Gemini 2.5 Pro (05-06) is SOTA at most video understanding tasks (by a large margin) 📽️. Lots of work by the Gemini multimodal team to make this happen, excited to see developers push this capability in new ways. More details below!
116
161
2K
♊️
What a finish! Gemini 2.5 Pro just completed Pokémon Blue! Special thanks to @TheCodeOfJoel for creating and running the livestream, and to everyone who cheered Gem on along the way.
0
0
4
1/ Today, Veo 2, our state-of-the-art video model, is rolling out to Gemini Advanced + Whisk! You can create 8s, high-res videos from text prompts in @GeminiApp with fluid character movement + lifelike scenes across a range of styles. Tip: the more detailed your description, the
143
340
3K
2.5 Pro is the highest performing model for Aider Polyglot (real-world coding) and has a lower cost than the five next-best models. An amazing model for code 💎
Gemini 2.5 Pro's leaderboard entry has been updated with costs, now that it is available through a paid API. It cost $6 to run the aider polyglot coding benchmark on Gemini, lower than the top 10 other entries except for DeepSeek's models. https://t.co/mBVaUPGHPl
4
15
190
In case it's not clear from @paulgauthier's chart below, the cost differences are quite large among the top 10 models on this benchmark, w/ some (lower quality) models being ~2X, ~3X or ~30X more expensive than the Gemini 2.5 Pro model (the website has a nice table, seen below).
Gemini 2.5 Pro's leaderboard entry has been updated with costs, now that it is available through a paid API. It cost $6 to run the aider polyglot coding benchmark on Gemini, lower than the top 10 other entries except for DeepSeek's models. https://t.co/mBVaUPGHPl
43
113
971
What an amazing chip! Cannot wait to try it out
Introducing Ironwood, the first TPU built for the age of inference, and the timing could not be better : ) - Ironwood perf/watt is 2x relative to Trillium, 6th gen TPU - Ironwood offers 192 GB per chip, 6x that of Trillium - 4.5x faster data access https://t.co/doEUgLLgRf
1
0
1
Deep Research in the Gemini App is now powered by Gemini 2.5 Pro, and our early tests show users prefer this 2:1 vs “other products” ;) https://t.co/O3Nv1uXPnK
gemini.google.com
Meet Gemini, Google’s AI assistant. Get help with writing, planning, brainstorming, and more. Experience the power of generative AI.
201
199
3K
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →
91
520
3K
Try out Veo 2 in YouTube! Congrats Veo team 🎉
🎥 Our state-of-the-art video generation model Veo 2 is now available in @YouTube Shorts. With the Dream Screen feature, creators can: ✨ Produce new clips that fit seamlessly into their storytelling with a quick text prompt ✨ Use it to make backgrounds for their videos.
6
7
109
Breaking news from Text-to-Image Arena! 🖼️✨ @GoogleDeepMind’s Imagen 3 debuts at #1, surpassing Recraft-v3 with a remarkable +70-point lead! Congrats to the Google Imagen team for setting a new bar! Try the best text2image at LMArena and cast your vote! More analysis👇
47
141
857
What an achievement! Congrats to the team!
Our latest update to our Gemini 2.0 Flash Thinking model (available here: https://t.co/Rr9DvqbUdO) scores 73.3% on AIME (math) & 74.2% on GPQA Diamond (science) benchmarks. Thanks for all your feedback, this represents super fast progress from our first release just this past
0
0
4
DeepMind has ambitious plans to make massive generative models that simulate the world. I'm hiring for a new team with this mission. Come build with us! https://t.co/pqvALtAvLs
90
217
2K
Thrilled that my citations hit 20,000 on the last day of 2024! Was just 19,980+ yesterday - what a lovely surprise! This year brought changes: first job switch, moved from HK to US, and did amazing projects - e.g. SANA - with interns. Here's to 2025!
7
3
147