
Yonglong Tian
@YonglongT
Followers: 3K · Following: 315 · Media: 4 · Statuses: 91
Research Scientist @OpenAI. Previously Researcher @GoogleDeepMind, PhD @MIT. Opinions are my own.
Boston, MA
Joined June 2019
RT @shobsund: Personal vision tasks, like detecting *your mug*, are hard; they're data-scarce and fine-grained. In our new paper, we show y…
RT @lijie_fan: 🚀 Excited to share our latest work, Fluid! We've developed a scalable autoregressive text-to-image model without VQ. We trai…
0
22
0
We scale our Fluid models from 150M up to 10B parameters! Surprisingly, Fluid with only 300M parameters achieves a FID similar to prior models with billions of parameters, e.g., the 20B-parameter Parti. Joint work with @lijie_fan, @TianhongLi6, Siyang Qin, Yuanzhen Li, @jesu9, @MikiRubinstein, @DeqingSun, and Kaiming He.
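To make "autoregressive without VQ" concrete: instead of a softmax over a discrete codebook, each continuous image token can be predicted with a small diffusion head conditioned on the transformer's output. Below is a minimal sketch of that general idea; all names, shapes, and the noise schedule are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch: a diffusion loss on continuous image tokens, replacing
# the usual softmax over a VQ codebook. Illustrative assumptions only.
import torch
import torch.nn as nn

class DiffusionHead(nn.Module):
    """Small MLP that predicts the noise added to a continuous token,
    conditioned on the transformer's output vector z."""
    def __init__(self, token_dim=16, cond_dim=768, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, token_dim),
        )

    def forward(self, noisy_token, z, t):
        # t is the diffusion time in [0, 1], one scalar per sample
        return self.net(torch.cat([noisy_token, z, t], dim=-1))

def diffusion_loss(head, clean_token, z):
    """Noise a continuous token and train the head to predict the noise."""
    t = torch.rand(clean_token.shape[0], 1)      # random timesteps
    eps = torch.randn_like(clean_token)          # Gaussian noise
    noisy = (1 - t) * clean_token + t * eps      # simple linear schedule (an assumption)
    pred = head(noisy, z, t)
    return ((pred - eps) ** 2).mean()

# Toy usage: batch of 8 tokens, 16-dim each, 768-dim conditioning context.
head = DiffusionHead()
loss = diffusion_loss(head, torch.randn(8, 16), torch.randn(8, 768))
loss.backward()
```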
RT @JiaweiYang118: Very excited to get this out: "DVT: Denoising Vision Transformers". We've identified and combated those annoying positio…
RT @phillip_isola: Our computer vision textbook is released! Foundations of Computer Vision, with Antonio Torralba and Bill Freeman. https:/…
Thank you @_akhaliq for featuring our work!
Denoising Vision Transformers. paper page: we identify crucial artifacts in ViTs caused by positional embeddings and propose a two-stage approach to remove them, which significantly improves the feature quality of different pre-trained ViTs.
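Here is a minimal sketch of what a two-stage decomposition of this kind could look like: first fit a position-dependent artifact term per image, then distill the result into a one-pass denoiser. The optimization setup and all names are assumptions for illustration, not the DVT code.

```python
# Minimal sketch of the two-stage idea: (1) decompose raw ViT features
# into content plus a position-dependent artifact, (2) train a light
# denoiser to map raw features to cleaned ones directly. Assumptions only.
import torch
import torch.nn as nn

N, D = 196, 768                      # 14x14 patch grid, ViT-B feature dim
raw_feats = torch.randn(N, D)        # stand-in for frozen ViT outputs

# Stage 1: per-image decomposition. The artifact depends only on patch
# position; in practice it would be constrained to be shared across
# augmented views of the same image (that regularization is omitted here).
artifact = nn.Parameter(torch.zeros(N, D))
content = nn.Parameter(raw_feats.clone())
opt = torch.optim.Adam([artifact, content], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = ((content + artifact - raw_feats) ** 2).mean()
    loss.backward()
    opt.step()

# Stage 2: distill the decomposition into a denoiser that produces
# clean features in a single forward pass.
denoiser = nn.Sequential(nn.Linear(D, D), nn.GELU(), nn.Linear(D, D))
d_opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
d_loss = ((denoiser(raw_feats) - content.detach()) ** 2).mean()
d_loss.backward()
d_opt.step()
```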
RT @lijie_fan: 🚀 Is the future of vision models synthetic? Introducing SynCLR: our new pipeline leveraging LLMs & text-to-image models to t…
I had the joy of working with Olivier (and Aaron) at DeepMind. My best internship experience. Strongly recommended!
Thrilled to announce that we have an opening for a Student Researcher to come work with us at @GoogleDeepMind! If you're interested in multimodal learning, in-context adaptation, memory-augmented perception, or active learning, do consider applying:
Thank you @_akhaliq for covering our work!
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency. paper page: Current vision-language generative models rely on expansive corpora of paired image-text data to attain optimal performance and generalization capabilities.
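The cycle-consistency idea can be sketched in a few lines: map text to image and back (and image to text and back), penalizing the round trip for drifting from the input, so unpaired batches still provide a training signal. The models below are stand-in placeholders, not the paper's architecture.

```python
# Minimal sketch of cycle consistency for unpaired image-text data.
# Both generators are toy stand-ins; only the loss structure matters.
import torch
import torch.nn as nn

t2i = nn.Linear(32, 64)   # stand-in text-to-image generator
i2t = nn.Linear(64, 32)   # stand-in image-to-text generator

def cycle_loss(text_emb, image_emb):
    # Text cycle: text -> image -> text should recover the original text.
    text_cycle = ((i2t(t2i(text_emb)) - text_emb) ** 2).mean()
    # Image cycle: image -> text -> image should recover the original image.
    image_cycle = ((t2i(i2t(image_emb)) - image_emb) ** 2).mean()
    return text_cycle + image_cycle

# Unpaired batches: the texts and images need not correspond.
loss = cycle_loss(torch.randn(8, 32), torch.randn(8, 64))
loss.backward()
```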
RT @TongzhouWang: Quasimetric RL code is now on GitHub: Instead of deleting 80% of the dev repo, I rewrote the alg…
RT @sangnie: Join us at the WiML Un-Workshop breakout session on "Role of Mentorship and Networking"! Do not miss the chance to talk with l…
RT @Jing36645824: 🎉 (1/6) Exciting news: 🐑 LAMM is online! ⭐️ Features: ① 200k 2D/3D instruction-tuning dataset ② benchmark on 14 high-level 2…
Our new work, led by the elegant @xuyilun2, Mingyang, and Xiang.
In diffusion models, samplers are primarily ODE-centric, overlooking slower stochastic methods. However, we show that a stochastic sampler can outperform previous samplers on Stable Diffusion if stochasticity is used correctly! Check out Restart Sampling:
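The core loop of a restart-style sampler is easy to sketch: integrate the probability-flow ODE deterministically down to a small noise level, then inject fresh forward noise to jump back up and integrate down again. The `ode_step`, noise schedule, and score function below are placeholder assumptions, not the paper's exact sampler.

```python
# Minimal sketch of restart-style sampling: alternate deterministic ODE
# phases with forward-noise jumps. Schedule and score are toy assumptions.
import torch

def ode_step(x, t, dt, score_fn):
    # One Euler step of a VE-style probability-flow ODE: dx = -t * score dt
    return x + dt * (-t) * score_fn(x, t)

def restart_sample(x, score_fn, t_max=1.0, t_min=0.1, n_restarts=3, n_steps=20):
    ts = torch.linspace(t_max, t_min, n_steps)

    def ode_phase(x):
        # Deterministic phase: integrate from t_max down to t_min.
        for i in range(n_steps - 1):
            x = ode_step(x, ts[i], ts[i + 1] - ts[i], score_fn)
        return x

    x = ode_phase(x)
    for _ in range(n_restarts):
        # Restart: inject forward noise to jump from t_min back to t_max
        # (variance t_max^2 - t_min^2 for sigma(t) = t), then re-integrate.
        x = x + (t_max ** 2 - t_min ** 2) ** 0.5 * torch.randn_like(x)
        x = ode_phase(x)
    return x

# Toy usage with an analytic score for a roughly standard-normal target.
sample = restart_sample(torch.randn(8, 2), lambda x, t: -x / (1 + t ** 2))
```

The repeated noise-then-denoise cycles are what contract accumulated discretization error, which is the sense in which stochasticity is "used correctly."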
RT @dilipkay: New paper!! We show that pre-training language-image models *solely* on synthetic images from Stable Diffusion can outperform…
RT @dilipkay: New paper! We show how to leverage pre-trained LLMs (ChatGPT, Bard, LLaMa) to rewrite captions, and significantly improve ove…