Quoc Le Profile
Quoc Le

@quocleix

Followers
54K
Following
806
Media
70
Statuses
451

Distinguished Scientist, Google.

Joined April 2018
@quocleix
Quoc Le
3 years
New open-source language model from Google AI: Flan-T5 🍮. Flan-T5 is instruction-finetuned on 1,800+ language tasks, leading to dramatically improved prompting and multi-step reasoning abilities. Public models: Paper:
Tweet media one
37
474
2K
@quocleix
Quoc Le
6 years
EfficientNets: a family of more efficient & accurate image classification models. Found by architecture search and scaled up by one weird trick. Link: . Github: . Blog:
Tweet media one
24
658
2K
@quocleix
Quoc Le
5 years
Fun AutoML-Zero experiments: Evolutionary search discovers fundamental ML algorithms from scratch, e.g., small neural nets with backprop. Can evolution be the “Master Algorithm”? ;) Paper: Code:
20
536
2K
@quocleix
Quoc Le
6 years
XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE). arxiv: github (code + pretrained models): with Zhilin Yang, @ZihangDai, Yiming Yang, Jaime Carbonell, @rsalakhu
Tweet media one
19
680
2K
@quocleix
Quoc Le
5 years
New paper: Towards a Human-like Open-Domain Chatbot. Key takeaways: 1. "Perplexity is all a chatbot needs" ;) 2. We're getting closer to a high-quality chatbot that can chat about anything. Paper: Blog:
Tweet media one
Tweet media two
27
561
2K
@quocleix
Quoc Le
6 years
Want to improve accuracy and robustness of your model? Use unlabeled data! Our new work uses self-training on unlabeled data to achieve 87.4% top-1 on ImageNet, 1% better than SOTA. Huge gains are seen on harder benchmarks (ImageNet-A, C and P). Link:
Tweet media one
20
418
1K
@quocleix
Quoc Le
4 years
Some nice improvement on ImageNet: 90% top-1 accuracy has been achieved :-) This result was made possible by using Meta Pseudo Labels, a semi-supervised learning method, to train EfficientNet-L2. More details here:
Tweet media one
16
300
1K
@quocleix
Quoc Le
5 years
A surprising result: We found that smooth activation functions are better than ReLU for adversarial training and can lead to substantial improvements in adversarial robustness.
Tweet media one
20
248
1K
@quocleix
Quoc Le
5 years
Happy to announce that we've released a number of models trained with Noisy Student (a semi-supervised learning method). The best model achieves 88.4% top-1 accuracy on ImageNet (SOTA). Enjoy finetuning! Link: Paper:
Tweet media one
11
276
1K
@quocleix
Quoc Le
6 years
Nice blog post titled "The Quiet Semi-Supervised Revolution" by Vincent Vanhoucke. It discusses two related works by the Google Brain team: Unsupervised Data Augmentation and MixMatch.
Tweet media one
4
288
953
@quocleix
Quoc Le
5 years
We researchers love pre-training. Our new paper shows that pre-training is unhelpful when we have a lot of labeled data. In contrast, self-training works well even when we have a lot of labeled data. SOTA on PASCAL segmentation & COCO detection. Link:
Tweet media one
6
225
953
@quocleix
Quoc Le
6 years
We used architecture search to find a better architecture for object detection. Results: better and faster architectures than Mask-RCNN, FPN, and SSD. The found architecture also looks unexpected and pretty funky. Link:
Tweet media one
Tweet media two
Tweet media three
9
228
809
@quocleix
Quoc Le
6 years
EfficientDet: a new family of efficient object detectors. It is based on EfficientNet, and is many times more efficient than state-of-the-art models. Link: Code: coming soon
Tweet media one
8
206
761
@quocleix
Quoc Le
3 years
We can teach a language model to solve complex problems by showing it how to break down a problem into subproblems and solve them sequentially. This “least-to-most prompting” method solves 99.7% of the SCAN benchmark, while other prompting methods solve ~16%.
Tweet media one
14
140
739
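A minimal sketch of the least-to-most prompting recipe described in the tweet above. The `generate()` helper is a hypothetical stand-in for any LLM API, and the decomposition/solution prompts are illustrative paraphrases, not the paper's exact prompts.

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model API."""
    raise NotImplementedError

def least_to_most(question: str) -> str:
    # Stage 1: ask the model to break the problem into simpler subproblems.
    decomposition = generate(
        "Break the following problem into a numbered list of simpler subproblems.\n"
        f"Problem: {question}\nSubproblems:"
    )
    subproblems = [line.strip() for line in decomposition.splitlines() if line.strip()]

    # Stage 2: solve the subproblems sequentially, feeding earlier answers back in.
    context = f"Problem: {question}\n"
    answer = ""
    for sub in subproblems:
        context += f"Q: {sub}\nA:"
        answer = generate(context)
        context += f" {answer}\n"
    return answer  # the answer to the final subproblem addresses the original question
```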
@quocleix
Quoc Le
1 year
New York Times article on AlphaGeometry. Geometry was my favorite subject in high school because solving geometry problems requires many steps of reasoning and planning. Geometry problems, however, have been difficult for AI to solve. Our Nature paper shows my team’s progress in geometry.
10
80
536
@quocleix
Quoc Le
6 years
Data augmentation is often associated with supervised learning. We find *unsupervised* data augmentation works better. It combines well with transfer learning (e.g. BERT) and improves everything when datasets have a small number of labeled examples. Link:
@lmthang
Thang Luong
6 years
Introducing UDA, our new work on "Unsupervised data augmentation" for semi-supervised learning (SSL) with Qizhe Xie, Zihang Dai, Eduard Hovy, & @quocleix. SOTA results on IMDB (with just 20 labeled examples!), SSL Cifar10 & SVHN (30% error reduction)!
Tweet media one
3
153
605
@quocleix
Quoc Le
4 years
Primer: Searching for Efficient Transformers for Language Modeling. We use evolution to design a new Transformer variant, called Primer. Primer has a better scaling law, and is 3X to 4X faster for training than Transformer for language modeling.
Tweet media one
4
128
603
@quocleix
Quoc Le
2 years
We discovered an optimizer, "Lion", with symbolic program search. In our experiments, it works well for some cases. Please check out for results and limitations.
Tweet media one
13
79
542
@quocleix
Quoc Le
6 years
AdvProp: One weird trick to use adversarial examples to reduce overfitting. Key idea is to use two BatchNorms, one for normal examples and another one for adversarial examples. Significant gains on ImageNet and other test sets.
@tanmingxing
Mingxing Tan
6 years
Can adversarial examples improve image recognition? Check out our recent work: AdvProp, achieving ImageNet top-1 accuracy of 85.5% (no extra data) with adversarial examples! Arxiv: Checkpoints:
Tweet media one
2
146
529
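A minimal PyTorch sketch of the two-BatchNorm idea from the tweets above: keep separate normalization statistics for clean and adversarial examples. The block structure and names are illustrative, not the released AdvProp code.

```python
import torch
import torch.nn as nn

class DualBNBlock(nn.Module):
    """Conv block with separate BatchNorms for clean and adversarial inputs."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn_clean = nn.BatchNorm2d(channels)  # statistics of clean examples
        self.bn_adv = nn.BatchNorm2d(channels)    # statistics of adversarial examples
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, adversarial: bool = False) -> torch.Tensor:
        bn = self.bn_adv if adversarial else self.bn_clean
        return self.act(bn(self.conv(x)))

# Training step (schematic): compute the loss on clean inputs routed through bn_clean and
# on adversarially perturbed inputs routed through bn_adv, then backpropagate their sum;
# only the clean branch is used at inference time.
```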
@quocleix
Quoc Le
5 years
Pretty amazing progress on speech recognition thanks to pre-training and self-training with unlabeled data. Key ingredients: Large conformer architecture + wave2vec2.0 pretraining + Noisy Student Training. Link:
Tweet media one
4
90
514
@quocleix
Quoc Le
6 years
We used architecture search to improve Transformer architecture. Key is to use evolution and seed initial population with Transformer itself. The found architecture, Evolved Transformer, is better and more efficient, especially for small size models. Link:
Tweet media one
Tweet media two
5
107
494
@quocleix
Quoc Le
7 days
Amazing progress in reasoning! 🚀 Gemini 2.5 Pro Deep Think hitting 49.4% on USAMO – a feat I'd have considered impossible just a couple of years ago – & Gemini 2.5 Flash achieving 1424 Elo are huge leaps. So proud our team's research ideas contributed to this moment!
26
40
545
@quocleix
Quoc Le
6 years
Exciting new work on replacing convolutions with self-attention for vision. Our paper shows that full attention is good, but loses a few percent in accuracy. And a middle ground that combines convolutions and self-attention is better. Link:
Tweet media one
5
102
454
@quocleix
Quoc Le
6 years
We opensourced AutoAugment strategy for object detection. This strategy significantly improves detection models in our benchmarks. Please try it on your problems. Code: Paper: More details & results 👇.
@ekindogus
Ekin Dogus Cubuk
6 years
Data augmentation is even more crucial for detection. We present AutoAugment for object detection, achieving SOTA on COCO validation set (50.7 mAP). Policy transfers to different models & datasets. Paper: Code: details in thread.
Tweet media one
4
144
454
@quocleix
Quoc Le
2 years
Can we speed up language model training with a better data mixture? Our DoReMi🎶 algorithm optimizes the data mixture, speeding up 8B model training by 2.6x on The Pile. Crucially, DoReMi🎶 just trains a small model (30x smaller) to tune the mixture.
Tweet media one
6
95
424
@quocleix
Quoc Le
6 years
RandAugment was one of the secret sources behind Noisy Student that I tweeted last week. Code for RandAugment is now opensourced.
@barret_zoph
Barret Zoph
6 years
*New paper* RandAugment: a new data augmentation. Better & simpler than AutoAugment. Main idea is to select transformations at random, and tune their magnitude. It achieves 85.0% top-1 on ImageNet. Paper: Code:
Tweet media one
3
77
412
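A toy sketch of the recipe in the quoted tweet: pick a few transforms at random and apply them all at one global magnitude. The PIL operations and the magnitude mapping are illustrative stand-ins, not the released RandAugment policy.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Illustrative transform set; the real implementation uses a larger, tuned list of ops.
def _rotate(img, m):    return img.rotate(30 * m)
def _contrast(img, m):  return ImageEnhance.Contrast(img).enhance(1 + m)
def _solarize(img, m):  return ImageOps.solarize(img, threshold=int(255 * (1 - m)))
def _sharpness(img, m): return ImageEnhance.Sharpness(img).enhance(1 + m)

TRANSFORMS = [_rotate, _contrast, _solarize, _sharpness]

def rand_augment(img: Image.Image, n: int = 2, magnitude: float = 0.3) -> Image.Image:
    """Apply n randomly chosen transforms, each at a single shared magnitude in [0, 1]."""
    for op in random.sample(TRANSFORMS, k=n):
        img = op(img, magnitude)
    return img
```

The appeal of the method, as the quoted tweet notes, is that only two scalars (the number of transforms and their shared magnitude) are tuned, instead of searching over a full augmentation policy.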
@quocleix
Quoc Le
5 years
New paper: Meta Pseudo Labels. In self-training, a pre-trained teacher generates pseudo labels to train a student. Here we use the student’s performance to meta-train the teacher to generate better pseudo labels. Works well on ImageNet 10%. Link:
Tweet media one
6
122
408
@quocleix
Quoc Le
6 years
Our work on Transformer-XL, which uses smart caching to improve the learning of long-term dependencies in Transformers. Key results: state-of-the-art on five language modeling benchmarks, including 21.8 ppl on LM1B and 0.99 bpc on enwik8. Link:
Tweet media one
7
101
408
@quocleix
Quoc Le
5 years
Last week we released the checkpoints for SOTA ImageNet models trained by NoisyStudent. Due to popular demand, we’ve also opensourced an implementation of NoisyStudent. The code uses SVHN for demonstration purposes. Link: Paper:
Tweet media one
4
116
403
@quocleix
Quoc Le
6 years
We released all checkpoints and training recipes of EfficientNets, including the best model, EfficientNet-B7, which achieves 84.5% top-1 accuracy on ImageNet. Link:
Tweet media one
8
91
399
@quocleix
Quoc Le
3 years
If you like the reasoning abilities in PaLM, also check out “chain of thought prompting”, which adds a “chain of thought” before answers in prompts for language models. This simple idea surprisingly enables LMs to do much better on many reasoning tasks.
Tweet media one
5
71
387
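A minimal example of what "adding a chain of thought before the answer" looks like in a few-shot prompt; the exemplar below is written here for illustration and is not taken from the paper.

```python
# Few-shot prompt whose exemplar shows intermediate reasoning before the final answer.
COT_PROMPT = """\
Q: A cafeteria had 23 apples. They used 20 for lunch and bought 6 more. How many apples do they have?
A: The cafeteria started with 23 apples. After using 20 they had 23 - 20 = 3.
   After buying 6 more they had 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many tennis balls does he have now?
A:"""
# A language model completing this prompt tends to emit its own reasoning steps before
# the final answer, which is the behavior described in the tweet above.
```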
@quocleix
Quoc Le
7 years
Bigger models are better models, so we built GPipe to enable training of large models. Results: 84.3% on ImageNet with AmoebaNet (a big jump from other state-of-the-art models) and 99% on CIFAR-10 with transfer learning. Link:
Tweet media one
6
123
365
@quocleix
Quoc Le
6 years
Introducing MobileNetV3: starting from MNASNet, found by architecture search, we applied additional methods to go even further (quantization-friendly SqueezeExcite & Swish + NetAdapt + compact layers). Result: 2x faster and more accurate than MobileNetV2. Link:
Tweet media one
6
123
364
@quocleix
Quoc Le
2 years
Congratulations @geoffreyhinton. The celebration today brought back many fond memories of learning from and collaborating with Geoff. His passion for research is on another level and has made a great impact on the careers of researchers around him, including myself. Every time he.
@AndrewYNg
Andrew Ng
2 years
Attending ⁦@geoffreyhinton⁩’s retirement celebration at Google with old friends. Thank you for everything you’ve done for AI! ⁦@JeffDean⁩ ⁦@quocleix
Tweet media one
5
12
359
@quocleix
Quoc Le
1 year
Bard (free) with Gemini Pro is now #2 on lmsys chatbot arena, surpassing GPT-4. I use it a lot for email and writing purposes and its responses are much better since the first launch. Give it a try at
@lmarena_ai
lmarena.ai
1 year
🔥Breaking News from Arena. Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement!. The race is heating up like never before! Super excited to see what's next for Bard + Gemini
Tweet media one
14
32
308
@quocleix
Quoc Le
5 years
Cool results from our collaboration with colleagues at @DeepMind on searching for new layers as alternatives for BatchNorm-ReLU. Excited with the potential use of AutoML for discovering novel ML concepts from low level primitives.
@Hanxiao_6
Hanxiao Liu
5 years
New paper: Evolving Normalization-Activation Layers. We use evolution to design new layers called EvoNorms, which outperform BatchNorm-ReLU on many tasks. A promising use of AutoML to discover fundamental ML building blocks. Joint work with @DeepMind
Tweet media one
2
77
309
@quocleix
Quoc Le
10 months
Amazing achievement: Our AI system at Google DeepMind just achieved the equivalent of a silver medal at the International Mathematical Olympiad (#IMO2024). This is a major milestone for AI in mathematics. Very proud to have been a part of it!
@GoogleDeepMind
Google DeepMind
10 months
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈. It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵
5
32
275
@quocleix
Quoc Le
6 years
Evolved Transformer is now opensourced in Tensor2Tensor:
@quocleix
Quoc Le
6 years
We used architecture search to improve Transformer architecture. Key is to use evolution and seed initial population with Transformer itself. The found architecture, Evolved Transformer, is better and more efficient, especially for small size models. Link:
Tweet media one
Tweet media two
3
73
280
@quocleix
Quoc Le
5 years
Waymo's blogpost about the use of automated data augmentation methods for self-driving cars (AutoAugment, RandAugment, Progressive Population Based Augmentation - PPBA). PPBA is "up to 10 times more data efficient than training nets without augmentation".
@Waymo
Waymo
5 years
Our newest research in collaboration with our @googleAI colleagues will allow us to train better machine learning models with less data and improve perception tasks for the Waymo Driver.
0
58
273
@quocleix
Quoc Le
6 years
Exciting new AutoML lineup for @googlecloud: AutoML Tables & Mobile Vision. Collaboration between Google Brain + AI + Cloud. Some benchmark data are below. Google Brain AutoML team will also participate in a Kaggle competition tomorrow. This will be fun!
Tweet media one
9
67
254
@quocleix
Quoc Le
4 years
When in doubt, try larger models with more data :-) Joint training of a big model on many speech datasets leads to impressive improvements across the board on many important benchmarks. Link:
Tweet media one
3
42
243
@quocleix
Quoc Le
5 years
Some new progress on Semi-supervised learning for speech recognition. Noisy student training improves accuracy on the LibriSpeech benchmark. The gains are especially big on the noisy test set (“Test-other”). Link:
Tweet media one
1
56
245
@quocleix
Quoc Le
3 years
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. 1.2T weight model that has better average zero and one-shot results than GPT-3 while using only ⅓ of compute for training. Arxiv: Blog:
Tweet media one
6
59
236
@quocleix
Quoc Le
7 years
We studied transfer learning from ImageNet to other datasets. Finding: better ImageNet architectures tend to work better on other datasets too. Surprise: pretraining on ImageNet dataset sometimes doesn't help very much. More info:
@skornblith
Simon Kornblith
7 years
We updated our paper "Do Better ImageNet Models Transfer Better?" In v2, we show that regularization settings for ImageNet training matter a lot for transfer learning on fixed features. ImageNet accuracy now correlates with transfer acc in all settings.
3
55
228
@quocleix
Quoc Le
4 years
Good meme for recent MLP papers ;) Pay attention to MLPs:
@guillefix
GuilleFix — k/acc | /dd ⏩
4 years
Glad to see machine learning finally growing up
Tweet media one
6
38
204
@quocleix
Quoc Le
2 months
@JeffDean Are you ReLU sure that's the best way to calculate it? Sounds taxing!
3
9
207
@quocleix
Quoc Le
6 years
Method is also super simple: 1) Train a classifier on ImageNet. 2) Infer labels on a much larger unlabeled dataset. 3) Train a larger classifier on the combined set. 4) Iterate the process, adding noise.
12
30
196
@quocleix
Quoc Le
4 years
Large language models can perform zero-shot learning pretty well if you finetune them on NLP tasks verbalized with instructions. Link:
Tweet media one
4
42
195
@quocleix
Quoc Le
2 months
Gemini 2.5 Pro is now #1 across all categories on Arena leaderboard. Been playing with it and it's a really amazing model.
@lmarena_ai
lmarena.ai
2 months
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆. Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer
Tweet media one
7
14
189
@quocleix
Quoc Le
5 years
Nice work on speeding up physics simulations with convolutional nets & neural architecture search.
2
30
161
@quocleix
Quoc Le
6 years
Peters et al 2017 and 2018 (ELMo) get the credit for the discoveries of 1) Using language models for transfer learning and 2) Embedding words through a language model ("contextualized word vectors"). ELMo is great, but these methods were proposed earlier. (1/3).
2
29
162
@quocleix
Quoc Le
6 years
Wanted to apply AutoAugment to speech, but a handcrafted augmentation policy already improves SOTA. Idea: randomly drop out certain time & frequency blocks, and warp the input spectrogram. Results: state-of-the-art on LibriSpeech 960h & Switchboard 300h. Link:
@GoogleAI
Google AI
6 years
Automatic Speech Recognition (ASR) struggles in the absence of an extensive volume of training data. We present SpecAugment, a new approach to augmenting audio data that treats it as a visual problem rather than an audio one. Learn more at →
Tweet media one
0
37
151
@quocleix
Quoc Le
6 years
I also highly recommend this nice video that explains the paper very well:
4
30
149
@quocleix
Quoc Le
5 years
@xpearhead @lmthang My favorite conversation is below. The Hayvard pun was funny but I totally missed the steer joke at the end until it was pointed out today by @Blonkhart
Tweet media one
5
20
142
@quocleix
Quoc Le
6 years
Key idea in these two papers is to ensure prediction(x) = prediction(x + noise), where x is an unlabeled example. People have tried all kinds of noise, e.g., Gaussian noise, adversarial noise, etc. But it looks like data augmentation noise is the real winner.
Tweet media one
3
32
143
@quocleix
Quoc Le
6 years
For me, ELMo is my favorite work of 2018. Together with ULMFiT, CoVe, GPT, BERT and others, they represent a breakthrough. I just mean that these ideas have a history, and want to bring up some of that history from my side. Also check out some comments in the thread for related works.
@quocleix
Quoc Le
6 years
Peters et al 2017 and 2018 (ELMo) get the credit for the discoveries of 1) Using language models for transfer learning and 2) Embedding words through a language model ("contextualized word vectors"). ELMo is great, but these methods were proposed earlier. (1/3).
0
15
136
@quocleix
Quoc Le
1 year
Our new work on evaluating and benchmarking long-form factuality. We provide a new dataset, an evaluation method, an aggregation metric that accounts for both precision and recall, and an analysis of thirteen popular LLMs (including Gemini, GPT, and Claude). We’re also.
@JerryWeiAI
Jerry Wei
1 year
New @GoogleDeepMind+@Stanford paper! 📜. How can we benchmark long-form factuality in language models?. We show that LLMs can generate a large dataset and are better annotators than humans, and we use this to rank Gemini, GPT, Claude, and PaLM-2 models.
Tweet media one
2
20
137
@quocleix
Quoc Le
7 years
Our latest paper on DropBlock, a new regularization method for convolutional networks. DropBlock drops contiguous regions of a feature map. It improves the ImageNet top-1 of ResNet from 76.5% to 78.1%, and the COCO mAP of RetinaNet from 36.8% to 38.4%. Paper:
Tweet media one
Tweet media two
2
53
136
@quocleix
Quoc Le
10 months
You should spend more compute at test time.
@jordanjuravsky
Jordan Juravsky
10 months
Do you like LLMs? Do you also like for loops? Then you’ll love our new paper! We scale inference compute through repeated sampling: we let models make hundreds or thousands of attempts when solving a problem, rather than just one. By simply sampling more, we can boost LLM
Tweet media one
2
9
109
@quocleix
Quoc Le
4 years
Short summary of the method: Meta Pseudo Labels is based on Pseudo Labels. But it has a feedback loop from student to teacher. So both teacher and student are trained concurrently to teach each other in this method. Code:
Tweet media one
4
18
123
@quocleix
Quoc Le
6 years
Since our work on "Semi-supervised sequence learning", ELMo, BERT and others have shown changes in the algorithm give big accuracy gains. But now given these nice results with a vanilla language model, it's possible that a big factor for gains can come from scale. Exciting!
@OpenAI
OpenAI
6 years
We've trained an unsupervised language model that can generate coherent paragraphs and perform rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training:
0
17
119
@quocleix
Quoc Le
2 years
Cool new research from our team @Google! Symbol tuning is a new finetuning method that allows language models to better learn input–label mappings in-context.
@JerryWeiAI
Jerry Wei
2 years
New @GoogleAI+@Stanford paper!📜. Symbol tuning is a simple method that improves in-context learning by emphasizing input–label mappings. It improves robustness to prompts without instructions/relevant labels and boosts performance on algorithmic tasks.
Tweet media one
2
12
100
@quocleix
Quoc Le
5 years
@xpearhead @lmthang You can find some sample conversations with the bot here:
2
17
103
@quocleix
Quoc Le
6 years
Introducing Natural Questions, a new dataset for Q&A research. Many questions in the dataset are not yet answered well by Google: "where does the energy in a nuclear explosion come from?", "when are hops added to the brewing process?". The dataset is still difficult for SoTA models.
@GoogleAI
Google AI
6 years
Introducing Natural Questions, a new, large-scale corpus and challenge for training and evaluating open-domain question answering systems, and the first to replicate the end-to-end process in which people find answers to questions. Learn more at ↓
0
20
105
@quocleix
Quoc Le
6 years
Full comparison against state-of-the-art on ImageNet. Noisy Student is our method. Noisy Student + EfficientNet is 11% better than your favorite ResNet-50 😉
Tweet media one
5
18
99
@quocleix
Quoc Le
5 years
A nice video about the paper is below. Note that the first version of the paper said 87.4% top-1 accuracy, but since then we have improved the result to 88.4%.
1
17
99
@quocleix
Quoc Le
7 years
Introducing MAPO, a policy gradient method augmented with a memory of good trajectories to make better updates. Well suited for generating programs to query databases. Good accuracies on WikiTableQuestions and WikiSQL. Video: #NeurIPS2018
Tweet media one
1
27
97
@quocleix
Quoc Le
6 years
Update from #KaggleDays, 5 hours into the competition and Google AutoML still maintains its lead. Three hours to go (five hours since I took the pictures).
Tweet media one
Tweet media two
4
11
94
@quocleix
Quoc Le
6 years
Google AI's blog post about our work on Transformer-XL.
@GoogleAI
Google AI
6 years
Introducing Transformer-XL, a novel architecture that enables natural language understanding beyond a fixed-length context. You can learn more and find the paper, code, pretrained models and hyperparameters at ↓
1
19
91
@quocleix
Quoc Le
1 year
@ylecun @JeffDean @deliprao @karpathy @AndrewYNg DistBelief had backprop from day one. The "cat detector" work you referred to did use backprop when we finetuned on ImageNet. Screenshot from Section 6 of the paper:
Tweet media one
2
6
93
@quocleix
Quoc Le
10 months
Excited to see Gemini #1 on lmsys. Nice ELO of 1300 :-).
@lmarena_ai
lmarena.ai
10 months
Exciting News from Chatbot Arena!. @GoogleDeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes. For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive
Tweet media one
1
2
81
@quocleix
Quoc Le
6 years
Final results are in, and Google AutoML came in second in the final ranking.
Tweet media one
4
6
89
@quocleix
Quoc Le
6 years
As a data augmentation method, adversarial examples are more general than other image processing techniques. So I expect AdvProp to be useful everywhere (language, structured data etc.), not just image recognition.
1
14
88
@quocleix
Quoc Le
5 years
Nice summary video about AutoML-Zero by @CShorten30:
2
29
84
@quocleix
Quoc Le
1 month
AlphaPokemon? 😅.
@OfficialLoganK
Logan Kilpatrick
1 month
Gemini 2.5 Pro just got the final 8th badge in Pokemon Blue, incredible pace of progress by the world's most powerful model!!! Next up: victory road and final 4 :)
Tweet media one
3
3
87
@quocleix
Quoc Le
6 years
Blog post about our collaboration to automate the design of machine learning models for @Waymo 's perception problems.
@Waymo
Waymo
6 years
Our researchers have teamed up with @GoogleAI to put cutting-edge AutoML research into practice by automatically generating neural nets for our self-driving cars. The results? Faster and more accurate nets for our vehicles.
0
14
84
@quocleix
Quoc Le
7 years
Auto-Augment policies are now opensourced in TF-Hub.
@TensorFlow
TensorFlow
7 years
Image augmentation lets you get the most of your dataset. We released state-of-the-art AutoAugment Modules on #TFHub allowing you to train better image models with less data! #TensorFlowHub #GoogleAI. Check it out here →
Tweet media one
Tweet media two
0
15
83
@quocleix
Quoc Le
6 years
Links to the mentioned papers. MixMatch: Unsupervised Data Augmentation:
1
14
72
@quocleix
Quoc Le
4 years
@Miles_Brundage @emilymbender @timnitGebru Oops, thanks for pointing out. We had some text discussing Bender et al. in an earlier draft, but it looks like an editing pass to rework some text from a few weeks ago accidentally dropped that. We'll upload a new version soon with that discussion restored.
9
7
70
@quocleix
Quoc Le
1 year
@deliprao We congratulate Tomas on winning the award. Regarding seq2seq, there are inaccuracies in his account. In particular, we all recall very specifically that he did not suggest the idea to us, and was in fact highly skeptical when we shared the end-to-end translation idea with him.
4
1
73
@quocleix
Quoc Le
6 years
First, the idea of using pre-trained language models for transfer learning was proposed in our paper in 2015: . (2/3)
Tweet media one
4
7
70
@quocleix
Quoc Le
2 years
Asking LLMs to take a step back improves their ability to reason and answer highly specific questions.
@Swarooprm7
Swaroop Mishra
2 years
Introducing 🔥 ‘Step back prompting’ 🔥 fueled by the power of abstraction. Joint work with the awesome collaborators at @GoogleDeepMind : @HuaixiuZheng , @xinyun_chen_ , @HengTze , @edchi , @quocleix , @denny_zhou. LLMs struggle to answer specific questions such as: “Estella
Tweet media one
1
14
66
@quocleix
Quoc Le
1 year
@JeffDean @ylecun Thank you, @JeffDean, for the summary. I also want to add that the paper influences many of my subsequent works on unsupervised learning for language (such as . It also influences the "sentiment neuron" paper at OpenAI (probably should be called GPT-0?):.
1
5
69
@quocleix
Quoc Le
6 years
Architecture of EfficientDet
Tweet media one
2
7
71
@quocleix
Quoc Le
5 years
This work continues our efforts on semi-supervised learning, such as UDA: MixMatch: FixMatch: Noisy Student: etc. Joint work with @hieupham789 @QizheXie @ZihangDai.
1
11
69
@quocleix
Quoc Le
5 years
@hardmaru We have a few data points that suggest such improvements are meaningful:. 1. Better ImageNet models transfer better to other datasets: 2. Better accuracy on ImageNet gives vast improvements in out-of-distro generalization:
3
16
65
@quocleix
Quoc Le
6 years
Second, the idea of embedding words through a language model was proposed in our followup paper in 2016: . (3/3)
Tweet media one
3
5
61
@quocleix
Quoc Le
6 years
To add to Vincent's point above, new findings also include: 1. The method is general (works well for images & texts). 2. The method works well on top of transfer learning (e.g., BERT). You can find these results in the Unsupervised Data Augmentation paper:
0
11
63
@quocleix
Quoc Le
5 years
Applying this simple method to EfficientNet produces much more robust models on ImageNet than previous state-of-the-art.
Tweet media one
2
3
60
@quocleix
Quoc Le
3 years
For PaLM, chain of thought prompting boosted the solving rate on GSM8K from 17.9% (by regular prompting) to 58.1%. Quite a big jump!
Tweet media one
3
11
54
@quocleix
Quoc Le
1 year
@ilyasut @sama @gdb @miramurati All the best on your new adventures, @ilyasut!
1
0
58
@quocleix
Quoc Le
6 years
Many of us tried to use adversarial examples as data augmentation and observed a drop in accuracy. And it seems that simply using two BatchNorms overcomes this mysterious drop in accuracy.
1
8
52
@quocleix
Quoc Le
5 years
This work continues our self-training efforts: Noisy Student Training (SOTA on ImageNet): Noisy Student Training for Speech (SOTA on LibriSpeech): Conclusion? Good results on big datasets need self-training (w/ Noisy Student) :)
3
7
52
@quocleix
Quoc Le
3 years
Summary tweet of other contributions by the first author:
@hwchung27
Hyung Won Chung
3 years
New paper + models! We extend instruction finetuning by: 1. scaling to a 540B model, 2. scaling to 1.8K finetuning tasks, 3. finetuning on chain-of-thought (CoT) data. With these, our Flan-PaLM model achieves a new SoTA of 75.2% on MMLU.
Tweet media one
1
5
51
@quocleix
Quoc Le
2 years
A weakness of LLMs is that they don’t know recent events well. This is nice work from Tu developing a benchmark (FreshQA) to measure factuality of recent events, and a simple method to improve search integration for better performance on the benchmark.
@tuvllms
Tu Vu
2 years
🚨 New @GoogleAI paper:. 🤖 LLMs are game-changers, but can they help us navigate a constantly changing world? 🤔. As of now, our work shows that LLMs, no matter their size, struggle when it comes to fast-changing knowledge & false premises. 📰: 👇
Tweet media one
Tweet media two
1
7
48
@quocleix
Quoc Le
5 years
@xpearhead @lmthang @Blonkhart I had another conversation with Meena just now. It's not as funny and I don't understand the first answer. But the replies to the next two questions are quite funny.
Tweet media one
1
9
47
@quocleix
Quoc Le
3 years
Chain of thought prompting combined with self-consistency decoding ( ) solves 75% of the math problems in the GSM8K benchmark (prior finetuned SOTA = 55%).
Tweet media one
2
10
50
@quocleix
Quoc Le
5 years
Our work also reveals that bigger models are also more robust models, which is unexpected and also good news. It's good news because more compute can be used to tackle adversarial robustness.
Tweet media one
3
6
46
@quocleix
Quoc Le
5 years
Our implementation of EfficientDet in TensorFlow: Good implementation of EfficientDet in PyTorch (not by us):
Tweet media one
0
7
47
@quocleix
Quoc Le
10 months
@JeffDean Congratulations, Jeff! A special journey with so many foundational contributions to Google and computer science.
0
0
47