
Dustin Tran
@dustinvtran
Followers: 42K
Following: 4K
Media: 201
Statuses: 3K
Research Scientist at Google DeepMind. I lead evaluation at Gemini / Bard.
Joined June 2013
"Simple, Distributed, and Accelerated Probabilistic Programming". The #NIPS2018 paper for Edward2. Scaling probabilistic programs to 512 TPUv2 cores and 100+ million parameter models.
8
208
724
Excited to introduce TensorFlow Probability. Official tools for probabilistic reasoning and statistical analysis in the TF ecosystem.
Introducing TensorFlow Probability: empowering ML researchers and practitioners to build sophisticated models quickly, leveraging state-of-the-art hardware. Read about it on the TensorFlow blog ↓.
6
154
454
Excited to release rank-1 Bayesian neural nets, achieving new SOTA on uncertainty & robustness across ImageNet, CIFAR-10/100, and MIMIC. We do extensive ablations to disentangle BNN choices. @dusenberrymw @Ghassen_ML @JasperSnoek @kat_heller @balajiln et al
3
88
406
Interested in quickly experimenting with BNNs, GPs, and flows? Check out Bayesian Layers, a simple layer API for designing and scaling up architectures. #NeurIPS2018 Bayesian Deep Learning, happening now.
11
99
396
Excited to be at Google for the rest of this year. Aside from basic ML research, expect Edward officially merging into @tensorflow (contrib).
12
64
394
Tomorrow @latentjasper @balajiln and I present a #NeurIPS2020 tutorial on "Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning". Whether you're new to the area or an expert, there is critically useful info! 8-10:30a PT
8
48
365
I've been awarded a Google PhD Fellowship in Machine Learning for work in Bayesian deep learning. Thanks @googleresearch!
21
25
328
One thing I find fascinating is that Parti is another data point suggesting that the key to large models is not diffusion, GANs, contrastive training, autoregressivity, or other more complex methods. What matters most is scale.
"A photo of the back of a wombat wearing a backpack and holding a walking stick. It is next to a waterfall and is staring at a distant mountain." #parti.
9
23
298
How do we specify priors for Bayesian neural networks? Check out our work on Noise Contrastive Priors at the ICML Deep Generative Models workshop 11:40am+. @danijarh, @alexirpan, Timothy Lillicrap, James Davidson
4
78
286
Excited to be joining @OpenAI today. (I am on leave from Columbia for the rest of this year.) Also: shout out to those in the Bay Area!
19
13
281
Snippet 1 from the #NeurIPS2020 tutorial: @balajiln What do we mean by uncertainty and out-of-distribution robustness?
3
68
276
Talks for the Probabilistic Programming conference #PROBPROG2018 are now available! Includes, e.g., Zoubin Ghahramani, @djsyclik, Dave Blei, @roydanroy, @tom_rainforth, @migorinova, Stuart Russell, Josh Tenenbaum, and many others.
1
78
272
Yann is wrong that the issue is in autoregressive generation. In fact, you can make an autoregressive model generate a full sequence and refine through inverse CDF-like tricks. The result is exactly the same. 1/3.
I have claimed that Auto-Regressive LLMs are exponentially diverging diffusion processes. Here is the argument: Let e be the probability that any generated token exits the tree of "correct" answers. Then the probability that an answer of length n is correct is (1-e)^n. 1/
11
23
262
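To make the arithmetic in the quoted argument concrete (taking its per-token independence assumption at face value, which is exactly the premise being disputed above):

```latex
% If each generated token independently "exits" the set of correct continuations
% with probability e, an n-token answer stays correct with probability
P(\text{correct}) = (1-e)^n = \exp\!\big(n \log(1-e)\big) \approx e^{-en} \quad \text{for small } e,
% i.e. exponential decay in answer length n; e.g. e = 0.01 and n = 500
% gives roughly e^{-5}, about 0.7%.
```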
Official Bard announcement! Team has been hard at work (myself humbly included). Excited to release and share more details soon.
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational #GoogleAI service powered by LaMDA.
12
8
248
Check out Discrete Flows, a simple way to build flexible discrete distributions. @keyonV @kumarkagrawal @poolio @laurent_dinh Our poster's at the Generative Models for Structured Data workshop! Room R02, today 3:15p #iclr2019
2
46
244
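For context on what "discrete distributions via flows" means mechanically, here is a minimal sketch of the kind of modular-arithmetic bijection the paper builds on; the coupling/autoregressive conditioning details are the paper's and are not shown here.

```latex
% For tokens x_d in {0, ..., K-1}, an invertible discrete transform:
y_d = (\mu_d + \sigma_d\, x_d) \bmod K,
% invertible whenever \sigma_d and K are coprime, with inverse
x_d = \sigma_d^{-1} (y_d - \mu_d) \bmod K.
% Unlike continuous flows there is no Jacobian term: for discrete variables,
% p_Y(y) = p_X(f^{-1}(y)).
```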
PyMC4 announces it will be based on @TensorFlow and TensorFlow Probability. This is exciting news for consolidating open source efforts for machine learning!
Big announcement on #PyMC4 (it will be based on #TensorFlow probability) as well as #PyMC3 (we will take over #Theano maintenance)
1
67
233
I'm against GPT-4chan's unrestricted deployment. However, a condemnation letter against a single independent researcher smells of unnecessary pitchfork behavior. Surely there are more civil and actionable approaches. I'd love to hear what steps were taken leading up to this.
There are legitimate and scientifically valuable reasons to train a language model on toxic text, but the deployment of GPT-4chan lacks them. AI researchers: please look at this statement and see what you think:
15
9
228
"Bayesian Layers: A Module for Neural Network Uncertainty" on arXiv: With @dusenberrymw, @markvanderwilk, @danijarh.
Interested in quickly experimenting with BNNs, GPs, and flows? Check out Bayesian Layers, a simple layer API for designing and scaling up architectures. #NeurIPS2018 Bayesian Deep Learning, happening now.
2
61
207
See the arXiv version. It includes character-level results (1.38 bpc on PTB; 1.23 bpc on text8) with a RealNVP-like flow model that's 100-1000x faster at generation than state-of-the-art autoregressive baselines.
Check out Discrete Flows, a simple way to build flexible discrete distributions. @keyonV @kumarkagrawal @poolio @laurent_dinh Our poster's at the Generative Models for Structured Data workshop! Room R02, today 3:15p #iclr2019
4
60
194
Thanks for the opportunity to give a tutorial on "Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning"! With @latentjasper and @balajiln. Looking forward to it. :-)
Thanks to everyone who submitted tutorial proposals for this year's @NeurIPSConf. It was amazing to see all the great work that went into putting together these proposals. We are happy to announce this year's tutorials at #neurips2020
3
16
193
Presenting "Why Aren't You Using Probabilistic Programming?" tomorrow. 8:05-8:30 at Hall C #NIPS2017
5
44
190
"Google and Others Are Building AI Systems That Doubt Themselves" by @willknight Edward, Pyro, and prob programming
6
70
182
Recommended to me by @ChrLouizos. It explains generalization _very_ well from the perspective of information & probability.
0
54
179
Great response to "Why Probability Theory Should be Thrown Under the Bus" by @tdietterich (can be read independently of the article)
4
50
174
Check out the @Entropy_MDPI Special Issue on Probabilistic Methods for Deep Learning. w/ @eric_nalisnick. Submission deadline: October 1. It's a timely venue if you're looking to publish, say, a summer research project or a previous conference submission.
2
23
166
Why probabilistic generative models? A great, concise description. (Found by stalking @DavidDuvenaud's courses. :)
2
81
155
Come check out our poster on "Deep Probabilistic Programming" today at 10:30-12:30p, C3. #ICLR2017
2
42
154
Thanks @lawrennd for inviting me to GPSS! My slides on "Probabilistic Programming with GPs":
3
36
148
The Chambers Statistical Software Award at #JSM2018 was graciously given to Edward. Check out the talk on Monday (10:30am). Will also be at the evening mixer—reach out if you're around!
9
18
151
Yordan Zaykov at #PROBPROG: "is now open-source." That's incredible news for the probabilistic programming community!
1
51
144
Tutorial on “Deep Probabilistic Programming: TensorFlow Distributions and Edward” w/ Rif Saurous, 2pm #POPL2018
2
35
141
“The Algorithms Behind Probabilistic Programming.” Great description of Bayesian inf, NUTS, ADVI by @FastForwardLabs
1
65
138
Check out our work analyzing (non)autoregressive models for NMT! Nonautoregressive latent variable models can achieve higher likelihood (lower perplexity) than autoregressive models.
“On the Discrepancy between Density Estimation and Sequence Generation”. Seq2seq models are optimized w.r.t. log-likelihood. We investigate the correlation between LL and generation quality on machine translation. w/ @dustinvtran, @orf_bnw, @kchonyc. (1/n).
2
17
133
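For readers outside NLP, the parenthetical "(lower perplexity)" is just the standard monotone transform of per-token log-likelihood, not anything specific to this paper:

```latex
\mathrm{PPL} = \exp\!\Big(-\tfrac{1}{N}\sum_{t=1}^{N} \log p_\theta(x_t \mid x_{<t}, \mathrm{source})\Big)
% so maximizing average per-token log-likelihood is the same as minimizing perplexity;
% for latent-variable models, the marginal likelihood (or a bound on it) plays this role.
```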
Check out our Hyperparameter Ensembles, #NeurIPS2020 camera-ready. Hyper-deep ensembles expand on random-init diversity by integrating over a larger space of hyperparameters. Hyper-batch ensembles expand on efficient ensemble methods. @flwenz @RJenatton @latentjasper
4
29
136
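A minimal sketch of the hyper-deep-ensemble idea, using sklearn's MLPClassifier as a stand-in for a deep network. This is a paraphrase of the recipe (ensemble over hyperparameters as well as random inits, then average predictions), not the paper's code, and it omits the paper's member-selection step.

```python
import itertools
import numpy as np
from sklearn.neural_network import MLPClassifier

def hyper_deep_ensemble(x_train, y_train, x_test, n_seeds=3):
    # Small hyperparameter grid to ensemble over (illustrative values).
    grid = {
        "learning_rate_init": [1e-3, 3e-4],
        "alpha": [1e-4, 1e-3],  # L2 regularization strength
    }
    members = []
    for values in itertools.product(*grid.values()):
        hparams = dict(zip(grid.keys(), values))
        for seed in range(n_seeds):  # random-init diversity within each hparam setting
            model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200,
                                  random_state=seed, **hparams)
            members.append(model.fit(x_train, y_train))
    # Uniform mixture over the members' predictive distributions.
    return np.mean([m.predict_proba(x_test) for m in members], axis=0)
```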
@AlexGDimakis I actually argue one of deep learning's biggest flaws is the lack of modularity. Imagine a system where changing one component affects every other component. This is "end-to-end learning": an engineering nightmare of leaky abstractions that we're somehow OK with in modern ML.
7
9
116
We released the ICLR paper! BatchEnsemble achieves SOTA on efficient lifelong learning across splitCIFAR and splitImageNet, improves accuracy and uncertainty across CIFAR, contextual bandits, and WMT, and includes a diversity analysis. Led by Yeming Wen, with Jimmy Ba.
Check out BatchEnsemble: Efficient Ensembling with Rank 1 Perturbations at the #NeurIPS2019 Bayesian DL workshop. Better accuracies and uncertainty than dropout and competitive with ensembles across a wide range of tasks. 1/-
4
23
123
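As a quick illustration of what "rank-1 perturbations" buys you, here is a NumPy sketch of the shared-weight-plus-rank-1 parameterization (my paraphrase of the construction, not the released code):

```python
import numpy as np

# All ensemble members share one weight matrix W; member i only owns two
# vectors r_i, s_i whose outer product modulates W elementwise.
rng = np.random.default_rng(0)
d_in, d_out, n_members = 8, 4, 3

W = rng.normal(size=(d_in, d_out))                            # shared across members
r = rng.normal(loc=1.0, scale=0.1, size=(n_members, d_in))    # per-member fast weights
s = rng.normal(loc=1.0, scale=0.1, size=(n_members, d_out))

def member_forward(x, i):
    # Equivalent to x @ (W * np.outer(r[i], s[i])), so the per-member overhead
    # is O(d_in + d_out) instead of O(d_in * d_out).
    return ((x * r[i]) @ W) * s[i]

x = rng.normal(size=(5, d_in))                                # a batch of 5 inputs
preds = np.stack([member_forward(x, i) for i in range(n_members)])
print(preds.shape)                                            # (3, 5, 4): members x batch x outputs
```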
Excellent recorded talks from Cognitive Computational Neuroscience 2017. (thanks @skrish_13 for pointing me to it)
2
27
119
The team says hi again.
Woah, huge news again from Chatbot Arena🔥. @GoogleDeepMind's just-released Gemini (Exp 1121) is back stronger (+20 points), tied #1🏅 Overall with the latest GPT-4o-1120 in Arena! Ranking gains since Gemini-Exp-1114: Overall #3 → #1; Overall (StyleCtrl) #5 → #2; Hard
8
4
123
A key idea for merging probabilistic programming with deep learning: Conditional independence from Vikash Mansinghka #NIPS2017 tutorial
2
20
116
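A generic illustration (not from the tutorial) of why conditional independence is the bridge to deep-learning-style training: given global latents, the log-likelihood is a sum over data points, so a minibatch gives an unbiased, vectorized estimate of it.

```python
import numpy as np

rng = np.random.default_rng(0)
N, B = 10_000, 128                    # dataset size, minibatch size
theta = 2.0                           # global latent (held fixed here for simplicity)
x = rng.normal(loc=theta, scale=1.0, size=N)   # x_i ~ Normal(theta, 1), i.i.d. given theta

def log_lik(xs, theta):
    # Sum of per-datapoint terms: valid only because the x_i are
    # conditionally independent given theta.
    return np.sum(-0.5 * (xs - theta) ** 2 - 0.5 * np.log(2 * np.pi))

idx = rng.choice(N, size=B, replace=False)
full = log_lik(x, theta)
minibatch_estimate = (N / B) * log_lik(x[idx], theta)   # unbiased estimate of `full`
print(full, minibatch_estimate)
```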
Check out the Image Transformer by @nikiparmar09, @ashVaswani, others, and me. Talk at 3:20p @ Victoria (Deep Learning). Visit our poster at 6:15-9:00p @ Hall B #217!
0
31
119
Interesting work from Uber AI Labs. Probabilistic programming in PyTorch led by Noah Goodman, Eli Bingham, and others.
Like Bayesian Inference and #pytorch? Try our PPL, Pyro.
3
35
118
Interesting paper. "Variational Gaussian Dropout is not Bayesian" by Jiri Hron, @alexggmatthews, Zoubin Ghahramani
4
31
112
Presenting "Why Aren't You Using Probabilistic Programming?" tomorrow. 8:05-8:30 at Hall C #NIPS2017
0
33
116
Highly recommend Emti's "Deep Learning with Bayesian Principles" tutorial. Emti has some unique perspectives on Bayesian analysis from optimization to structured inference.
Excited for the tutorial tomorrow (Dec 9) at 9am at #NeurIPS2019. If you are at the conference and would like to chat, please send me an email (also, if you are interested in a post-doc position in our group in Tokyo).
2
14
115
An excellent intro to normalizing flows by @ericjang11—for density estimation, variational inference, and RL.
I finally learned what a determinant was and wrote a blog post on it. Check out this 2-part tutorial on Normalizing Flows!
0
23
111
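The determinant in the quoted post enters through the change-of-variables formula that normalizing flows optimize (standard identity, stated here for reference):

```latex
% For an invertible, differentiable map z = f(x) from data to a simple base density p_Z:
\log p_X(x) = \log p_Z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|
% Flow architectures (e.g. coupling layers) are chosen so that f^{-1} and the
% log-det-Jacobian are both cheap to compute.
```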
"Formulating [RL] as inference provides a number of other appealing tools: a natural exploration strategy based on entropy maximization, effective tools for inverse reinforcement learning, and the ability to deploy powerful approximate inference algorithms to solve RL problems."
If you want to know how probabilistic inference can be tied to optimal control, I just put up a new tutorial on control as inference: This expands on the control as inference lecture in my class:
0
17
110
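The "natural exploration strategy based on entropy maximization" mentioned in the quote refers to the maximum-entropy RL objective that falls out of the inference view (standard form, not a detail of the linked tutorial):

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_t r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big]
% Treating "optimality" as an observed variable and doing approximate posterior
% inference over trajectories yields this entropy-regularized objective.
```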
#NIPS2016 tutorial slides for Variational Inference are now online! By Dave Blei, @shakir_za, Rajesh Ranganath
1
54
111
Gemini is #1 overall on both text and vision arena, and Gemini is #1 on a staggering total of 20 out of 22 leaderboard categories. It's been a journey attaining such a powerful post-trained model. Proud to have co-led the team!
Exciting News from Chatbot Arena!. @GoogleDeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes. For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive
10
8
112