Alex Alemi
@alemi
Followers: 1K · Following: 90 · Media: 4 · Statuses: 65
Machine Learning Researcher
Kissimmee, FL
Joined January 2008
I am recruiting Ph.D. students for my new lab at @nyuniversity! Please apply if you want to work with me on reasoning, reinforcement learning, understanding generalization, and AI for science. Details on my website: https://t.co/d8uId2LC47. Please spread the word!
17
104
748
Recently I've been playing around with a quarter-order-of-magnitude system for simple calculations. It gives better precision than single-sig-fig calculations using only four very intuitive symbols. https://t.co/BO9mLi8pLF
0
0
8
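A minimal sketch of the idea, assuming it works by rounding to the nearest quarter power of ten (the actual symbols and rounding rules in the linked post may differ; the function names here are mine):

```python
import math

# Hedged sketch of quarter-order-of-magnitude arithmetic: every number is
# rounded to the nearest power of 10**(1/4), so multiplication and division
# reduce to adding and subtracting integer quarter-exponents.

def to_qom(x: float) -> int:
    """Round x to the nearest quarter-order-of-magnitude exponent."""
    return round(4 * math.log10(x))

def from_qom(q: int) -> float:
    """Convert a quarter-exponent back to an approximate value."""
    return 10 ** (q / 4)

# Example: estimate 37 * 820.
q = to_qom(37) + to_qom(820)   # multiplication = exponent addition
print(from_qom(q))             # ~3.2e4, vs. the exact 30340
```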
If you miss the NYTimes needle, especially one that is statistically uniform ( https://t.co/uqLw9f69Sw), you can use this page: https://t.co/xQ5cFrtRSD, which I whipped together to reason about the correlations between the swing states tonight as results come in.
0
1
18
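I don't know what model the linked page actually uses, but a toy sketch of this kind of correlated reasoning looks like the following (all states, margins, and noise scales below are made-up placeholders): share a national swing across states, then condition on a state being called.

```python
import numpy as np

# Hedged toy model of correlated swing-state outcomes (not the page's model).
rng = np.random.default_rng(0)
states = ["PA", "MI", "WI", "AZ", "GA", "NV", "NC"]
prior_margin = np.array([0.0, 1.0, 1.0, -1.0, -1.0, 0.5, -1.5])  # fake margins, points

n = 100_000
national = rng.normal(0.0, 2.0, size=n)                    # shared national swing
state_noise = rng.normal(0.0, 2.0, size=(n, len(states)))  # independent per-state noise
margins = prior_margin + national[:, None] + state_noise

wins = margins > 0
print("unconditional P(win):", dict(zip(states, wins.mean(0).round(2))))

# Conditioning on one state being called shifts the others, because of the shared swing.
pa = wins[:, states.index("PA")]
print("P(MI | PA):", wins[pa, states.index("MI")].mean().round(2))
```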
Is Kevin onto something? We found that LLMs can struggle to understand compressed text unless you do some specific tricks. Check out https://t.co/DRO2IbTFCg and help @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd, @noahconst, and me make Kevin's dream a reality.
0
6
15
Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. https://t.co/mceqpUfZQo With @blester125, @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd
arxiv.org
In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors...
2
10
76
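To make the basic setup concrete (the paper uses neural compressors and specific windowing tricks; zlib and the helper below are just illustrative stand-ins): compress the text, then chop the bitstream into fixed-width chunks that serve as "tokens."

```python
import zlib

# Hedged illustration: compressed sequences are much shorter than character
# sequences, but each "token" depends on the entire preceding bitstream,
# which is part of what makes this hard for LLMs to learn.

def bitstream_tokens(text: str, bits_per_token: int = 8) -> list[int]:
    compressed = zlib.compress(text.encode("utf-8"))
    bits = "".join(f"{byte:08b}" for byte in compressed)
    return [int(bits[i:i + bits_per_token], 2)
            for i in range(0, len(bits) - bits_per_token + 1, bits_per_token)]

text = "the quick brown fox jumps over the lazy dog " * 20
tokens = bitstream_tokens(text)
print(len(text), "characters ->", len(tokens), "compressed tokens")
```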
PaLM, Google's 540-billion-parameter large language model, used 4.2 moles of flops to train. 4.2 moles!
0
0
9
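For the curious, the number is consistent with the usual 6·N·D back-of-the-envelope for training compute (an approximation, not an official figure):

```python
# Sanity check on "4.2 moles of flops": N ≈ 540B parameters, D ≈ 780B tokens.
AVOGADRO = 6.022e23
params = 540e9
tokens = 780e9
flops = 6 * params * tokens
print(f"{flops:.2e} FLOPs -> {flops / AVOGADRO:.1f} moles")
# ~2.5e24 FLOPs, i.e. about 4.2 moles.
```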
Happy to announce DreamFusion, our new method for Text-to-3D! https://t.co/4xI2VHcoQW We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed! Joint work w/ the incredible team of @BenMildenhall @ajayj_ @jon_barron
#dreamfusion
128
1K
6K
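Schematically (hedging on the exact weighting and conditioning details, which live in the paper), the trick is to backpropagate the frozen diffusion model's denoising error through the renderer. With $x = g(\theta)$ a rendered view of the NeRF with parameters $\theta$, prompt $y$, timestep $t$, and noise $\epsilon$:

$$
\nabla_\theta \mathcal{L} \;\approx\; \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat\epsilon_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
$$

so only the renderer is differentiated; the text-to-image model stays fixed and no 3D data is needed.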
@dpkingma @poolio To accompany the colab, I've also written a blog post https://t.co/qvmp8pg1g6 attempting to make sense of the VDM diffusion loss. In it, I try to motivate how the VDM diffusion loss is simply the joint KL between the forward and reverse processes.
2
11
51
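Stated schematically (the blog post is the careful version; the notation here is mine), with $q$ the data distribution pushed through the fixed forward noising process and $p$ the learned reverse-time generative process over the same trajectory $x_{0:1}$:

$$
\mathcal{L}_{\mathrm{VDM}} \;=\; D_{\mathrm{KL}}\!\big(q(x_{0:1})\,\big\|\,p(x_{0:1})\big) \;\ge\; D_{\mathrm{KL}}\!\big(q(x_0)\,\big\|\,p(x_0)\big),
$$

where the inequality is just the chain rule of KL divergence, which is why minimizing the joint KL serves as a surrogate for maximum likelihood on $x_0$.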
Want to understand and/or play with variational diffusion models?
- See https://t.co/V1jP11fMmI for a simple stand-alone implementation and explanation. (Thanks @alemi and @poolio for making this!)
- See https://t.co/kwlCncttBk for an even more basic implementation on 2D data.
1
63
327
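For a flavor of what the "2D data" version boils down to, here is a hedged, self-contained sketch of the forward noising and the epsilon-prediction loss (the schedule and the stand-in model below are illustrative, not the colab's code):

```python
import numpy as np

# Toy forward-noising + epsilon-prediction loss on 2D data.
rng = np.random.default_rng(0)

theta = rng.uniform(0, 2 * np.pi, size=512)
x0 = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # data: points on a circle

t = rng.uniform(0, 1, size=(512, 1))
alpha, sigma = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)  # illustrative schedule
eps = rng.normal(size=x0.shape)
xt = alpha * x0 + sigma * eps                            # sample from q(x_t | x_0)

def eps_hat(xt, t):
    # Placeholder for a small learned network; a real model predicts eps from (xt, t).
    return np.zeros_like(xt)

loss = np.mean(np.sum((eps_hat(xt, t) - eps) ** 2, axis=1))
print("epsilon-prediction loss:", loss)
```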
A pretty cool (and I hope also useful) paper on using pre-trained models to create highly informative priors for downstream tasks. Thanks to all the collaborators, it was a lot of fun!
Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors. https://t.co/cglYGiLNeM w/@ziv_ravid, @micahgoldblum, @HosseinSouri8, @snymkpr, @Eiri1114, @ylecun 1/6
2
12
79
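A hedged sketch of the idea as I read it (the toy data, prior mean, and diagonal covariance below are made up; the paper's procedure for building the prior is more involved): instead of decaying weights toward zero, regularize downstream training toward a Gaussian prior distilled from the pre-trained model.

```python
import numpy as np

# Toy logistic regression trained to a MAP estimate under an informative Gaussian prior.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=200) > 0).astype(float)

prior_mean = rng.normal(size=5)   # stand-in for weights distilled from pre-training
prior_var = np.full(5, 0.1)       # stand-in for a learned diagonal covariance

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))               # logistic predictions
    g_nll = X.T @ (p - y) / len(y)                   # gradient of the task NLL
    g_prior = (w - prior_mean) / prior_var / len(y)  # pull toward the informative prior
    return g_nll + g_prior

w = prior_mean.copy()
for _ in range(500):
    w -= 0.1 * grad(w)                               # MAP estimate under the prior
print("MAP weights:", w.round(2))
print("distance from prior mean:", np.linalg.norm(w - prior_mean).round(2))
```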
1/ Super excited to introduce #Minerva 🦉( https://t.co/UI7zV0IXlS). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data, and other ingredients dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. https://t.co/bQJOyMSCD4
29
519
3K
We are thrilled to announce Imagen, a text-to-image model with unprecedented photorealism and deep language understanding. Explore https://t.co/mSplg4FlsM and Imagen! [Generated image; prompt: "A large rusted ship stuck in a frozen lake. Snowy mountains and beautiful sunset in the background."] #imagen
57
297
2K
here are the next few days' wordle answers as md5 hashes:
2022-01-11 = 0b18a3d7b9c43ff1750d2baa4606b8d0
2022-01-12 = 047fb90408a79f189d51cbcea168b1a5
2022-01-13 = ab3358313efb03210a1babfb372246f1
2022-01-14 = d821e448212defd91ac1e67f9653a34d
3
0
2
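The point of posting hashes is that anyone can verify a guess after the fact without the tweet spoiling the answer. A quick check looks like this, assuming the plain lowercase word was hashed (the guess below is just a placeholder, not a claim about the actual answer):

```python
import hashlib

# Published commitment for one date, copied from the tweet above.
published = {"2022-01-11": "0b18a3d7b9c43ff1750d2baa4606b8d0"}

def matches(word: str, date: str) -> bool:
    return hashlib.md5(word.encode()).hexdigest() == published[date]

print(matches("crane", "2022-01-11"))  # "crane" is a placeholder guess
```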
We are presenting our paper "Does Knowledge Distillation Really Work?" at #NeurIPS2021 poster session 2 today - come check it out! Joint work with @Pavel_Izmailov, @polkirichenko, @alemi, and @andrewgwils. Poster: https://t.co/N4PlsxnpZE Paper: https://t.co/UNSIizi2GG
2
13
78
Excited to kick-start the #SciML focus series on #ML meets info theory and statistical mechanics! Amazing speaker/session chair line-up: @alemi (@wellingmax), @pratikac (Karthik), @ShoYaida (@jaschasd), @yasamanbb (@SuryaGanguli) and Elena Agliari. Details at:
4
33
196
While most papers on knowledge distillation focus on student accuracy, we investigate the agreement between teacher and student networks. Turns out, it is very challenging to match the teacher (even on train data!), despite the student having enough capacity and lots of data.
Does knowledge distillation really work? While distillation can improve student generalization, we show it is extremely difficult to achieve good agreement between student and teacher. https://t.co/VpK6Xy2q3S With @samscub, @Pavel_Izmailov, @polkirichenko, Alex Alemi. 1/10
3
15
114
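For concreteness, the metric in question is top-1 agreement between teacher and student, which can sit well below either model's accuracy. A toy illustration (all predictions below are synthetic; the paper's averaging details may differ):

```python
import numpy as np

# Two models with similar accuracy can still disagree on many inputs.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)

def noisy_preds(correct_rate):
    preds = labels.copy()
    flip = rng.random(1000) > correct_rate
    preds[flip] = rng.integers(0, 10, size=flip.sum())  # random guesses on flipped examples
    return preds

teacher = noisy_preds(0.80)
student = noisy_preds(0.80)

print("teacher acc:", (teacher == labels).mean())
print("student acc:", (student == labels).mean())
print("agreement:  ", (student == teacher).mean())  # noticeably below either accuracy
```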