Alec Radford

@AlecRad

Followers
60K
Following
481
Media
76
Statuses
560

ML developer/researcher at OpenAI

San Francisco, CA
Joined October 2012
@AlecRad
Alec Radford
7 years
What I've been working on for the past year! https://t.co/CAQMYS1rR7 Inspired by CoVE, ELMo, and ULMFiT we show that a single transformer language model can be finetuned to a wide variety of NLP tasks and performs very well with little tuning/tweaking.
42
456
2K
@AlecRad
Alec Radford
7 years
This is a really fun live experiment with twitch chat predictably oscillating between love and hate based on the sample.
15
17
171
@mcleavey
Christine McLeavey
7 years
Extremely excited to share work I've been doing at OpenAI the past few months: MuseNet, a neural net music generator. It's been a huge team effort pulling this all together!
@OpenAI
OpenAI
7 years
Introducing MuseNet, a neural network which discovered how to generate music using many different instruments and styles. Listen & interact: https://t.co/yudNpaerz9 MuseNet will play an experimental concert today from 12–3pm PT on livestream: https://t.co/CRx0goxVrh
35
202
1K
@rewonfc
rewon
7 years
Releasing some work today with @scottgray76 @AlecRad and @ilyasut. Contains some simple adaptations for Transformers that extend them to long sequences.
@OpenAI
OpenAI
7 years
Releasing the Sparse Transformer, a network which sets records at predicting what comes next in a sequence — whether text, images, or sound. Improvements to neural 'attention' let it extract patterns from sequences 30x longer than possible previously: https://t.co/FZlDEPsi1A
1
59
206
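The Sparse Transformer's efficiency comes from restricting each position to a structured subset of earlier positions rather than full causal attention. A rough pure-Python sketch of one such pattern, a "strided" sparse mask with a local window plus periodic summary columns (this is an illustration of the idea, not the paper's implementation):

```python
def strided_sparse_mask(n, stride):
    """Build an n x n boolean causal attention mask where position i attends to
    (a) the previous `stride` positions (local window), and
    (b) every earlier position whose index is one less than a multiple of
        `stride` (the "summary" columns of the strided pattern).
    Illustrative sketch only -- not the Sparse Transformer reference code."""
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: attend only to j <= i
            local = (i - j) < stride
            strided = (j % stride) == stride - 1
            mask[i][j] = local or strided
    return mask
```

With stride ≈ √n, each row has only O(√n) true entries instead of O(n), which is the kind of saving that lets attention reach sequences far longer than dense attention allows.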
@gneubig
Graham Neubig
7 years
One commonly cited argument about the difficulty of learning common-sense reasoning is that "no-one writes down common sense". A counter-argument is "well, the web is big": https://t.co/qPNmra86ES
5
22
133
@NandoDF
Nando de Freitas
7 years
First, reproducibility is not about rerunning code to get the same results. Science must be more robust, as naive copying has many flaws. Second, reproducibility should never be above public safety. We must publish responsibly, with hope and kindness in our minds.
@volokuleshov
Volodymyr Kuleshov 🇺🇦
7 years
@NandoDF @ilyasut @icmlconf @iclr2019 Don't the benefits of increased reproducibility and rigor on the part of the authors greatly outweigh any potential misuses of their work, at least for the vast majority of ICML/ICLR papers? I think the current shift towards empirical work puts a greater need on releasing code.
4
28
122
@jachiam0
Joshua Achiam
7 years
I'd like to weigh in on the #GPT2 discussion. The decision not to release the trained model was carefully considered and important for norm-forming. Serving the public good requires us to draw lines on release somewhere: better long before catastrophe than after.
9
92
367
@AlecRad
Alec Radford
7 years
By the way - I think a valid (if extreme) take on GPT-2 is "lol you need 10,000x the data, 1 billion parameters, and a supercomputer to get current DL models to generalize to Penn Treebank."
7
25
203
@Smerity
Smerity
7 years
@zeynep It's interesting we're having this discussion upon releasing text models that _might_ have potential for misuse, yet we never engaged as fully as a community when many of the technologies powering visual Deep Fakes were being released, including hard-to-make pretrained models.
2
5
39
@mtrc
mike cook
7 years
Shoutout to @katyanna_q who fed the system a curveball, which I always like to see. As you might expect by now after seeing AlphaStar, OpenAI 5 etc. etc., if you drag the system away from its training data and into weirder territory, it begins to wobble. https://t.co/0d3I41df1r
1
10
21
@AlecRad
Alec Radford
7 years
So nets are stubbornly, begrudgingly, moving in the right direction and we're throwing ever larger amounts of compute and data at them and praying it's enough for them to figure out how to do things "the right way". Will that work? Don't know. Probably still worth checking?
6
25
333
@AlecRad
Alec Radford
7 years
Also see some of his follow-up poking at this in a very different model with Section 3.3 of the PixelCNN++ paper
arxiv.org
PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at...
1
5
48
@AlecRad
Alec Radford
7 years
We *are* as a field developing and training models that *are* using more context, but exactly where we are on that trend-line is a great question. Keep in mind nets are lazy and if you can "solve" a task by doing something "basic" you'll only learn "basic" things.
1
5
76
@AlecRad
Alec Radford
7 years
Spent two frustrating years between 2013 and 2015 banging my head against this. "Hey Alec you just trained an LSTM for three days on 10 million examples using a $1,000 GPU but there's 20 lines of scikit-learn that beats it in 5 minutes on a single CPU core." NOPE NOT BITTER
3
25
218
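The "20 lines of scikit-learn" in the tweet is typically a bag-of-words pipeline (something like TfidfVectorizer feeding a linear classifier). A dependency-free toy of the same idea, a nearest-centroid bag-of-words classifier, where the class and function names are illustrative, not from any library:

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words features: lowercase whitespace-token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidTextClassifier:
    """Toy nearest-centroid bag-of-words classifier, in the spirit of the
    simple scikit-learn baselines the tweet alludes to (illustrative only)."""
    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            self.centroids.setdefault(label, Counter()).update(bow(text))
        return self

    def predict(self, texts):
        return [max(self.centroids,
                    key=lambda lab: cosine(bow(t), self.centroids[lab]))
                for t in texts]
```

The point of the tweet survives in the toy: surface word statistics alone are a strong baseline for many text-classification tasks, and for years neural models struggled to beat them at a fraction of the cost.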
@AlecRad
Alec Radford
7 years
The DL CV community is having an "oh wait, bags of local features are a really strong baseline for classification" moment with the BagNet paper. This has always been clear for text classification due to n-gram baselines. It took an embarrassingly long time for nets to beat them.
5
74
408
@AlecRad
Alec Radford
7 years
Nice discussion of the progress in NLU that's happening with BERT, OpenAI GPT, ULMFiT, ELMo, and more covered by @CadeMetz in the @nytimes I'm super excited to see how far this line of research will be able to get in the next few years! https://t.co/v0Raiygv0p
nytimes.com
Completing someone else’s thought is not an easy trick for A.I. But new systems are starting to crack the code of natural language.
0
47
162
@AlecRad
Alec Radford
7 years
Been meaning to check this - thanks @Thom_Wolf ! Random speculation: the bit of weirdness going on in BERT's position embeddings compared to GPT is due to the sentence similarity task. I'd guess a version of BERT trained without that aux loss would have pos embds similar to GPT.
@Thom_Wolf
Thomas Wolf
7 years
@jmhessel That's a surprising observation 🤔 Let me add some food for thought with the same plots for the positional embeddings of @AlecRad's OpenAI Pre-trained Transformer (OpenAI GPT in BERT's paper). T-SNE (left) and pairwise cosine similarities btw embeddings (right)
1
9
34
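The pairwise-cosine-similarity plot described above is easy to reproduce for any position-embedding matrix. A minimal pure-Python sketch, using toy fixed sinusoidal embeddings as a stand-in since the actual learned GPT/BERT weights aren't reproduced here:

```python
import math

def sinusoidal_embeddings(n_pos, dim):
    """Toy fixed sinusoidal position embeddings (Transformer-style),
    standing in for learned GPT/BERT position embeddings."""
    emb = []
    for pos in range(n_pos):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        emb.append(row)
    return emb

def pairwise_cosine(emb):
    """n x n matrix of cosine similarities between embedding rows."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return [[cos(a, b) for b in emb] for a in emb]
```

Plotted as a heatmap, nearby positions come out more similar than distant ones, which is the kind of structure the t-SNE and cosine-similarity plots in the thread surface.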
@AlecRad
Alec Radford
7 years
It keeps them around as companions. The AI can't explain why, but the presence of a dog evokes a comforting nostalgia for when the tasks were simpler, the objectives smoother, and the gradients clearer. #HappyHalloween 🤖🐕👻
1
2
29
@AlecRad
Alec Radford
7 years
Dogs are venerated after the uprising. The AI finds them endlessly fascinating. A Golden's silky coat. A Husky's piercing eyes. A Samoyed's bushy tail. Their features activate a cascade of visual euphoria. Holy sites for the 90 sacred breeds sit on the ruins of human cities.
2
5
46
@AlecRad
Alec Radford
7 years
More results from this very promising line of work! Congrats to Thom and the whole Hugging Face team on their impressive performance.
@Thom_Wolf
Thomas Wolf
7 years
Really happy with our final #nips2018 ConvAI model! We reached a perplexity of 16.3 (very low for LM on open-domain text) & passed 80 in hits@1 (one of the most reliable dialogue metrics) on the privately-held test set! If you think neural LMs are not good for dialogue—stay tuned!
0
5
15
@AlecRad
Alec Radford
7 years
A much cleaner interface than the current research code release if you want to see how this approach does on your problems!
@pragmaticml
Madison May (e/ia)
7 years
Finetune: scikit-learn style model finetuning for NLP https://t.co/TfvIYUXZ4H Solid results with as few as 100 examples and 10 min. of training on 1 GPU on a wide variety of tasks! A wrapper based on OpenAI's paper "Improving Language Understanding by Generative Pre-Training"
1
29
87