Angelos Katharopoulos @angeloskath profile

Angelos Katharopoulos

@angeloskath

Followers

2,045

Following

238

Media

12

Statuses

163

Machine Learning Research @Apple. Previously PhD student at @idiap_ch and @EPFL. Interested in all things machine learnable

https://t.co/WxIewrYHDs

Mountain View, CA

Joined June 2017


Pinned Tweet

Angelos Katharopoulos

@angeloskath

6 months

I am really excited about our latest work! A simple efficient framework to experiment with modern neural networks even on your laptop! 12 lines to write a transformer LM 🥳

Awni Hannun

@awnihannun

6 months

Just in time for the holidays, we are releasing some new software today from Apple machine learning research. MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!) Code: Docs:

100

710

4K

2

5

105

Angelos Katharopoulos

@angeloskath

5 months

We implemented quantization from scratch in a week. I think that is one of the biggest strengths of MLX. Easy to use but also easy to extend and customize. We can’t wait to see what people will implement in a month!

Awni Hannun

@awnihannun

5 months

Big update to MLX but especially 🥁 N-bit quantization and quantized matmul kernels! Thanks to the wizardry of @angeloskath pip install -U mlx

3

26

249

4

16

156

Angelos Katharopoulos

@angeloskath

3 months

What I find even cooler than training on an iPhone is that it is done with just 60 lines of code that are super readable and very familiar to anyone that writes training loops in python. Let's go MLX Swift! 🚀🚀🚀

Awni Hannun

@awnihannun

3 months

Using MLX Swift to train LeNet on MNIST. Takes less than a minute on my iPhone 14. Example here: @ylecun long-live MNIST!

11

25

295

2

23

151

Angelos Katharopoulos

@angeloskath

1 month

I have to say it because @awnihannun is quick to give credit to others but doesn’t take much for himself. This performance improvement largely comes from his relentless hunting down of every kind of overhead in MLX the past weeks. Kudos!!!

Awni Hannun

@awnihannun

1 month

MLX 0.10 → 0.11, faster generation across model sizes and machines. tokens-per-second for 4-bit models:

7

18

225

6

11

112

Angelos Katharopoulos

@angeloskath

3 months

Looking back at all the amazing things people built with MLX in a couple of months I am incredibly excited to see the things that will be built now in a familiar dev environment in Swift! Just 20 lines of code to write a general multi-head attention in MLX Swift 🚀🚀🚀

Awni Hannun

@awnihannun

3 months

As part of our goal to make MLX a great research tool, we're expanding support to new languages like Swift and C, making experimentation on Apple silicon easier for ML researchers. Video generating text with Mistral 7B and MLX Swift 👇 MLX is an array framework for machine

21

64

441

2

6

62

Angelos Katharopoulos

@angeloskath

4 years

Code is also available! If you want to experiment with clustered attention all you need to do is pip install pytorch-fast-transformers and then use attention_type="improved-clustered". Enjoy!

François Fleuret

@francoisfleuret

4 years

One paper accepted at @NeurIPSConf with @apoorv2904 and @angeloskath on speeding up attention by clustering the queries. The nice thing is that this can be used for inference with standard pre-trained models. @Idiap_ch @unige_en @EPFL_en @snsf_ch

2

18

110

1

18

57

Angelos Katharopoulos

@angeloskath

4 years

I assembled the @NeurIPSConf 2020 accepted papers in a list that is easy to filter by author name, affiliation and paper title. Which company do you think has the most first author papers?

1

6

47

Angelos Katharopoulos

@angeloskath

5 months

How about your personal chat GPT on your M2 Ultra? Amazing model by Mistral AI and ~1 day to implement it in MLX.

Awni Hannun

@awnihannun

5 months

Mixtral 8x7B in MLX Runs on an M2 Ultra 🚢🚢

26

60

605

0

3

46

Angelos Katharopoulos

@angeloskath

4 years

@GoogleAI @Pablogomez3 For the "few" of us that don't use JAX yet, you can now experiment with FAVOR+ (and other Fourier features) in @PyTorch using our fast-transformers library with just 2 lines of code. Code: Docs:

1

5

45

Angelos Katharopoulos

@angeloskath

2 months

For the native Greek speakers, you can already interact with Meltemi on your laptop directly from HF using MLX. I also uploaded a quantized 4-bit version on mlx-community for faster inference. Almost 20 tokens per second on a MacBook Air and 90 on an M2 Ultra!

Ion Androutsopoulos

@ionandrou

2 months

5

18

65

2

9

41

Angelos Katharopoulos

@angeloskath

1 year

I feel very lucky to have been at Idiap, it is a great place to pursue a PhD. I would also like to thank @francoisfleuret. I couldn't have asked for a better PhD advisor!

Idiap Research Institute

@Idiap_ch

1 year

Idiaper wins @EPFL's EEDE Thesis Award! 🏆 Former #PhD from our institute, @angeloskath has received EPFL's Electrical Engineering Doctoral program (#EEDE) Thesis Award for his outstanding research on the efficiency of #DeepLearning models. ▶️

5

6

28

1

0

27

Angelos Katharopoulos

@angeloskath

4 years

Thank you Yannic for the amazing video. The topic modeling intuition is a very interesting way to think about it and I hadn't thought of the kernels this way. Anybody that doesn't follow Yannic is seriously missing out!!! Check out his channel

Yannic Kilcher

I make videos about machine learning research papers, programming, and issues of the AI community, and the broader impact of AI in society. Twitter: https://twitter.com/ykilcher Discord: https://yk...

www.youtube.com

Yannic Kilcher 🇸🇨

@ykilcher

4 years

New Video 🔥 No more O(N^2) complexity in Transformers: Kernels to the rescue! 🥳 This paper makes Attention linear AND shows an intriguing connection between Transformers and RNNs 💪 @angeloskath @apoorv2904 @nik0spapp @francoisfleuret @EPFL_en @Idiap_ch

4

74

354

1

3

22

Angelos Katharopoulos

@angeloskath

3 years

What a game! #PameStefane #RolandGarros #Tsitsipas

0

20

Angelos Katharopoulos

@angeloskath

2 months

This is too cool. Now let's combine it with a TTS model and have it tell us nice stories while looking at the beautiful lake...

ifioravanti

@ivanfioravanti

2 months

Apple MLX on Vision Pro? YES YOU CAN! BOOM!!! Here the raw video of MLX Swift LLMEval example running natively on the device! Thanks @awnihannun 🙏 🔥🔥🔥 #VisionPro #LLM #Apple

13

24

191

3

1

21

Angelos Katharopoulos

@angeloskath

5 years

Because you haven't really released code until you release the documentation... I just finished the first version of docs for our ICML2019 paper! You can find it at . Oh, also you can just pip install attention-sampling .

François Fleuret

@francoisfleuret

5 years

And here it is on @arxiv TL;DR: A network computes an attention map on a downscaled image, and another processes locations sampled according to that map. The pair can be trained end-to-end.

0

20

81

0

5

17

Angelos Katharopoulos

@angeloskath

5 years

I assembled the @icmlconf 2019 accepted papers in a list that is easy to filter based, for instance, on affiliations or title. First authors from: Google 56 Microsoft 10 Facebook 7 Amazon 3 Apple 1 @EPFL_en 12 @ETH 16 #ICML2019

0

3

15

Angelos Katharopoulos

@angeloskath

3 years

Did you know that clustered attention approximates a pretrained wav2vec on librispeech two times better than Performer's FAVOR? Come talk to us at our #NeurIPS2020 poster in 2 hours to find out more!

Apoorv Vyas

@apoorv2904

3 years

With @angeloskath and @francoisfleuret we will present our work on fast transformers with clustering at #NeurIPS2020 on Thu @ 18:00 CET. Please visit our poster to know more. We will also answer questions on chat. Poster: Project:

0

6

13

0

1

14

Angelos Katharopoulos

@angeloskath

6 months

@unixpickle @awnihannun Unified memory is the big one. The fast Metal kernels and linking to accelerate or Apple specific SIMD instructions would be another one. We are very excited to explore what new architecture the above will enable or the impact to the existing ones!

1

0

14

Angelos Katharopoulos

@angeloskath

2 months

I know which model I am uploading to MLX community today 🚀

Ion Androutsopoulos

@ionandrou

2 months

5

18

65

1

14

Angelos Katharopoulos

@angeloskath

3 years

ICCV reviewer invitation expires 2/1/2021 ... now does that mean I missed it or that when addressing an international crowd the US date notation is very confusing?

1

0

14

Angelos Katharopoulos

@angeloskath

4 years

I wish CMT had a negative tweet limit. Basically, if your review fits in a tweet, you shouldn't be able to submit it. #cvpr2020 #cvpr

0

10

Angelos Katharopoulos

@angeloskath

2 months

To reproduce the video above, first

pip install -U mlx_lm

and then

python -m mlx_lm.generate \
    --model mlx-community/ilsp-Meltemi-7B-Instruct-v1-4bit \
    --prompt "Πες μου την ιστορία της Ελλάδας σε μία παράγραφο." \
    --temp 0.0 --max-tokens 2048

on any M-series Mac.

1

10

Angelos Katharopoulos

@angeloskath

5 years

What started in May is finalized in Greece's national elections yesterday. The far-right, neo-fascist party did not make it in the greek parliament! Hopefully, the rest of Europe will follow. #ekloges19 #greekelections2019 #Europe

Angelos Katharopoulos

@angeloskath

5 years

The definition of mixed feelings: When the far-right party of your country loses half their votes in 4 years and at the same time they will have 2 representatives in the european parliament because 4.9% is still too much. #EuropeanElectionResults #EUelections2019

0

1

0

9

Angelos Katharopoulos

@angeloskath

4 years

@francoisfleuret But we established that it cannot do this though...

1

0

9

Angelos Katharopoulos

@angeloskath

6 years

Awesome work by a friend in @Oxford_VGG ! Watch people fighting on TV (we all like that right?) without missing a single thing anybody says... Related publications:

University of Oxford

@UniofOxford

6 years

Can #AI modelling help people with hearing difficulties? Discover how #OxfordAI could assist those with hearing difficulties by isolating voices in noisy environments:

4

47

85

0

1

9

Angelos Katharopoulos

@angeloskath

3 years

When we finished developing "Transformers are RNNs", we had planned to showcase it using music generation. We ended up not investing the necessary time, but today I came across "Compound Word Transformer" and I love the generated music. Check it out!

Compound Word Transformer: Generate Pop Piano Music of Full-Song Length Taiwan AILabs

ailabs.tw

0

2

8

Angelos Katharopoulos

@angeloskath

4 years

Switzerland is not closing schools for #COVID19 because it would endanger grandparents who would take care of the children. Greece on the other hand pays for the vacation days of one of the two parents and closes all schools for 14 days. Switzerland man-up!

0

7

Angelos Katharopoulos

@angeloskath

5 years

Arxiv and code coming soon...

François Fleuret

@francoisfleuret

5 years

One paper accepted at #ICML2019 with @angeloskath on attention-sampling with deep architectures to process megapixel images.

1

0

27

0

1

7

Angelos Katharopoulos

@angeloskath

4 years

@CVPRConf website is down but @paschalidoud_1 is also analog! #CVPR2020

0

7

Angelos Katharopoulos

@angeloskath

4 years

@lucidrains @apoorv2904 @SmallerNNsPls @francoisfleuret @trees_random @icmlconf @nik0spapp @Idiap_ch @EPFL Thanks for the interest! Indeed. However, the main benefit of our work is the derivation of a formulation that allows to write an autoregressive transformer as an RNN; thus resulting in orders of magnitude speed up during inference. (we really need to speed up the preprint :-))

1

0

5

Angelos Katharopoulos

@angeloskath

4 years

@SmallerNNsPls @francoisfleuret @trees_random @icmlconf @apoorv2904 @nik0spapp @Idiap_ch @EPFL Yes, they are normalized as follows Ψ(Q) Ψ(K)' V / (sum_i Ψ(Q) Ψ(K)_i). You have to assume some broadcasting semantics in the above equation due to twitter.

3

0

5
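The normalization in the reply above, Ψ(Q) Ψ(K)' V / (sum_i Ψ(Q) Ψ(K)_i), can be sketched in NumPy with the broadcasting spelled out. This is a minimal illustration, not the fast-transformers implementation: the function names are hypothetical, and the choice of elu(x) + 1 for Ψ is an assumption made so that the denominator stays positive.

```python
import numpy as np

def feature_map(x):
    # Assumed choice for Ψ: elu(x) + 1, which is strictly positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # Q: (N, D) queries, K: (L, D) keys, V: (L, M) values.
    phi_q = feature_map(Q)
    phi_k = feature_map(K)
    kv = phi_k.T @ V            # (D, M): Ψ(K)' V, computed once for all queries
    z = phi_k.sum(axis=0)       # (D,):  sum_i Ψ(K)_i
    numerator = phi_q @ kv      # (N, M)
    denominator = phi_q @ z     # (N,)
    # Explicit broadcast of the per-query normalizer over the value dimension.
    return numerator / denominator[:, None]
```

Because Ψ(K)' V is computed before multiplying by Ψ(Q), the cost grows linearly with the sequence length instead of quadratically.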

Angelos Katharopoulos

@angeloskath

2 months

Congrats to all the researchers from ILSP and Athena research center that worked on this, I couldn't find twitter handles to tag people so please let me know if I should be tagging someone.

2

5

Angelos Katharopoulos

@angeloskath

5 months

@demirbasayyuce @awnihannun Well actually I don’t think you need any of that due to unified memory. Quantizing the Lora example in mlx should work out of the box. Haven’t tried it yet but I don’t see why not.

0

4

Angelos Katharopoulos

@angeloskath

4 years

Usually I adore @PyTorch software engineering but going from v1.5.0 to v1.6.0 breaks at::detail::getDefaultCPUGenerator() which breaks some C++ extensions. Shouldn't that be in the release notes?

0

4

Angelos Katharopoulos

@angeloskath

5 years

@ykilcher @_florianmai @jiangelaa @zacharylipton @francoisfleuret If you are looking for an intuitive explanation regarding why these methods don't help much on hard datasets (the question raised in the video), they rely on the existence of uninformative datapoints. In Imagenet there are none for most of the training.

1

0

4

Angelos Katharopoulos

@angeloskath

3 months

Oh and the model definition looks even more familiar.

0

4

Angelos Katharopoulos

@angeloskath

4 years

@francoisfleuret Easy. Woodworker or blacksmith or both. Making tools to make tools to make tools is still one of the big joys of life.

0

4

Angelos Katharopoulos

@angeloskath

5 years

So... @github , you implement code search but decide to ignore . , : ; / \ ` ' " = * ! ? # $ & + ^ | ~ < > ( ) { } [ ] ? I am having fun searching for function definitions/implementations without being able to use "func(" or "::func".

1

0

4

Angelos Katharopoulos

@angeloskath

5 months

@WankyuChoi I am super happy you picked it up 😁. I actually added it to the example after seeing your previous demo and comments. Great video as always!

1

0

2

Angelos Katharopoulos

@angeloskath

3 years

@CSProfKGD @CVPR This is great! Could not think of anybody better for this :-)

0

3

Angelos Katharopoulos

@angeloskath

4 months

@KassinosS @awnihannun Out of curiosity how would a simple relu MLP that passes the inputs through a simple sinusoidal positional encoding do in that problem? In my experience they are a pretty good baseline for any such function approximation. See for examples of what I mean.

1

0

1
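As a rough sketch of the baseline suggested in the reply above, the inputs can be passed through a sinusoidal encoding before a plain ReLU MLP. Everything here is illustrative: the function names are hypothetical, the frequencies 2^k · π are an assumed (NeRF-style) choice, and the network is only initialized and evaluated, not trained.

```python
import numpy as np

def positional_encoding(x, num_frequencies=6):
    # x: (N, D) -> (N, 2 * D * num_frequencies)
    # Assumed frequency schedule: pi, 2*pi, 4*pi, ...
    freqs = (2.0 ** np.arange(num_frequencies)) * np.pi
    angles = x[:, :, None] * freqs                      # (N, D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(x.shape[0], -1)

def init_mlp(sizes, rng):
    # He-style initialization for each (fan_in, fan_out) pair.
    return [
        (rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def mlp_forward(params, h):
    # ReLU on every hidden layer, linear output layer.
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b
```

For a scalar input and six frequencies the encoded input has 12 features, so `init_mlp([12, 64, 64, 1], rng)` would give a matching network.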

Angelos Katharopoulos

@angeloskath

5 years

Yay! Our awesome group is growing!

François Fleuret

@francoisfleuret

5 years

I have two open phd positions in my group at @Idiap_ch / @EPFL_en Both in deep learning, one in computer vision to combine multi-sensors for scene reconstruction, and the other for weather forecast and air traffic control.

0

31

75

0

3

Angelos Katharopoulos

@angeloskath

4 years

@francoisfleuret Advice for people looking for a career: learn software engineering

0

2

Angelos Katharopoulos

@angeloskath

6 months

@unixpickle @gazorp5 @awnihannun Moreover, designing a backend would mean we inherit all the negative aspects of these frameworks, whether they are shape based compilation or eager computation or something else.

1

0

2

Angelos Katharopoulos

@angeloskath

3 years

@dimadamen @ducha_aiki Oops, sorry if it was perceived as whining, mostly meant as a joke 😁. Thanks a lot for the reply and taking it into account for the future!

0

2

Angelos Katharopoulos

@angeloskath

4 years

@ducha_aiki @francoisfleuret @apoorv2904 @nik0spapp @jb_cordonnier Well, not instead of self-attention but you could look at that uses a similar mechanism with completely data independent values to replace fully connected layers.

0

2

Angelos Katharopoulos

@angeloskath

4 years

@lucidrains @apoorv2904 @SmallerNNsPls @francoisfleuret @trees_random @icmlconf @nik0spapp @Idiap_ch @EPFL In pseudocode yes. In practice this requires N times more memory than necessary so we opt for a custom CUDA kernel. During inference this is kept as the state so one only needs the last value of the cumsum anyway (so no custom kernels necessary).

0

2
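The trade-off described in the reply above can be illustrated with two NumPy versions of causal linear attention. The first follows the cumsum pseudocode and materializes an (N, D, M) tensor; the second keeps only the running state, which is what inference needs. This is a sketch under assumptions, not the library's CUDA kernel: function names are hypothetical and elu(x) + 1 is an assumed feature map.

```python
import numpy as np

def feature_map(x):
    # Assumed positive feature map: elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_attention_cumsum(Q, K, V):
    # Q, K: (N, D), V: (N, M). Stores every prefix sum: O(N * D * M) memory.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = np.cumsum(phi_k[:, :, None] * V[:, None, :], axis=0)  # (N, D, M)
    z = np.cumsum(phi_k, axis=0)                                # (N, D)
    num = np.einsum("nd,ndm->nm", phi_q, kv)
    den = np.einsum("nd,nd->n", phi_q, z)
    return num / den[:, None]

def causal_attention_recurrent(Q, K, V):
    # Same result, but only the last value of the cumsum is kept as state.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    n, d = phi_k.shape
    m = V.shape[1]
    S = np.zeros((d, m))   # running sum of outer(Ψ(k_i), v_i)
    z = np.zeros(d)        # running sum of Ψ(k_i)
    out = np.empty((n, m))
    for i in range(n):
        S += np.outer(phi_k[i], V[i])
        z += phi_k[i]
        out[i] = (phi_q[i] @ S) / (phi_q[i] @ z)
    return out
```

The recurrent form is the RNN view of the autoregressive transformer: each step updates a fixed-size state and never revisits earlier positions.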

Angelos Katharopoulos

@angeloskath

3 months

@ivanfioravanti @emrekoctw @awnihannun The UNet and text encoders should be fine as they only need about 4GB when quantized. The decoder otoh needs more. The trick there is to apply the decoder in a tiling fashion but I am not 100% sure it will be straightforward.

0

2

Angelos Katharopoulos

@angeloskath

4 years

@andriy_mulyar @_joaogui1 @pragmaticml Besides the custom kernels, I think the jax implementation of linear attention is a bit off. In theory, it should be identical to performers without the feature map so *at least* as fast... In our implementation it is 2-3 times faster than FAVOR with 256 dims.

0

2

Angelos Katharopoulos

@angeloskath

6 months

@unixpickle @gazorp5 @awnihannun It would be quite an architectural change I believe to have unified memory in either of the two. It is not as simple as making a backend since the operations need to synchronize but not copy even though they may run on GPU or CPU.

1

0

2

Angelos Katharopoulos

@angeloskath

5 years

I know I probably shouldn't be using in my code but keras definitely shouldn't be using 'from tensorflow_backend import *' ...

0

1

Angelos Katharopoulos

@angeloskath

3 years

@chriswolfvision @francoisfleuret Well, I think broadcasting is great! The problem is with implicit expand_dims. Who thought that it was a good idea to implicitly resize tensors so that the dims work? Under that reasoning all element wise operations are possible by expanding enough times both tensors...

1

0

1
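The distinction drawn in the reply above can be shown with a small NumPy example. Both shapes below are hypothetical; the point is that a missing leading dimension is added implicitly, so an element-wise operation between mismatched shapes succeeds silently instead of failing.

```python
import numpy as np

pred = np.zeros(3)          # shape (3,)
target = np.ones((3, 1))    # shape (3, 1)

# Implicit expand_dims: (3,) is treated as (1, 3), then both are broadcast.
# The result is a (3, 3) matrix rather than three element-wise differences.
diff = pred - target

# Making the extra dimension explicit keeps the intended (3, 1) shape.
explicit = pred[:, None] - target
```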

Angelos Katharopoulos

@angeloskath

2 years

@Suuraj @francoisfleuret @pafrossard @AlexAlahi @_beenkim @LudovicDenoyer Congratulations to both you guys!!! Well deserved! 🥳🥳🥳

1

0

1

Angelos Katharopoulos

@angeloskath

5 years

Removing a public member from a python module is a backwards incompatible change and should incur a major version change. Looking at you keras.backend ... that you no longer provide tf moving from v2.2.4 to v2.2.5 . @fchollet

1

0

1

Angelos Katharopoulos

@angeloskath

2 months

@walkfourmore You can fine tune it using LoRA on your laptop (see the MLX examples). An 8GB MacBook Air won’t break any speed records but you can easily fine tune it on your data over night if they are about a book long.

2

0

1

Angelos Katharopoulos

@angeloskath

5 years

@dave_andersen @ykilcher @_florianmai @jiangelaa @zacharylipton @francoisfleuret Specifically, if we consider the gradient norm as an indicator on whether a sample is informative we see that for Imagenet the distribution of the norms is much closer to uniform (hence we cannot reduce the variance as depicted).

1

0

1

Angelos Katharopoulos

@angeloskath

5 years

The definition of mixed feelings: When the far-right party of your country loses half their votes in 4 years and at the same time they will have 2 representatives in the european parliament because 4.9% is still too much. #EuropeanElectionResults #EUelections2019

0

1

Angelos Katharopoulos

@angeloskath

5 years

@dave_andersen @ykilcher @_florianmai @jiangelaa @zacharylipton @francoisfleuret It's my bad for posting it without more context. It is the empirical variance of the mini-batch gradient under different sampling distributions. Namely we sample mini-batches compute the grad and compare the norm of the diff with the average gradient.

2

0

1
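The measurement described in the reply above can be sketched on a toy least-squares problem: sample mini-batches under a given distribution, form the reweighted gradient estimate, and average the squared norm of its difference from the full gradient. This is one reading of the description, not the authors' code. The data, the function names, and the choice of sampling proportionally to the per-sample gradient norm are all assumptions made for illustration.

```python
import numpy as np

def per_sample_grads(w, X, y):
    # Gradient of 0.5 * (x.w - y)^2 for every sample: (N, D).
    residual = X @ w - y
    return residual[:, None] * X

def empirical_variance(grads, p, batch_size, trials, rng):
    # Mean squared distance between mini-batch estimates and the full gradient.
    n = grads.shape[0]
    full = grads.mean(axis=0)
    total = 0.0
    for _ in range(trials):
        idx = rng.choice(n, size=batch_size, p=p)
        # Reweight by 1 / (n * p_i) so the estimate stays unbiased.
        est = (grads[idx] / (n * p[idx])[:, None]).mean(axis=0)
        total += np.sum((est - full) ** 2)
    return total / trials

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)
grads = per_sample_grads(np.zeros(10), X, y)

uniform = np.full(1000, 1.0 / 1000)
norms = np.linalg.norm(grads, axis=1)
by_norm = norms / norms.sum()

var_uniform = empirical_variance(grads, uniform, 32, 2000, rng)
var_by_norm = empirical_variance(grads, by_norm, 32, 2000, rng)
```

When the gradient norms are close to uniform, as the thread reports for ImageNet, the two numbers end up close and there is little variance left to remove.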