I am really excited about our latest work!
A simple, efficient framework to experiment with modern neural networks, even on your laptop!
12 lines to write a transformer LM 🥳
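For reference, here is roughly what that looks like in Python. A minimal sketch assuming the mlx.nn module API (Embedding, TransformerEncoder, Linear, and the causal-mask helper), not the exact released example:

import mlx.nn as nn

class TransformerLM(nn.Module):
    def __init__(self, vocab_size, num_layers, dims, num_heads):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dims)
        self.transformer = nn.TransformerEncoder(num_layers, dims, num_heads)
        self.out_proj = nn.Linear(dims, vocab_size)

    def __call__(self, x):
        # additive causal mask so each token only attends to its past
        mask = nn.MultiHeadAttention.create_additive_causal_mask(x.shape[1])
        x = self.transformer(self.embedding(x), mask)
        return self.out_proj(x)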
Just in time for the holidays, we are releasing some new software today from Apple machine learning research.
MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!)
Code:
Docs:
We implemented quantization from scratch in a week. I think that is one of the biggest strengths of MLX: easy to use, but also easy to extend and customize.
We can’t wait to see what people will implement in a month!
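To give a flavor of what that looks like at the array level, a quick sketch (assuming the mx.quantize / mx.quantized_matmul ops in mlx.core; group size and bit width are illustrative):

import mlx.core as mx

x = mx.random.normal((1, 512))
w = mx.random.normal((1024, 512))

# pack the weights into 4-bit groups with per-group scales and biases
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# matmul directly against the packed weights
y = mx.quantized_matmul(x, w_q, scales, biases, transpose=True, group_size=64, bits=4)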
What I find even cooler than training on an iPhone is that it is done with just 60 lines of code that are super readable and very familiar to anyone who writes training loops in Python.
Let's go MLX Swift! 🚀🚀🚀
I have to say it because
@awnihannun
is quick to give credit to others but doesn’t take much for himself.
This performance improvement largely comes from his relentless hunting down of every kind of overhead in MLX over the past few weeks.
Kudos!!!
Looking back at all the amazing things people built with MLX in just a couple of months, I am incredibly excited to see what will be built now with a familiar dev environment in Swift!
Just 20 lines of code to write a general multi-head attention in MLX Swift 🚀🚀🚀
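Since the examples in this thread are mostly Python, here is a rough MLX Python sketch of the same thing; the Swift version reads almost line-for-line the same. Names and shapes are illustrative, not the exact example code:

import mlx.core as mx
import mlx.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, dims, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.q_proj = nn.Linear(dims, dims, bias=False)
        self.k_proj = nn.Linear(dims, dims, bias=False)
        self.v_proj = nn.Linear(dims, dims, bias=False)
        self.out_proj = nn.Linear(dims, dims, bias=False)

    def __call__(self, queries, keys, values, mask=None):
        B, L, D = queries.shape
        d = D // self.num_heads
        # split the projections into (batch, heads, length, head_dim)
        q = self.q_proj(queries).reshape(B, L, self.num_heads, d).transpose(0, 2, 1, 3)
        k = self.k_proj(keys).reshape(B, -1, self.num_heads, d).transpose(0, 2, 1, 3)
        v = self.v_proj(values).reshape(B, -1, self.num_heads, d).transpose(0, 2, 1, 3)
        scores = (q * d**-0.5) @ k.transpose(0, 1, 3, 2)
        if mask is not None:
            scores = scores + mask
        out = mx.softmax(scores, axis=-1) @ v
        return self.out_proj(out.transpose(0, 2, 1, 3).reshape(B, L, D))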
As part of our goal to make MLX a great research tool, we're expanding support to new languages like Swift and C, making experimentation on Apple silicon easier for ML researchers.
Video of generating text with Mistral 7B and MLX Swift 👇
MLX is an array framework for machine learning on Apple silicon.
Code is also available! If you want to experiment with clustered attention, all you need to do is
pip install pytorch-fast-transformers
and then use attention_type="improved-clustered".
Enjoy!
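For a concrete starting point, a minimal sketch using the library's encoder builder (parameter values are illustrative, and the clustered-attention-specific arguments like the number of clusters are best checked against the docs):

from fast_transformers.builders import TransformerEncoderBuilder

transformer = TransformerEncoderBuilder.from_kwargs(
    attention_type="improved-clustered",
    n_layers=4,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
).get()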
I assembled the
@NeurIPSConf
2020 accepted papers in a list that is easy to filter by author name, affiliation and paper title.
Which company do you think has the most first-author papers?
@GoogleAI
@Pablogomez3
For the "few" of us that don't use JAX yet, you can now experiment with FAVOR+ (and other Fourier features) in
@PyTorch
using our fast-transformers library with just 2 lines of code.
Code:
Docs:
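The "2 lines" are choosing linear attention and plugging in the FAVOR feature map. A sketch, assuming the Favor feature map in fast_transformers.feature_maps (sizes are illustrative):

from fast_transformers.builders import TransformerEncoderBuilder
from fast_transformers.feature_maps import Favor

transformer = TransformerEncoderBuilder.from_kwargs(
    attention_type="linear",
    feature_map=Favor.factory(n_dims=256),
    n_layers=4,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
).get()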
For the native Greek speakers, you can already interact with Meltemi on your laptop directly from HF using MLX.
I also uploaded a quantized 4-bit version on mlx-community for faster inference. Almost 20 tokens per second on a MacBook Air and 90 on an M2 Ultra!
I feel very lucky to have been at Idiap; it is a great place to pursue a PhD. I would also like to thank
@francoisfleuret
. I couldn't have asked for a better PhD advisor!
Idiaper wins
@EPFL
's EEDE Thesis Award! 🏆
Former
#PhD
from our institute,
@angeloskath
has received EPFL's Electrical Engineering Doctoral program (
#EEDE
) Thesis Award for his outstanding research on the efficiency of
#DeepLearning
models.
Thank you Yannic for the amazing video.
The topic modeling intuition is a very interesting way to think about it and I hadn't thought of the kernels this way.
Anybody who doesn't follow Yannic is seriously missing out!!! Check out his channel
Apple MLX on Vision Pro? YES YOU CAN!
BOOM!!!
Here is the raw video of the MLX Swift LLMEval example running natively on the device!
Thanks
@awnihannun
🙏
🔥🔥🔥
#VisionPro
#LLM
#Apple
Because you haven't really released code until you release the documentation... I just finished the first version of docs for our ICML2019 paper!
You can find it at .
Oh, also, you can just pip install attention-sampling.
And here it is on
@arxiv
TL;DR: A network computes an attention map on a downscaled image, and another processes locations sampled according to that map. The pair can be trained end-to-end.
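In pseudocode-ish Python, the sampling step looks roughly like the following. This is a conceptual sketch, not the released library's API:

import numpy as np

def sample_patches(attention, image_hr, n_patches=10, patch=64):
    # attention: (h, w) map from the low-res network, normalized to sum to 1
    h, w = attention.shape
    idx = np.random.choice(h * w, size=n_patches, p=attention.ravel())
    ys, xs = np.unravel_index(idx, (h, w))
    sy, sx = image_hr.shape[0] // h, image_hr.shape[1] // w
    # crop the corresponding full-resolution patches
    return [image_hr[y*sy:y*sy+patch, x*sx:x*sx+patch] for y, x in zip(ys, xs)]

Sampling locations from the attention distribution gives an unbiased estimate of attending everywhere, which is what lets gradients flow back to the attention network.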
I assembled the
@icmlconf
2019 accepted papers in a list that is easy to filter, for instance, by affiliation or title.
First authors from:
Google 56
Microsoft 10
Facebook 7
Amazon 3
Apple 1
@EPFL_en
12
@ETH
16
#ICML2019
Did you know that clustered attention approximates a pretrained wav2vec on LibriSpeech two times better than Performer's FAVOR?
Come talk to us at our
#NeurIPS2020
poster in 2 hours to find out more!
With
@angeloskath
and
@francoisfleuret
we will present our work on fast transformers with clustering at
#NeurIPS2020
on Thu @ 18:00 CET. Please visit our poster to learn more. We will also answer questions in the chat.
Poster:
Project:
@unixpickle
@awnihannun
Unified memory is the big one. The fast Metal kernels and linking to Accelerate or Apple-specific SIMD instructions would be another.
We are very excited to explore what new architectures the above will enable, and the impact on existing ones!
ICCV reviewer invitation expires 2/1/2021 ... now does that mean I missed it or that when addressing an international crowd the US date notation is very confusing?
To reproduce the video above, first
pip install -U mlx_lm
and then
python -m mlx_lm.generate \
--model mlx-community/ilsp-Meltemi-7B-Instruct-v1-4bit \
--prompt "Πες μου την ιστορία της Ελλάδας σε μία παράγραφο." \
--temp 0.0 --max-tokens 2048
on any M-series Mac. (The prompt asks for the history of Greece in one paragraph.)
What started in May was finalized in Greece's national elections yesterday. The far-right, neo-fascist party did not make it into the Greek parliament!
Hopefully, the rest of Europe will follow.
#ekloges19
#greekelections2019
#Europe
The definition of mixed feelings: when the far-right party of your country loses half their votes in 4 years, and at the same time they will have 2 representatives in the European Parliament because 4.9% is still too much.
#EuropeanElectionResults
#EUelections2019
Awesome work by a friend in
@Oxford_VGG
! Watch people fighting on TV (we all like that, right?) without missing a single thing anybody says...
Related publications:
Can
#AI
modelling help people with hearing difficulties? Discover how
#OxfordAI
could assist those with hearing difficulties by isolating voices in noisy environments:
When we finished developing "Transformers are RNNs", we had planned to showcase it using music generation.
We ended up not investing the necessary time, but today I came across "Compound Word Transformer" and I love the generated music. Check it out!
Switzerland is not closing schools for
#COVID19
because it would endanger grandparents who would take care of the children.
Greece on the other hand pays for the vacation days of one of the two parents and closes all schools for 14 days.
Switzerland, man up!
Congrats to all the researchers from ILSP and the Athena research center who worked on this. I couldn't find Twitter handles to tag people, so please let me know if I should be tagging someone.
@demirbasayyuce
@awnihannun
Well, actually, I don't think you need any of that, due to unified memory. Quantizing the LoRA example in MLX should work out of the box. Haven't tried it yet but I don't see why not.
Usually I adore
@PyTorch
software engineering, but going from v1.5.0 to v1.6.0 breaks at::detail::getDefaultCPUGenerator(), which in turn breaks some C++ extensions.
Shouldn't that be in the release notes?
@ykilcher
@_florianmai
@jiangelaa
@zacharylipton
@francoisfleuret
If you are looking for an intuitive explanation of why these methods don't help much on hard datasets (the question raised in the video): they rely on the existence of uninformative datapoints.
In ImageNet there are none for most of the training.
So...
@github
, you implement code search but decide to ignore . , : ; / \ ` ' " = * ! ? # $ & + ^ | ~ < > ( ) { } [ ] ?
I am having fun searching for function definitions/implementations without being able to use "func(" or "::func".
@WankyuChoi
I am super happy you picked it up 😁. I actually added it to the example after seeing your previous demo and comments. Great video as always!
@KassinosS
@awnihannun
Out of curiosity, how would a simple ReLU MLP that passes the inputs through a sinusoidal positional encoding do on that problem?
In my experience they are a pretty good baseline for any such function approximation.
See for examples of what I mean.
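Roughly what I mean, as a sketch in MLX (sizes and frequency choices are illustrative):

import mlx.core as mx
import mlx.nn as nn

class FourierMLP(nn.Module):
    def __init__(self, in_dims=2, n_freqs=16, hidden=256, out_dims=1):
        super().__init__()
        self.freqs = mx.power(2.0, mx.arange(n_freqs))  # log-spaced frequencies
        self.net = nn.Sequential(
            nn.Linear(2 * n_freqs * in_dims, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dims),
        )

    def __call__(self, x):
        # sinusoidal positional encoding: concatenate sin/cos of scaled inputs
        z = x[..., None] * self.freqs
        z = mx.concatenate([mx.sin(z), mx.cos(z)], axis=-1)
        return self.net(z.reshape(x.shape[0], -1))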
I have two open phd positions in my group at
@Idiap_ch
/
@EPFL_en
Both in deep learning: one in computer vision, combining multiple sensors for scene reconstruction, and the other on weather forecasting and air traffic control.
@unixpickle
@gazorp5
@awnihannun
Moreover, designing a backend would mean we inherit all the negative aspects of these frameworks, whether that is shape-based compilation, eager computation, or something else.
@dimadamen
@ducha_aiki
Oops, sorry if it was perceived as whining, mostly meant as a joke 😁.
Thanks a lot for the reply and taking it into account for the future!
@ivanfioravanti
@emrekoctw
@awnihannun
The UNet and text encoders should be fine as they only need about 4GB when quantized.
The decoder otoh needs more. The trick there is to apply the decoder in a tiling fashion, but I am not 100% sure it will be straightforward.
@andriy_mulyar
@_joaogui1
@pragmaticml
Besides the custom kernels, I think the JAX implementation of linear attention is a bit off. In theory, it should be identical to Performer without the feature map, so *at least* as fast... In our implementation it is 2-3 times faster than FAVOR with 256 dims.
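For reference, the O(N) linear attention computation without any random features looks like this (a PyTorch sketch mirroring the formulation in "Transformers are RNNs"):

import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, seq_len, heads, head_dim); feature map phi(x) = elu(x) + 1
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("nshd,nshm->nhdm", k, v)  # sum over keys: phi(k_s) v_s^T
    z = 1 / (torch.einsum("nlhd,nhd->nlh", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("nlhd,nhdm,nlh->nlhm", q, kv, z)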
@unixpickle
@gazorp5
@awnihannun
It would be quite an architectural change, I believe, to have unified memory in either of the two.
It is not as simple as making a backend, since the operations need to synchronize, but not copy, even though they may run on either the GPU or the CPU.
@chriswolfvision
@francoisfleuret
Well, I think broadcasting is great! The problem is with implicit expand_dims.
Who thought it was a good idea to implicitly resize tensors so that the dims work out? Under that reasoning, all element-wise operations are possible by expanding both tensors enough times...
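A concrete example of the failure mode (PyTorch shown, but NumPy behaves the same):

import torch

a = torch.randn(5)     # shape (5,)
b = torch.randn(5, 1)  # shape (5, 1)
# (5,) is implicitly treated as (1, 5), then both sides expand to (5, 5)
print((a + b).shape)   # torch.Size([5, 5]), silently, with no error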
Removing a public member from a Python module is a backwards-incompatible change and should incur a major version change.
Looking at you, keras.backend... which no longer provides tf moving from v2.2.4 to v2.2.5.
@fchollet
@walkfourmore
You can fine-tune it using LoRA on your laptop (see the MLX examples).
An 8GB MacBook Air won't break any speed records, but you can easily fine-tune it on your data overnight if they are about the length of a book.
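Something along these lines, sketched from the mlx-examples LoRA script (the model path, data path, and iteration count are illustrative; check the example's README for the exact flags):

python lora.py \
    --model mistralai/Mistral-7B-v0.1 \
    --train \
    --data ./my_data \
    --iters 600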