@angeloskath
Angelos Katharopoulos
3 months
For the native Greek speakers, you can already interact with Meltemi on your laptop directly from HF using MLX. I also uploaded a quantized 4-bit version on mlx-community for faster inference. Almost 20 tokens per second on a MacBook Air and 90 on an M2 Ultra!
@ionandrou
Ion Androutsopoulos
3 months

Replies

@angeloskath
Angelos Katharopoulos
3 months
To reproduce the video above, first `pip install -U mlx_lm`, then run the following on any M-series Mac:

```shell
python -m mlx_lm.generate \
  --model mlx-community/ilsp-Meltemi-7B-Instruct-v1-4bit \
  --prompt "Πες μου την ιστορία της Ελλάδας σε μία παράγραφο." \
  --temp 0.0 --max-tokens 2048
```

(The prompt asks: "Tell me the history of Greece in one paragraph.")
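For anyone who prefers calling the library from Python rather than the CLI, the same generation can be sketched with `mlx_lm`'s `load`/`generate` helpers. This is a rough sketch, not the author's exact setup: it assumes a recent `mlx_lm`, an Apple Silicon Mac, and that the keyword arguments shown here match your installed version (the `generate` signature has changed across releases).

```python
# Sketch: generating with the mlx_lm Python API instead of the CLI.
# Assumes an Apple Silicon Mac with `pip install -U mlx_lm`; the 4-bit
# model (a few GB) is fetched from the Hugging Face Hub on first use.
from mlx_lm import load, generate


def main():
    # Download (if needed) and load the quantized model plus its tokenizer.
    model, tokenizer = load("mlx-community/ilsp-Meltemi-7B-Instruct-v1-4bit")

    text = generate(
        model,
        tokenizer,
        prompt="Πες μου την ιστορία της Ελλάδας σε μία παράγραφο.",
        max_tokens=2048,
        verbose=True,  # prints the output and tokens-per-second stats as it goes
    )
    print(text)


if __name__ == "__main__":
    main()
```

The `--temp 0.0` flag from the CLI invocation is the default sampling temperature in recent `mlx_lm` releases, so it is left implicit here; on older versions you may need to pass it explicitly.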
@angeloskath
Angelos Katharopoulos
3 months
Congrats to all the researchers from ILSP and the Athena Research Center who worked on this. I couldn't find Twitter handles to tag people, so please let me know if I should be tagging someone.
@walkfourmore
@angeloskath What am I doing wrong? I'm running the q4 version on a base MacBook Air M1 and inference runs at about 2 tokens/min.
@angeloskath
Angelos Katharopoulos
3 months
@walkfourmore Hmm, I just ran it on an 8GB M1 Mac mini (same chip) and it gets a very respectable 12 tokens/s. Feel free to file an issue on GitHub with details to help debug your setup. Otherwise, a fresh install in a new Python environment should probably be enough.