For native Greek speakers: you can already interact with Meltemi on your laptop directly from HF using MLX.
I also uploaded a quantized 4-bit version on mlx-community for faster inference. Almost 20 tokens per second on a MacBook Air and 90 on an M2 Ultra!
To reproduce the video above, first install the package:
pip install -U mlx_lm
and then run
python -m mlx_lm.generate \
--model mlx-community/ilsp-Meltemi-7B-Instruct-v1-4bit \
--prompt "Πες μου την ιστορία της Ελλάδας σε μία παράγραφο." \
--temp 0.0 --max-tokens 2048
on any M-series Mac.
Congrats to all the researchers from ILSP and the Athena Research Center who worked on this. I couldn't find Twitter handles to tag people, so please let me know if I should be tagging someone.
@walkfourmore
Hmm, just ran it on an 8GB M1 Mac mini (same chip) and it gets a very respectable 12 tps.
Feel free to file an issue on GitHub with details so we can help debug your setup. Otherwise, a fresh install in a new Python environment should probably be enough.