For native Greek speakers: you can already interact with Meltemi on your laptop directly from HF using MLX.
I also uploaded a quantized 4-bit version on mlx-community for faster inference. Almost 20 tokens per second on a MacBook Air and 90 on an M2 Ultra!
To reproduce the video above, first install the package:
pip install -U mlx_lm
and then run
python -m mlx_lm.generate \
--model mlx-community/ilsp-Meltemi-7B-Instruct-v1-4bit \
--prompt "Πες μου την ιστορία της Ελλάδας σε μία παράγραφο." \
--temp 0.0 --max-tokens 2048
on any M-series Mac.
Congrats to all the researchers from ILSP and the Athena Research Center who worked on this. I couldn't find Twitter handles to tag people, so please let me know if I should be tagging someone.
@walkfourmore
Hmm, just ran it on an 8GB M1 Mac mini (same chip) and it gets a very respectable 12 tps.
Feel free to file an issue on GitHub with details so we can help debug your setup. Otherwise, a fresh install in a new Python environment should probably be enough.