@austinvhuang
Austin Huang
4 months
I'm happy to share the release of gemma.cpp - a lightweight, standalone C++ inference engine for Google's Gemma models. I have to say, it's one of the best project experiences of my career.
22
198
1K

Replies

@austinvhuang
Austin Huang
4 months
gemma.cpp is a minimalist implementation of the Gemma 2B and 7B models. Focusing on simplicity and directness rather than full generality, it takes inspiration from ggml, llama.c, and other "integrated" model implementations.
1
3
30
@austinvhuang
Austin Huang
4 months
The goal of the project is a small inference engine for experimentation and research. The codebase has minimal dependencies and is portable, pure C++ (taking advantage of portable SIMD).
1
1
29
@austinvhuang
Austin Huang
4 months
The core implementation is ~2K LOC, with ~4K LOC of supporting code. It's meant to be both hackable and embeddable as a library via CMake. Prototype your apps with local LLM inference as a C++ function call. Add runtime support for your own research with a few lines of code.
1
1
20
@austinvhuang
Austin Huang
4 months
Beyond the interactive terminal UI for playing with the model, near-instant model loading lets us use gemma as a local-first command-line LLM tool.
1
2
23
@austinvhuang
Austin Huang
4 months
Jan Wassenberg (author of ) and I started gemma.cpp as a small project just a few months ago. We were lucky to find amazing collaborators from around Google - @PhilCulliton, @dancherp, Paul Chang, and of course, the GDM Gemma team.
1
1
19
@austinvhuang
Austin Huang
4 months
What's next? There's a lot of low-hanging fruit - we welcome external collaborators. I'm most excited to enable new research on co-design between models + inference engines. Stay tuned. "Now that things are so simple, there's so much to do." - M. Feldman
0
2
36
@yvyuz
Yuriy Yuzifovich
4 months
@austinvhuang This is great! Any plans to contribute to llama.cpp?
1
0
0
@austinvhuang
Austin Huang
4 months
@yvyuz Would be happy to work together in some form, @ggerganov has done a lot for open models. There was already a patch to get Gemma into llama.cpp earlier today:
@ggerganov
Georgi Gerganov
4 months
Run @Google 's Gemma Open Models with llama.cpp
22
79
645
0
0
2
@AnthonyGM_g
toni
4 months
@austinvhuang this is great... working on gemma.mojo
2
0
7
@ramkumarkoppu
Ram Koppu
4 months
@austinvhuang Does it utilize the neural accelerator if I run it on a Coral micro board?
1
0
2
@austinvhuang
Austin Huang
4 months
@ramkumarkoppu We're starting with portable CPU SIMD as a common denominator; accelerator support is an important next priority. Happy to have collaborators join on this.
0
0
4
@ChiefScientist
Alexy 🤍💙🤍
4 months
@austinvhuang I knew you’re behind it! Where is Haskemma?!:)
1
0
0
@austinvhuang
Austin Huang
4 months
@ChiefScientist FP friends - gemma.dex please ;) 2K LOC in C++ probably turns into 200 LOC in Dex.
2
0
4
@_willfalcon
William Falcon ⚡️
4 months
@austinvhuang congratulations! this is super exciting. can’t wait to try it
0
0
1
@mitch_7w
Mitch🦖
4 months
@austinvhuang so damn cool
0
0
1
@AiDeeply
AI Deeply
4 months
@austinvhuang Looks useful. And fully open: "Apache-2.0, BSD-3-Clause licenses found"
0
0
0
@rbhar90
Bharath Ramsundar
4 months
@austinvhuang Super cool!
0
0
1
@The_AI_Edge
The AI Edge
4 months
@austinvhuang Developers can now access an open-source LLM that packs the advanced capabilities and performance of models like Gemini and LLaMA into a compact model optimized for mainstream use.
0
0
0
@wtsnz
Will Townsend
4 months
@austinvhuang Nice work!
0
0
1