I'm happy to share the release of gemma.cpp - a lightweight, standalone C++ inference engine for Google's Gemma models:
Have to say, it’s one of the best project experiences of my career.
gemma.cpp is a minimalist implementation of Gemma 2B and 7B models:
focusing on simplicity and directness rather than full generality, it takes inspiration from ggml, llama.c, and other "integrated" model implementations.
The goal of the project is a small, hackable inference engine for experimentation and research.
The codebase has minimal dependencies and is portable, pure C++ (taking advantage of Highway for portable SIMD).
The core implementation is ~2K LOC, w/ ~4K LOC of supporting code. It’s meant to be both hackable and embeddable as a library w/ CMake.
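For a flavor of what Highway-style portable SIMD looks like, here's a minimal sketch (modeled on Highway's quick-start pattern, not code taken from gemma.cpp; it assumes n is a multiple of the vector lane count):

```cpp
#include <cstddef>

#include "hwy/highway.h"

namespace hn = hwy::HWY_NAMESPACE;

// out[i] += w[i] * x, vectorized for whatever SIMD width the target CPU
// offers (SSE4/AVX2/AVX-512/NEON/...), with no per-ISA code paths.
// Assumes n is a multiple of the lane count.
void MulAddRow(const float* HWY_RESTRICT w, float x,
               float* HWY_RESTRICT out, size_t n) {
  const hn::ScalableTag<float> d;  // "as many float lanes as the CPU has"
  const auto vx = hn::Set(d, x);   // broadcast the scalar to all lanes
  for (size_t i = 0; i < n; i += hn::Lanes(d)) {
    const auto vw = hn::Load(d, w + i);
    const auto vo = hn::Load(d, out + i);
    hn::Store(hn::MulAdd(vw, vx, vo), d, out + i);  // vw * vx + vo
  }
}
```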
Prototype your apps with local LLM inference as a C++ function call. Add runtime support for your own research with a few lines of code.
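To make that concrete, here's a hypothetical sketch of the embedding pattern - LocalModel, Generate, and the weights filename are illustrative placeholders, not gemma.cpp's actual API:

```cpp
#include <functional>
#include <iostream>
#include <string>

class LocalModel {
 public:
  // Load tokenizer + weights once at startup; later calls reuse them.
  explicit LocalModel(const std::string& weights_path) {
    // ... map weights, initialize tokenizer and KV cache here ...
    (void)weights_path;
  }

  // Generate a completion, streaming tokens to a callback as they decode.
  void Generate(const std::string& prompt,
                const std::function<void(const std::string&)>& on_token) {
    // ... tokenize prompt, run transformer forward passes, sample ...
    (void)prompt;
    on_token("(token)");  // placeholder: a real model streams decoded text
  }
};

int main() {
  LocalModel model("gemma-weights.sbs");  // illustrative filename
  model.Generate("Write a haiku about SIMD.",
                 [](const std::string& token) { std::cout << token; });
  std::cout << "\n";
}
```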
Beyond the interactive terminal UI for playing with the model, near-instant model loading lets you use gemma as a local-first command-line LLM tool.
Jan Wassenberg (author of Highway) and I started gemma.cpp as a small project just a few months ago.
We were lucky to find amazing collaborators from around Google - @PhilCulliton, @dancherp, Paul Chang, and of course, the GDM Gemma team.
What's next? There’s a lot of low-hanging fruit - we welcome external collaborators.
I'm most excited to enable new research on co-design between models + inference engines. Stay tuned.
“Now that things are so simple, there's so much to do.” - M. Feldman
@yvyuz Would be happy to work together in some form - @ggerganov has done a lot for open models.
There was already a patch to get Gemma into llama.cpp pretty early today:
@ramkumarkoppu We're starting with portable CPU SIMD as a common denominator; accelerator support is an important next priority. Happy to have collaborators join on this.
@austinvhuang Developers can now access an open-source LLM that packs the advanced capabilities and performance of models like Gemini and LLaMA into a compact model optimized for mainstream use.