Georgi Gerganov
@ggerganov
Followers
53K
Following
3K
Media
292
Statuses
2K
24th at the Electrica puzzle challenge | https://t.co/baTQS2bdia
Joined May 2015
New account for ggml news and notable PRs
11
30
268
In collaboration with NVIDIA, the new Nemotron 3 Nano model is fully supported in llama.cpp. Nemotron 3 Nano features an efficient hybrid Mamba/MoE architecture. It's a promising model, well suited to local AI applications on mid-range hardware. The large context window makes it…
developer.nvidia.com
Agentic AI systems increasingly rely on collections of cooperating agents—retrievers, planners, tool executors, verifiers—working together across large contexts and long time spans.
8
43
405
Some neat QoL improvements coming to llama.cpp thanks to Johannes Gäßler https://t.co/UDhoJo6Zzj
github.com
CPU + GPU hybrid inference has been a core feature of llama.cpp since early on and, I would argue, one of its major selling points vs. projects like ExLlama. The way to control memory use until now...
5
10
120
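As a hedged illustration of the hybrid CPU + GPU inference described above (a minimal sketch; the model path is a placeholder, and `-ngl` is the commonly used layer-offload flag in current llama.cpp builds):

```shell
# Offload 20 transformer layers to the GPU and keep the rest on the CPU.
# -ngl / --n-gpu-layers controls the CPU/GPU split and therefore VRAM use.
llama-cli -m ./models/model.gguf -ngl 20 -p "Hello"
```

Lowering `-ngl` reduces VRAM pressure at the cost of speed; `-ngl 0` keeps everything on the CPU.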
Introducing: the new llama-cli 🦙🦙
> Clean-looking interface
> Multimodal support
> Conversation control via commands
> Speculative decoding support
> Jinja fully supported
2
23
116
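A hedged sketch of launching the new llama-cli for an interactive session (the model path is a placeholder, and exact flags may vary between versions — check `llama-cli --help` for your build):

```shell
# Start an interactive chat with a local GGUF model;
# --jinja applies the model's built-in chat template.
llama-cli -m ./models/model.gguf --jinja
```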
We joined forces with NVIDIA to unlock high-speed AI inference on RTX AI PCs and DGX Spark using llama.cpp. The latest Ministral-3B models reach 385+ tok/s on @NVIDIA_AI_PC GeForce RTX 5090 systems. Blog:
developer.nvidia.com
The new Mistral 3 open model family delivers industry-leading accuracy, efficiency, and customization capabilities for developers and enterprises. Optimized from NVIDIA GB200 NVL72 to edge platforms…
16
42
427
Transformers v5's first release candidate is out 🔥 The biggest release of my life. It's been five years since the last major (v4). From 20 architectures to 400, 20k daily downloads to 3 million. The release is huge, w/ tokenization (no slow tokenizers!), modeling & processing.
20
89
577
WIP: using multiple models at the same time with llama-server 🦙
3
2
22
Just tried out the new built-in WebUI feature of llama.cpp and it couldn't be easier. Just start llama-server with a host and port, and voila!
15
11
163
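The steps described above can be sketched as follows (the model path is a placeholder; host, port, and `-m` are standard llama-server options):

```shell
# Serve a local model with the built-in WebUI enabled,
# then open http://127.0.0.1:8080 in a browser.
llama-server -m ./models/model.gguf --host 127.0.0.1 --port 8080
```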
Initial M5 Neural Accelerators support in llama.cpp. Enjoy faster TTFT (time to first token) in all ggml-based software (requires macOS Tahoe 26) https://t.co/HWbCQvFR2w
github.com
Rework matrix-matrix multiplication: use Tensor API when available. TODOs: update the mul_mm_id kernel; test on M5 (looking for volunteers to test, as I won't have hardware anytime soon). How to...
10
35
370
@fishright @ggerganov Just pushed a fix for this — this is what first launch is going to look like in the next version.
1
1
10
LlamaBarn v0.10.0 (beta) is out - feedback appreciated
16
15
214
When you run AI on your device, it is more efficient, less Big Brother, and free! So it's very cool to see the new llama.cpp UI, a ChatGPT-like app that runs fully on your laptop without needing Wi-Fi or sending any data to an external API. It supports: - 150,000+ GGUF models
51
175
2K
The new WebUI, in combination with the advanced backend capabilities of llama.cpp, delivers the ultimate local AI chat experience. It's fast, private, free, and open-source. It runs on any hardware - today. Huge thanks to the team at @huggingface for initiating, leading and
github.com
Overview This guide highlights the key features of the new SvelteKit-based WebUI of llama.cpp. The new WebUI in combination with the advanced backend capabilities of the llama-server delivers the u...
6
17
149