Explore tweets tagged as #llama_cpp
@0xor0ne
0xor0ne
13 days
Analysis and exploitation of a heap buffer overflow in Llama.cpp https://t.co/iUswoSCEhr #cybersecurity #llama
1
30
280
@aivrar
aivrar
13 minutes
Released a GPU-accelerated llama-cpp-python 0.3.16 wheel for Python 3.14. Built with CUDA 13.1 — supports full layer offload and tested at ~85 tokens/second on Llama 3 8B Q4_K_M (RTX 3090). Runtime requires only NVIDIA drivers; no toolkit needed. https://t.co/UCjb25zGqm
0
0
0
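For context, full-layer GPU offload in llama-cpp-python is controlled by the `n_gpu_layers` parameter. A minimal sketch, assuming a CUDA-enabled wheel is already installed and a Q4_K_M GGUF is on disk (the model path and prompt below are illustrative, not from the tweet):

```python
# Minimal sketch: full GPU layer offload with llama-cpp-python.
# Model path is illustrative; any Q4_K_M GGUF works.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    n_gpu_layers=-1,  # -1 offloads every layer to the GPU
    n_ctx=8192,       # context window; size to available VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```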
@ggerganov
Georgi Gerganov
15 days
The new Mistral 3 models in llama.cpp
14
23
365
@jino_rohit
Jino Rohit
1 month
I built nano-llama.cpp - a minimal 3k-line implementation where I reverse engineered llama.cpp from older commits to understand the core features. It has: 1. How to convert a Llama checkpoint to a basic ggml binary file. 2. Q4_0 quantization: implements block-wise 4-bit
18
28
378
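The Q4_0 scheme the tweet mentions is simple enough to sketch. A rough numpy version of the idea, for intuition only; the real ggml code packs two 4-bit codes per byte, stores the scale as fp16, and rounds slightly differently:

```python
# Rough sketch of ggml-style Q4_0 block quantization (not the real layout).
import numpy as np

BLOCK = 32  # Q4_0 quantizes weights in blocks of 32 floats

def q4_0_quantize(x: np.ndarray):
    """Quantize a 1-D float array (length divisible by 32) to 4-bit codes plus per-block scales."""
    blocks = x.reshape(-1, BLOCK)
    # Per block: take the element with the largest magnitude, keep its sign,
    # and derive a scale so that element maps to the edge of the 4-bit range.
    idx = np.argmax(np.abs(blocks), axis=1)
    maxv = blocks[np.arange(len(blocks)), idx]
    d = maxv / -8.0
    d[d == 0] = 1e-12  # guard against all-zero blocks
    q = np.clip(np.round(blocks / d[:, None]) + 8, 0, 15).astype(np.uint8)
    return q, d.astype(np.float16)

def q4_0_dequantize(q, d):
    # Reconstruction: x ~= (q - 8) * d
    return (q.astype(np.float32) - 8.0) * d.astype(np.float32)[:, None]

w = np.random.randn(4 * BLOCK).astype(np.float32)
q, d = q4_0_quantize(w)
err = np.abs(q4_0_dequantize(q, d).ravel() - w).max()
print(f"max abs reconstruction error: {err:.4f}")
```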
@ngxson
Xuan-Son Nguyen
1 day
GLM-4.x Vision is now supported by llama.cpp
2
0
10
@pipisha
ぴーぴしゃ
10 hours
I was using ollama on Termux, but the range of usable models was small. When I looked into it, most common LLM tools are based on llama.cpp anyway, so I can use that directly and it's lighter too: ollama out, llama.cpp in. Attach the local server called WebUI and a chat UI appears, which is convenient.
1
0
0
@PopLink_jp
PopLink【公式】
22 days
[Intro to local LLMs] What is "GGUF"? A refresher on "GGUF", the standard format familiar from llama.cpp. When you spot this mysterious word, think of it like this: GGUF = a game cartridge for AI
0
0
36
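The cartridge metaphor holds up in code: a GGUF file is a single self-describing blob of key-value metadata plus quantized tensors. A small sketch of peeking inside one using the gguf Python package from the llama.cpp repo (`pip install gguf`); the file path is illustrative:

```python
# Inspect a GGUF file's metadata and tensor table with the gguf package.
from gguf import GGUFReader

reader = GGUFReader("model.Q4_K_M.gguf")  # illustrative path

# Key-value metadata: architecture, context length, tokenizer, etc.
for field in list(reader.fields.values())[:10]:
    print(field.name)

# Tensor table: name, shape, and quantization type of every weight.
for tensor in reader.tensors[:5]:
    print(tensor.name, tensor.shape, tensor.tensor_type)
```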
@ngxson
Xuan-Son Nguyen
6 days
In case you missed it - llama.cpp now supports Live Model Switching
2
3
9
@WMjjRpISUEt2QZZ
ぱぷりか炒め
1 month
Somehow got llama.cpp support for plamo3 working.
1
5
30
@_m0se_
OpenMOSE
26 days
Excellent: Qwen3-VL-235B -> heavily modded with REAP -> pruned to 145B -> converted to llama.cpp GGUF. It looks like it's running.
0
0
13
@theflawdbeing
S Ankit Gupta
3 days
Local AI just unlocked a feature that cloud providers can't match. llama.cpp now supports live model switching. No restart. No reload. Instant. This changes everything. 🧵
1
0
2
@victormustar
Victor M
7 days
llama.cpp gets a new CLI (tested it and it's 🔥)
6
17
193
@TeksEdge
David Hendrickson
1 day
💡 Big llama.cpp news: llama.cpp's new model router turns a single-model local LLM server into a stable, multi-model, OpenAI-API-compatible platform that can dynamically load, switch, and evict models without restarts, something older setups could not do efficiently or safely.
0
1
0
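From the client side, live model switching is just the OpenAI-compatible API with different `model` fields on consecutive requests. A hedged sketch, assuming a llama-server instance on localhost:8080 with multiple models registered to its router; the model names here are illustrative:

```python
# Sketch: two chat requests against llama-server's OpenAI-compatible
# endpoint, naming different models; the router loads/switches on demand.
import json
import urllib.request

def chat(model: str, prompt: str) -> str:
    body = json.dumps({
        "model": model,  # router selects/loads this model, no restart
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(chat("llama-3-8b-q4", "One-line summary of GGUF?"))      # illustrative name
print(chat("qwen3-next-80b", "Same question, different model."))  # illustrative name
```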
@posi_posi8
posi_posi
16 days
After updating llama.cpp, Qwen3-Next-80b-a3b-instruct finally works in LM Studio! I had it answer, with no sugarcoating, about a way forward for the chemical industry.
@posi_posi8
posi_posi
16 days
llama.cpp now supports Qwen3-Next. Let's give it a try.
1
3
11
@karminski3
karminski-牙医
1 month
llama.cpp is finally coming for Ollama's lunch! llama.cpp is a large-model inference engine implemented in C++, while Ollama is a web interface layered on top of llama.cpp. Of course, llama.cpp had a web interface before too, but it was quite rudimentary. Today it got a big update; let me walk you through it:
8
63
392
@TokunagaKohei
Kohei Tokunaga
23 days
LLMlet: P2P distributed LLM inference in browsers with Wasm-compiled llama.cpp + WebRTC. Repo: https://t.co/v0pJciWWxt Demo: https://t.co/Zq3jCj7fMa A model that can't fit in a single tab can be split and run across multiple browsers. Still experimental; parallelism and a TURN service are missing.
2
3
12
@Matthewrogers
Matthew Rogers
10 hours
Nvidia Thor running Nemotron-Nano via llama.cpp: 17.5 tokens per second.
0
0
0
@Tu7uruu
steven
15 days
🚀 Transformers v5 is OUT and it's a full ecosystem rebirth. Some key features: > Unified tokenizer stack (simpler, faster, no Fast/Slow confusion) > First-class quantization for efficient training + inference > Seamless interoperability across libraries (MLX, llama.cpp, ONNX,
1
4
31
@ggerganov
Georgi Gerganov
2 days
In collaboration with NVIDIA, the new Nemotron 3 Nano model is fully supported in llama.cpp. Nemotron 3 Nano features an efficient hybrid Mamba + MoE architecture. It's a promising model, suitable for local AI applications on mid-range hardware. The large context window makes it
8
44
398