Explore tweets tagged as #llama_cpp
Analysis and exploitation of a heap buffer overflow in Llama.cpp https://t.co/iUswoSCEhr
#cybersecurity #llama
Released a GPU-accelerated llama-cpp-python 0.3.16 wheel for Python 3.14. Built with CUDA 13.1 — supports full layer offload and tested at ~85 tokens/second on Llama 3 8B Q4_K_M (RTX 3090). Runtime requires only NVIDIA drivers; no toolkit needed. https://t.co/UCjb25zGqm
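For illustration, a minimal sketch of full layer offload with llama-cpp-python, assuming a CUDA-enabled wheel is installed; the model filename is a placeholder:

```python
from llama_cpp import Llama

# Assumes a CUDA-enabled llama-cpp-python build; the model path is a placeholder.
llm = Llama(
    model_path="./llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=4096,       # context window
)

out = llm("Q: What does llama.cpp do? A:", max_tokens=64)
print(out["choices"][0]["text"])
```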
I built nano-llama.cpp, a minimal 3k-line implementation where I reverse-engineered llama.cpp from older commits to understand the core features. It has: 1. how to convert a Llama checkpoint to a basic ggml binary file. 2. Q4_0 quantization: implements block-wise 4-bit
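A minimal sketch of the Q4_0 idea the tweet cuts off on: each block of 32 floats shares one scale derived from its largest-magnitude value, and each value is stored as a 4-bit offset. Rounding and bit-packing details are simplified relative to ggml's reference implementation.

```python
import numpy as np

QK4_0 = 32  # llama.cpp's Q4_0 block size

def quantize_q4_0(block: np.ndarray):
    """Quantize one block of 32 floats to 4-bit codes plus one float scale.

    Simplified sketch of Q4_0: the signed value with the largest magnitude
    maps to -8, and everything else scales linearly into [0, 15].
    """
    assert block.shape == (QK4_0,)
    signed_max = block[np.argmax(np.abs(block))]
    d = signed_max / -8.0 if signed_max != 0 else 1.0
    q = np.clip(np.round(block / d) + 8, 0, 15).astype(np.uint8)
    return d, q  # real ggml packs two 4-bit codes per byte

def dequantize_q4_0(d: float, q: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) - 8) * d

x = np.random.randn(QK4_0).astype(np.float32)
d, q = quantize_q4_0(x)
print("max abs error:", np.max(np.abs(x - dequantize_q4_0(d, q))))
```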
I was using ollama on Termux, but the number of usable models was small, so I looked into it: most common LLMs are based on llama.cpp anyway, which works here too and is lighter, so ollama is out and llama.cpp is in. Attach the local server called the WebUI and a chat UI pops up, which is convenient.
[Local LLM Primer] What is "GGUF"? A refresher on "GGUF", the standard format familiar from llama.cpp. When you spot this mysterious term, think of it like this: GGUF = a game cartridge for AI
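To make the cartridge analogy concrete, here is a small sketch based on the public GGUF spec that reads just the fixed file header (magic, version, tensor count, metadata count); the file path is a placeholder.

```python
import struct

def read_gguf_header(path: str):
    """Read the fixed-size GGUF header (a sketch based on the public spec)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, = struct.unpack("<I", f.read(4))    # little-endian uint32
        n_tensors, = struct.unpack("<Q", f.read(8))  # uint64 tensor count
        n_kv, = struct.unpack("<Q", f.read(8))       # uint64 metadata kv count
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# print(read_gguf_header("model.gguf"))  # path is a placeholder
```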
In case you missed it - llama.cpp now supports Live Model Switching
Local AI just unlocked a feature that cloud providers can't match. llama.cpp now supports live model switching. No restart. No reload. Instant. This changes everything. 🧵
💡 Big llama.cpp news: llama.cpp's new model router turns a single-model local LLM server into a stable, multi-model, OpenAI-API-compatible platform that can dynamically load, switch, and evict models without restarts, something older setups could not do efficiently or safely.
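A sketch of what this could look like from the client side, assuming a llama.cpp server on localhost:8080 exposing the OpenAI-compatible chat endpoint, with the served model selected by the request's model field; the URL and model names are placeholders.

```python
import json
import urllib.request

# Placeholder endpoint for a local llama.cpp server with the model router.
URL = "http://localhost:8080/v1/chat/completions"

def chat(model: str, prompt: str) -> str:
    body = json.dumps({
        "model": model,  # the router uses this field to pick which model serves the call
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Two consecutive calls against different models, no server restart in between.
print(chat("llama-3-8b", "Summarize GGUF in one line."))
print(chat("qwen2.5-7b", "Summarize GGUF in one line."))
```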
llama.cpp is finally coming for Ollama's lunch! llama.cpp is a large-model inference engine implemented in C++, while ollama is essentially a web interface wrapped around llama.cpp. To be fair, llama.cpp had a web interface before too, but it was quite crude. Today it got a major update; let me walk you through it:
LLMlet: P2P distributed LLM inference in browsers with Wasm-compiled llama.cpp + WebRTC Repo: https://t.co/v0pJciWWxt Demo: https://t.co/Zq3jCj7fMa A model that can't fit in one tab can be split and run across multiple browsers. Still experimental; parallelism and a TURN service are missing.
Nvidia Thor running Nemotron-Nano on llama.cpp: 17.5 tokens per second.
🚀 Transformers v5 is OUT and it's a full ecosystem rebirth. Some key features:
> Unified tokenizer stack (simpler, faster, no Fast/Slow confusion)
> First-class quantization for efficient training + inference
> Seamless interoperability across libraries (MLX, llama.cpp, ONNX,
In collaboration with NVIDIA, the new Nemotron 3 Nano model is fully supported in llama.cpp. Nemotron 3 Nano features an efficient hybrid Mamba MoE architecture. It's a promising model, suitable for local AI applications on mid-range hardware. The large context window makes it