Explore tweets tagged as #llama_cpp
Exposed LLM server queries for @shodanhq:
port:11434 "Ollama"
port:8000 "vLLM"
port:8000 "llama.cpp"
port:8080 "llama.cpp"
port:1234 "LM Studio"
port:4891 "GPT4All"
port:8000 "LangChain"
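If you want to run these same searches programmatically, here is a minimal sketch using the official shodan Python client; the API key placeholder and the result handling are assumptions for illustration, not part of the tweet.

import shodan  # pip install shodan

# Hypothetical key; substitute your own Shodan API key.
api = shodan.Shodan("YOUR_SHODAN_API_KEY")

# The queries from the tweet above, used to locate exposed local-LLM servers.
queries = [
    'port:11434 "Ollama"',
    'port:8000 "vLLM"',
    'port:8000 "llama.cpp"',
    'port:8080 "llama.cpp"',
    'port:1234 "LM Studio"',
    'port:4891 "GPT4All"',
    'port:8000 "LangChain"',
]

for q in queries:
    results = api.search(q)
    print(f'{q}: {results["total"]} exposed hosts')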
This is wild. A real-time webcam demo using SmolVLM from @huggingface and llama.cpp! 🤯
Running fully local on a MacBook M3.
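A rough sketch of how such a demo can be wired up, assuming a llama-server instance already running with the SmolVLM weights and their multimodal projector on the default port 8080; the port, prompt, and parameters below are assumptions, not details from the tweet.

import base64
import cv2        # pip install opencv-python
import requests

# Grab one frame from the default webcam.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("could not read a frame from the webcam")

# Encode the frame as a base64 JPEG data URI.
_, jpeg = cv2.imencode(".jpg", frame)
data_uri = "data:image/jpeg;base64," + base64.b64encode(jpeg.tobytes()).decode()

# Ask the local llama-server (OpenAI-compatible endpoint) to describe the frame.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what the camera sees."},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
        "max_tokens": 100,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])

The real-time demo simply repeats this capture-and-ask loop at a fixed interval.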
Llama.cpp is now faster with Qwen3-30B-A3B-Thinking-2507 on Metal! Much faster! 🔥
I will retest on a 3090 Ti with CUDA and publish updated benchmarks!
Thanks @ggerganov for the new release!
One of the quickest ways to start playing with a good local LLM on macOS (if you have ~12GB of free disk space and RAM) - using llama-server and gpt-oss-20b:

brew install llama.cpp
llama-server -hf ggml-org/gpt-oss-20b-GGUF \
  --ctx-size 0 --jinja -ub 2048 -b 2048 -ngl 99 -fa
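Once llama-server is running (it listens on port 8080 by default), you can query it through its OpenAI-compatible endpoint. A minimal sketch with the openai Python client; the base URL, model name, and prompt are assumptions for illustration.

from openai import OpenAI  # pip install openai

# Point the client at the local llama-server; the key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gpt-oss-20b",  # the server answers with whatever model it has loaded
    messages=[{"role": "user", "content": "Give me a one-line summary of llama.cpp."}],
)
print(resp.choices[0].message.content)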
The ultimate guide for using gpt-oss with llama.cpp:
- Runs on any device
- Supports NVIDIA, Apple, AMD and others
- Support for efficient CPU offloading
- The most lightweight inference stack today