
Xuan-Son Nguyen
@ngxson
Followers: 5K
Following: 966
Media: 195
Statuses: 736
Real-time webcam demo with @huggingface SmolVLM and @ggml_org llama.cpp server. All running locally on a MacBook M3
208
2K
12K
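For anyone who wants to reproduce the setup: llama.cpp's server can pull a GGUF straight from the Hub with the -hf flag (the same flag shown further down this feed). A minimal sketch - the exact model repo is an assumption, based on the reply below mentioning SmolVLM 500M:

  llama-server -hf ggml-org/SmolVLM-500M-Instruct-GGUF

The demo page then just streams webcam frames to this local server.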
@nazo_btw Debloat script works well on Win 10, but does nothing on Win 11 in my case. The problem is that whoever invented the UI on Win 11 has no idea about low-level optimizations. It feels like Microsoft just recruited a bunch of web devs to work on the Win 11 UI.
11
9
698
My colleague @xenovacom also made a WebGPU version which runs 100% in the browser - no localhost server required! Check it out:
Real-time webcam demo with @huggingface SmolVLM and @ggml_org llama.cpp server. All running locally on a MacBook M3
9
34
239
Had a fantastic chat today with @ggerganov, the brilliant mind behind ggml, llama.cpp, and whisper.cpp - tools we all know and love! We covered a lot, including: the integration of vision models into llama.cpp - still a work in progress, but we're pushing hard to make it
10
12
234
I said let him cook. Real-time on-mobile captioning with @pocketpal_ai, running 100% offline. Tested on my poor iPhone SE 2. Huge kudos to @ghorbani_asghar for making this!!
10
24
172
Wondering how much RAM is needed to run a given GGUF? Try:
npx @huggingface/gguf [model].gguf
This also works with remote files, for example:
npx @huggingface/gguf https://huggingface.co/bartowski/Qwen_QwQ-32B-GGUF/resolve/main/Qwen_QwQ-32B-Q4_K_M.gguf
10
22
147
Looking for a private way to use DeepSeek-R1? (NOT the distilled model). @huggingface got you covered! DeepSeek-R1 is deployable via a llama.cpp-powered Inference Endpoint. Thanks @UnslothAI for the GGUF quants!
10
17
123
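Since the endpoint is llama.cpp-powered, it speaks the standard OpenAI-compatible chat API. A hedged sketch of querying it once deployed - the endpoint URL and token variable are placeholders, not from this thread:

  curl https://YOUR-ENDPOINT.endpoints.huggingface.cloud/v1/chat/completions \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello, R1!"}], "max_tokens": 256}'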
Being on the same plane as folks from @huggingface gives me a perfect excuse to show off why on-device LLMs are so cool. Running llama.cpp - a masterpiece by @ggerganov. Model is @AIatMeta Llama 3.1 8B
5
5
82
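For anyone curious, running a Hub-hosted GGUF with llama.cpp is a one-liner; a minimal sketch, assuming bartowski's Llama 3.1 8B quant repo as the model source:

  llama-cli -hf bartowski/Meta-Llama-3.1-8B-Instruct-GGUF

The GGUF is downloaded and cached on the first run, so later runs can work without a connection - plane-friendly.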
Wanna see something cool? You can now deploy GGUF models directly onto Hugging Face Inference Endpoints! Powered by llama.cpp @ggerganov. Try it now -->
4
15
71
We released yet another config to deploy DeepSeek-R1 on HF Inference Endpoints! It may look expensive, but you get 32K context length and a bigger, better-quality quantization. Thanks @UnslothAI for providing the IQ2_XXS dynamic quant!
3
8
74
@MaziyarPanahi Haha yeah true, agents, MCP, etc. are just algorithmic wrappers around matrix multiplications.
1
0
43
@leoplusx @ggerganov @huggingface @ggml_org Basically when writing code, I usually start with a very minimal PoC, then scale it up. The calculator is very useful to double-check manually that things are correct. For example, when porting Llama 4 to llama.cpp, which was a very big model, I started experimenting with a tiny.
2
1
42
At @huggingface, we always try to make open-source AI more and more accessible. And today, we drop a HUGE feature: Inference Providers. You can now try any LLM on the Hub, for FREE!
1
5
40
Got a brand new 32" screen today. That's a huge boost for my productivity. Thanks @julien_c @huggingface
2
1
29
PocketPal AI v1.5 is released. You can now access more than 45K GGUF models on the @huggingface Hub, directly from the application! As a bonus, we got a brand new logo for the app too! Huge thanks to @ghorbani_asghar!
2
7
29
Hey @huggingface, you don't need to insult me about being GPU poor. Just kidding though - kudos to the @huggingface frontend team for adding this feature
1
4
28
@grantmweston @huggingface @ggml_org I'm using a very small model, SmolVLM 500M, which has decent speed even without a GPU.
1
1
27
Running any GGUF in ollama WITHOUT creating a Modelfile? Yes, it's now possible thanks to the Hugging Face Hub! Another win for the GGUF/ggml/llama.cpp ecosystem, hurrah!! @ggerganov
2
6
23
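The syntax is just the Hub repo path prefixed with hf.co; a minimal sketch, using bartowski's Llama 3.2 1B quants as an assumed example repo:

  ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
  # a specific quant can be picked with a tag, e.g. :Q4_K_M

ollama resolves the GGUF directly from Hugging Face - no Modelfile needed.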
We also support the @huggingface SmolVLM series, which delivers light-speed responses thanks to its mini size! This is perfect for a real-time home video surveillance system. That's one of the ideas for my next hobby project!
llama-server -hf ggml-org/SmolVLM2-2.2B-Instruct-GGUF
2
1
23
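Once llama-server is up (default port 8080), it exposes an OpenAI-compatible chat endpoint that accepts images as data URIs - handy for piping surveillance frames in. A sketch, where the prompt and the <BASE64_JPEG> placeholder are illustrative:

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{
            "role": "user",
            "content": [
              {"type": "text", "text": "Describe what is happening in this frame."},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64_JPEG>"}}
            ]
          }]
        }'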
Adding support for Pixtral in llama.cpp, but I struggled to understand the 2D RoPE (how is it different from traditional RoPE in text models?). Can you help please @MistralAI @sophiamyang - Many thanks!!!
4
2
23
Wanna try out the latest DeepSeek Janus Pro model? Here is a demo made by @xenovacom that runs 100% in the browser using WebGPU
2
3
22
Had an amazing discussion with @bartowski1182 yesterday! We dove deep into quantization in llama.cpp and what the future holds. He was super chill, and I loved that! Excited about the possibility of collaborating soon - stay tuned for what we're cooking!
3
1
21
Introducing: the brand new Cortex app from @homebrewltd!! With Cortex, you can run GGUF models locally, privately! Powered by llama.cpp under the hood - you have access to more than 45K GGUF models on the @huggingface Hub. Try it now -->
0
5
20
Let LM Studio cook. Of course, cooking with the main ingredient - llama.cpp from @ggml_org.
Vision LLM libmtmd-ception: we've adopted llama.cpp's new libmtmd in @lmstudio! You can now run Pixtral, SmolVLM, InternVL3 and more - 100% locally. Here's Pixtral telling me about @ngxson's viral tweet demoing the new llama.cpp tech
1
2
19
Flashback to the other day at @huggingface Paris HQ, I'm playing Canon in C++, not Canon in D
1
0
18
@im_roy_lee @InterviewCoder I imagine a future where the candidate's webcam and voice stream are AI-generated; someone will be able to pass the interview without even participating in it.
0
0
14
@ClementDelangue @ggerganov Thanks @ClementDelangue for resharing this! Here is the link to the documentation if you want to give it a try:
0
3
14
I was thinking about how to try the ChatGPT voice mode without paying a monthly subscription. Here we go!
You can now talk to ChatGPT by calling in the U.S. or by sending a WhatsApp message. You can also talk to ChatGPT right now on anychat. Here is ChatGPT gpt-4o-mini-realtime-preview-2024-12-17 talking to ChatGPT advanced voice mode
0
1
11
We extend the list to support yet another family of vision models: InternVL 3 and InternVL 2.5. Pinging my friend who works on Vietnamese vision models, @dtkhangbkg
1
2
12
On Monday the 24th, I'm proud to give a talk at @SotaFamily's webinar. My main talk will last an hour, deep-diving into the current state of on-device LLMs, exploring their advantages, performance trade-offs, and limitations. The session will conclude with an open Q&A.
1
0
11
While writing my new blog post about vision stuff, I was surprised to see Copilot correctly suggest @ggerganov even though my current document has no mention of him! Seems like llama.cpp has become "general" knowledge
0
0
11
End of a one-week team building with the @huggingface team. So many unforgettable moments. Time & energy well spent
0
0
11
@Akumunokokoro Haven't tested, but I think ChatGPT / Claude may be able to produce an acceptable result. The problems are: I would need to prompt it more times, and input length is extremely limited on these platforms. So, without DeepSeek, I would prefer doing it myself.
1
0
11
Shout out to @ggerganov and the team for supporting me in finalizing this feature! Thanks to @huggingface for providing all the necessary hardware during the development process.
0
0
11
(2/3) Collaboration is key! Special thanks to @art_zucker who wrote the transformers implementation, which allowed me to generate random weights while also serving as a reference point for the cpp code. Also big kudos to @ggerganov who spent time testing and reviewing my PR!
1
0
11
Spent the whole day and still haven't gotten Llama 4 vision to work. At this point, I'm pretty sure that @MistralAI's Mistral Small is the best model (with vision support) that you can find. And btw, llama.cpp has some optimizations that ollama doesn't have.
2
0
10
@reach_vb @novita_labs We have Inference Provider support even before the model card is up.
0
0
10
CC @ggerganov - you may want to give this a try! I coded a super small JSON parser to make this work, kinda fun.
1
0
10