Xuan-Son Nguyen

@ngxson

Followers 5K · Following 966 · Media 195 · Statuses 736

Engineer @huggingface

Joined August 2020
@ngxson
Xuan-Son Nguyen
1 month
Introducing: The most visually intuitive article about RoPE, 2D-RoPE, and M-RoPE that you can find on the internet πŸ˜†. Link in 🧡
Tweet media one
6
38
302
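The article above covers RoPE visually; as a rough illustrative sketch (my own code, not the article's), rotary position embedding rotates consecutive feature pairs by position-dependent angles, so relative positions show up in dot products:

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to one token's feature vector.

    x: list of floats with even length; pos: integer token position.
    Each consecutive pair (x[2i], x[2i+1]) is rotated by pos * theta_i,
    where theta_i = base ** (-2i / dim) for pair index i.
    """
    dim = len(x)
    out = []
    for j in range(0, dim, 2):
        theta = pos * (base ** (-j / dim))
        c, s = math.cos(theta), math.sin(theta)
        # 2D rotation of the pair; this preserves the vector's norm
        out += [x[j] * c - x[j + 1] * s, x[j] * s + x[j + 1] * c]
    return out

# At position 0 every angle is 0, so the vector is unchanged.
print(rope([1.0, 0.0, 1.0, 0.0], pos=0))  # [1.0, 0.0, 1.0, 0.0]
```

2D-RoPE and M-RoPE extend this by assigning different feature pairs to different coordinate axes (e.g. image rows vs. columns); see the linked article for the visual intuition.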
@ngxson
Xuan-Son Nguyen
15 days
Real-time webcam demo with @huggingface SmolVLM and @ggml_org llama.cpp server. All running locally on a Macbook M3
208
2K
12K
@ngxson
Xuan-Son Nguyen
4 months
Prompting DeepSeek-R1, asking it to optimize your code, and getting a 2x performance boost 🀯 What a time to be alive!
Tweet media one
Tweet media two
13
81
768
@ngxson
Xuan-Son Nguyen
16 days
@nazo_btw Debloat script works well on win 10, but does nothing on win 11 in my case. The problem is that whoever invented the UI on win 11 has no idea about low-level optimizations. It feels like Microsoft just recruited a bunch of web devs to work on the win 11 UI.
11
9
698
@ngxson
Xuan-Son Nguyen
18 days
Vision support is now available on the llama.cpp server and Web UI! More details in 🧡
Tweet media one
16
83
684
@ngxson
Xuan-Son Nguyen
15 days
@huggingface @ggml_org Check it out:
8
76
595
@ngxson
Xuan-Son Nguyen
13 days
Wow we're now running OCR in real-time, 100% on-browser via WebGPU πŸš€.
@andimarafioti
Andi Marafioti
14 days
Real-time SmolVLM in a web-browser with transformers.js. All running locally with no installs. Just open the website.
5
44
434
@ngxson
Xuan-Son Nguyen
3 months
Day-zero Gemma 3 support in llama.cpp 🀯
πŸ‘‰ 4 model sizes: 1B, 4B, 12B, 27B
πŸ‘‰ Vision capability (except for 1B) with bi-directional attention
πŸ‘‰ Context size: 32k (1B) and 128k (4B, 12B, 27B)
πŸ‘‰ +140 languages supported (except for 1B)
πŸ‘‰ Day-zero support on many frameworks πŸš€
Tweet media one
9
50
305
@ngxson
Xuan-Son Nguyen
15 days
Plot twist: the whole demo is vibe-coded
Tweet media one
5
13
279
@ngxson
Xuan-Son Nguyen
15 days
My colleague @xenovacom also made a WebGPU version which runs 100% in the browser, no localhost server required! Check it out:
@ngxson
Xuan-Son Nguyen
15 days
Real-time webcam demo with @huggingface SmolVLM and @ggml_org llama.cpp server. All running locally on a Macbook M3
9
34
239
@ngxson
Xuan-Son Nguyen
2 months
Had a fantastic chat today with @ggerganov, the brilliant mind behind ggml, llama.cpp, and whisper.cpp - tools we all know and love! We covered a lot, including: πŸš€ the integration of vision models into llama.cpp - still a work in progress, but we’re pushing hard to make it
Tweet media one
10
12
234
@ngxson
Xuan-Son Nguyen
5 months
Meet MiniThinky, the ultimate AI powerhouse! With only 1 billion parameters, MiniThinky delivers rapid insights and accurate solutions. Perfect for tackling complex problems swiftly!
6
31
192
@ngxson
Xuan-Son Nguyen
12 days
I said let him cook πŸ—£οΈπŸ—£οΈπŸ—£οΈ. Real-time on-mobile caption with @pocketpal_ai , running 100% offline πŸš€. Tested on my poor iPhone SE 2. Huge kudos to @ghorbani_asghar for making this!!
10
24
172
@ngxson
Xuan-Son Nguyen
4 months
An experimental Python interpreter has just arrived in the llama.cpp server's Web UI
Tweet media one
8
10
161
@ngxson
Xuan-Son Nguyen
3 months
Wondering how much RAM is needed to run a given GGUF? Try: npx @huggingface/gguf [model].gguf. This also works with remote files, for example: npx @huggingface/gguf https://huggingface(.)co/bartowski/Qwen_QwQ-32B-GGUF/resolve/main/Qwen_QwQ-32B-Q4_K_M.gguf
Tweet media one
10
22
147
@ngxson
Xuan-Son Nguyen
4 months
Looking for a private way to use DeepSeek-R1? (NOT the distilled model). @huggingface has you covered! DeepSeek-R1 is deployable via a llama.cpp-powered inference endpoint. Thanks @UnslothAI for the GGUF quants!
Tweet media one
10
17
123
@ngxson
Xuan-Son Nguyen
14 days
Firefox is open-source on Github, and they experimented with @ggml_org llama.cpp in WASM πŸ‘€. Wondering what they are cooking πŸ§‘β€πŸ³
Tweet media one
4
9
97
@ngxson
Xuan-Son Nguyen
2 months
My 2-day work: Llama 4 on llama.cpp - on the horizon! I had more fun doing this than I initially expected πŸ˜†. What I learnt while working on this? Follow the 🧡
Tweet media one
6
10
94
@ngxson
Xuan-Son Nguyen
6 months
Hugging Face inference endpoints now support CPU deployment for llama.cpp πŸš€ πŸš€. Why is this a huge deal? llama.cpp is well-known for running very well on CPU. If you're running small models like Llama 1B or embedding models, this will definitely save tons of money πŸ’° πŸ’°
3
22
89
@ngxson
Xuan-Son Nguyen
7 months
Being on the same plane as folks from @huggingface gives me the perfect excuse to show off why on-device LLMs are so cool ✈️. Running llama.cpp - a masterpiece by @ggerganov. Model is @AIatMeta Llama 3.1 8B
5
5
82
@ngxson
Xuan-Son Nguyen
3 months
Aya Vision is now the number one trending OCR model on Hugging Face πŸš€
πŸ‘‰ Comes in 2 sizes, 8B and 32B
πŸ‘‰ Supports 32 languages
πŸ‘‰ Day-zero support with HF Transformers
Tweet media one
4
12
80
@ngxson
Xuan-Son Nguyen
2 months
Gemma 3 VISION on llama.cpp server. Still very early WIP, but it works πŸ”₯
Tweet media one
4
3
78
@ngxson
Xuan-Son Nguyen
8 months
Wanna see something cool? You can now deploy GGUF models directly onto Hugging Face Inference Endpoints! Powered by llama.cpp @ggerganov. Try it now -->
4
15
71
@ngxson
Xuan-Son Nguyen
4 months
We released yet another config to deploy DeepSeek-R1 on HF inference endpoints! It may look expensive, but you get 32K context length and a bigger, better-quality quantization. Thanks @UnslothAI for providing the IQ2_XXS dynamic quant!
Tweet media one
3
8
74
@ngxson
Xuan-Son Nguyen
2 months
Cooking a fun thing today: I can now load safetensors files directly into GGML without having to convert them to GGUF! Why? Because this allows me to do experiments faster, especially with models outside of llama.cpp πŸ˜†
Tweet media one
6
16
74
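For context on why skipping the GGUF conversion is feasible: the safetensors format is just an 8-byte little-endian header length followed by a JSON table of tensor metadata, so the tensor layout can be read directly. A minimal sketch (the tensor name below is made up for the demo):

```python
import json, os, struct, tempfile

def read_safetensors_header(path):
    """Return the JSON tensor table of a .safetensors file.

    Per the safetensors spec: 8 bytes little-endian u64 header length,
    then that many bytes of JSON mapping tensor name ->
    {"dtype", "shape", "data_offsets"}; raw tensor data follows.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# Demo: build a tiny but valid file with one fake 2x2 F32 tensor,
# so no real model download is needed.
header = {"tok_embd.weight": {"dtype": "F32", "shape": [2, 2],
                              "data_offsets": [0, 16]}}
blob = json.dumps(header).encode()
path = os.path.join(tempfile.gettempdir(), "tiny.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + b"\x00" * 16)

meta = read_safetensors_header(path)
print(meta["tok_embd.weight"]["shape"])  # [2, 2]
```

With the metadata and byte offsets in hand, a loader can map each tensor's raw bytes straight into GGML buffers, which is presumably what makes the experiment loop faster.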
@ngxson
Xuan-Son Nguyen
2 months
llama.cpp multimodal roadmap πŸ”₯
Tweet media one
3
8
69
@ngxson
Xuan-Son Nguyen
4 months
How to make DeepSeek-R1-Qwen **abliterated**?

llama-cli \
  -m DeepSeek-R1-Distill-Qwen-7B/model.gguf \
  --lora LoRA-Qwen2.5-7B-Instruct-abliterated-v3-f16.gguf
4
4
51
@ngxson
Xuan-Son Nguyen
4 months
The PR is here if you're curious:
1
1
49
@ngxson
Xuan-Son Nguyen
1 month
@MaziyarPanahi Haha yeah true, agents, MCP, etc. are just algorithmic wrappers around matrix multiplications πŸ˜‚
1
0
43
@ngxson
Xuan-Son Nguyen
7 months
The Ollama - Hugging Face integration has been rolled out for a week now - how's it going? Obviously, pretty well! We're averaging 4,500 pulls per day. That's about one pull every 20 seconds!
Tweet media one
4
8
40
@ngxson
Xuan-Son Nguyen
15 days
@leoplusx @ggerganov @huggingface @ggml_org Basically when writing code, I usually start with a very minimal PoC, then scale it up. The calculator is very useful for manually double-checking that things are correct. For example, when porting Llama 4 to llama.cpp, which was a very big model, I started experimenting with a tiny.
2
1
42
@ngxson
Xuan-Son Nguyen
7 months
Beautiful view from the new @huggingface Paris HQ
Tweet media one
2
1
40
@ngxson
Xuan-Son Nguyen
4 months
At @huggingface, we always try to make open-source AI more and more accessible. And today, we dropped a HUGE feature: Inference Providers. You can now try any LLM on the Hub, for FREE!
1
5
40
@ngxson
Xuan-Son Nguyen
4 months
Interesting: DeepSeek mitigates DDoS attacks by doing a challenge-response in the browser, asking it to calculate the SHA3 of a random string using a wasm module. Images: (1) the challenge, (2) an extract from the wasm bytecode, (3) the handler code in JavaScript
Tweet media one
Tweet media two
Tweet media three
1
7
35
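A minimal sketch of how such a challenge-response (proof-of-work) scheme operates - function names and the leading-zero difficulty rule are my assumptions for illustration, not DeepSeek's actual protocol:

```python
import hashlib, os

def solve(challenge: bytes, difficulty: int) -> int:
    """Brute-force a nonce so that sha3_256(challenge + nonce) starts
    with `difficulty` zero hex digits - the work the browser-side wasm
    module performs."""
    nonce = 0
    while True:
        digest = hashlib.sha3_256(challenge + str(nonce).encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check, so verification stays cheap."""
    digest = hashlib.sha3_256(challenge + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)

challenge = os.urandom(16)       # random string issued by the server
nonce = solve(challenge, difficulty=2)
print(verify(challenge, nonce, 2))  # True
```

The asymmetry is the point: each client burns CPU time searching for a nonce, while the server verifies with a single hash, which throttles high-volume automated requests.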
@ngxson
Xuan-Son Nguyen
8 months
How to create a new llama.cpp container on Hugging Face Inference Endpoints? Have a look at our step-by-step guide -->
Tweet media one
0
7
32
@ngxson
Xuan-Son Nguyen
15 days
@czahgu Plot twist 2, I guess? To make this happen, I wrote the whole llama.cpp vision part (C++ code), which is thousands of lines of code πŸ˜†πŸ˜†
3
0
38
@ngxson
Xuan-Son Nguyen
7 months
How to use PEFT LoRA adapters in llama.cpp, you may ask? Introducing GGUF-my-LoRA, a brand new space that helps you do just that!
Tweet media one
1
6
34
@ngxson
Xuan-Son Nguyen
18 days
If you are using brew, install the latest version via --HEAD to enable this! (The build will be updated in the next few hours.) We also have a bunch of pre-quantized models, ready to use. Have a look at this doc:
3
9
33
@ngxson
Xuan-Son Nguyen
9 months
Got a brand new 32" screen today. That’s a huge boost for my productivity. Thanks @julien_c @huggingface πŸ€—πŸ€—
Tweet media one
2
1
29
@ngxson
Xuan-Son Nguyen
6 months
PocketPal AI v1.5 is released πŸ’―. You can now access more than 45K GGUF models on the @huggingface Hub πŸ€—, directly from the application! As a bonus, we got a brand new logo for the app too! Huge thanks to @ghorbani_asghar!
2
7
29
@ngxson
Xuan-Son Nguyen
1 month
Finally have time to write a blog post about ggml-easy! πŸ˜‚ ggml-easy is a header-only wrapper for GGML that simplifies development with a cleaner API, easy debugging utilities, and native safetensors loading ✨ Great for rapid prototyping!
Tweet media one
1
5
30
@ngxson
Xuan-Son Nguyen
2 months
Mistral Small GGUF? Soon!.
1
2
30
@ngxson
Xuan-Son Nguyen
2 months
Hey @huggingface you don't need to insult me about being GPU poor πŸ˜†. Just kidding though, kudos to @huggingface frontend team for adding this feature πŸš€
Tweet media one
1
4
28
@ngxson
Xuan-Son Nguyen
3 months
Someone tell me: what can GPT-4.5 do that I can't replicate with other models via prompt engineering?
9
4
28
@ngxson
Xuan-Son Nguyen
15 days
@grantmweston @huggingface @ggml_org I'm using a very small model, SmolVLM 500M, which has decent speed even without a GPU.
1
1
27
@ngxson
Xuan-Son Nguyen
1 month
Google has quite a good sense of humor πŸ˜‚. Jokes aside, a 1B model quantized to Q4 without performance degradation is sweet 🀏
Tweet media one
4
0
26
@ngxson
Xuan-Son Nguyen
1 month
Link to article:
0
6
26
@ngxson
Xuan-Son Nguyen
1 month
@ggerganov Working with vision models can be quite fun πŸ˜‚πŸ˜‚
Tweet media one
0
0
25
@ngxson
Xuan-Son Nguyen
1 month
Estimating an LLM's memory requirement WITHOUT a calculator? Just use your good old human brain 🧠 😎 Check out my 3‑step estimation πŸš€
Tweet media one
3
0
24
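The exact 3-step recipe is in the attached image; as a rough back-of-the-envelope version (my own formula and default parameters, not the tweet's), total memory is roughly weights plus KV cache plus some overhead:

```python
def estimate_vram_gb(params_b, bits_per_weight=4.5, n_layers=32,
                     ctx=8192, n_kv_heads=8, head_dim=128,
                     kv_bytes=2, overhead=1.1):
    """Rough LLM memory estimate in GB.

    weights: params * bits_per_weight / 8 bytes
    KV cache: 2 (K and V) * layers * ctx * kv_heads * head_dim * bytes
    overhead: ~10% slack for activations and runtime buffers
    """
    weights = params_b * 1e9 * bits_per_weight / 8
    kv = 2 * n_layers * ctx * n_kv_heads * head_dim * kv_bytes
    return (weights + kv) * overhead / 1e9

# An 8B model at ~4.5 bits/weight (Q4_K_M-ish) with 8k context:
print(round(estimate_vram_gb(8), 1))
```

For the defaults above this lands around 6 GB, which matches the mental shortcut: a Q4 quant needs a bit over half a GB per billion parameters, plus the KV cache growing linearly with context length.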
@ngxson
Xuan-Son Nguyen
7 months
Running any GGUF in ollama WITHOUT creating a Modelfile? Yes, it's now possible thanks to the Hugging Face Hub! Another win for the GGUF/ggml/llama.cpp ecosystem, hurrah!! @ggerganov
Tweet media one
2
6
23
@ngxson
Xuan-Son Nguyen
16 days
@nazo_btw To be fair, I no longer use Windows on bare metal (I moved to a VM), so win 11 never feels smooth to me, even with the debloat script. But what's not fair is that win 10 gets a pretty acceptable frame rate even on a poor VM config.
1
0
25
@ngxson
Xuan-Son Nguyen
18 days
Someone even made a blog post about this, very cool. Check this out -->
2
2
23
@ngxson
Xuan-Son Nguyen
3 months
With the new 🐸 JFrog model scanner on the πŸ€— Hugging Face hub, we're making running AI models even more secure for everyone!
Tweet media one
3
5
22
@ngxson
Xuan-Son Nguyen
18 days
We also support the @huggingface SmolVLM series, which delivers light-speed responses thanks to its mini size! This is perfect for a real-time home video surveillance system - one of the ideas for my next hobby project!

llama-server -hf ggml-org/SmolVLM2-2.2B-Instruct-GGUF
2
1
23
@ngxson
Xuan-Son Nguyen
1 month
Adding support for Pixtral to llama.cpp, but I struggled to understand 2D RoPE (how is it different from traditional RoPE in text models?). Can you help please πŸ™ πŸ€— @MistralAI @sophiamyang - Many thanks!!!
4
2
23
@ngxson
Xuan-Son Nguyen
4 months
Wanna try out the latest πŸ‹ DeepSeek Janus Pro model? Here is a demo made by @xenovacom that runs 100% in the browser using WebGPU 🀯
2
3
22
@ngxson
Xuan-Son Nguyen
2 months
Had an amazing discussion with @bartowski1182 yesterday! We dove deep into quantization in llama.cpp and what the future holds. He was super chill, and I loved that!. Excited about the possibility of collaborating soon - stay tuned for what we’re cooking! πŸ”₯πŸš€
Tweet media one
3
1
21
@ngxson
Xuan-Son Nguyen
13 days
The llama.cpp web UI allows processing PDFs either as text or as images - very useful for learning music theory 🎹🎢
Tweet media one
1
3
21
@ngxson
Xuan-Son Nguyen
7 months
Introducing: the brand new Cortex app from @homebrewltd!! With Cortex, you can run GGUF models locally and privately! Powered by llama.cpp under the hood - you have access to more than 45K GGUF models on the @huggingface hub. Try it now -->
0
5
20
@ngxson
Xuan-Son Nguyen
15 days
@aniketsauravv @huggingface @ggml_org I didn't test it, but it should run well even on a Raspberry Pi.
3
0
20
@ngxson
Xuan-Son Nguyen
2 months
How to try llama.vim with llama.cpp?. Follow this very easy-to-understand guide from Aravind @ Code Rabbit πŸ‘‰
Tweet media one
0
4
20
@ngxson
Xuan-Son Nguyen
14 days
Let LM Studio cook πŸ”₯. Ofc cooking with the main ingredient - llama.cpp from @ggml_org πŸ˜†πŸ˜†
@mattjcly
matt
14 days
Vision LLM libmtmd-ception: we've adopted llama.cpp's new libmtmd in @lmstudio! . You can now run Pixtral, SmolVLM, InternVL3 and more - 100% locally. Here's Pixtral telling me about @ngxson's viral tweet demoing the new llama.cpp tech πŸ”₯
1
2
19
@ngxson
Xuan-Son Nguyen
2 months
Had a great chat today with @ochafik , maintainer of minja and the tool calling feature in llama.cpp. I was really impressed by his journey, including his time at @Google , which was full of interesting and fun stories. His passion for low-level software engineering and
Tweet media one
1
1
18
@ngxson
Xuan-Son Nguyen
17 days
wow, brew is really the easiest way to install llama.cpp on Mac.
@ggerganov
Georgi Gerganov
17 days
@reach_vb @simonw There have been more than 7k installs of the homebrew package in the past month - not bad!
Tweet media one
1
0
18
@ngxson
Xuan-Son Nguyen
3 months
Flashback to the other day at @huggingface Paris HQ, I'm playing Canon in C++, not Canon in D πŸ˜‚
Tweet media one
1
0
18
@ngxson
Xuan-Son Nguyen
4 months
@llm_fan Yes, actually I can do this, but it may take me days to finish. Re: your question about whether the LLM would choose to write SIMD in the first place - it may not! But the key here is that it must reflect on what to do before making the decision.
0
0
17
@ngxson
Xuan-Son Nguyen
2 months
Took a wild ride into the world of vision models! πŸ€—πŸ€—. Wrote about my journey trying to understand how they see and the adventure of getting them into llama.cpp. Grab some coffee and join me:
1
0
17
@ngxson
Xuan-Son Nguyen
12 days
Small reminder: the revenue from this service will be shared with GGML. If you want to support the team, give this a try πŸ™Œ.
@ggml_org
ggml
12 days
Deploy vision models with llama.cpp on Hugging Face
Tweet media one
1
5
17
@ngxson
Xuan-Son Nguyen
4 months
You know you are cooked when DeepSeek-R1 writes low-level SIMD code better than you do πŸ’€ . And yes, the code works, unlike Claude or ChatGPT which hallucinate the answer
Tweet media one
2
0
15
@ngxson
Xuan-Son Nguyen
9 months
Something fun I did on llama.cpp over the weekend: improving the argument parser system. Not only does it help maintain the project's documentation, it also provides a better user experience. For example, a wrong argument value now shows a helpful message:
Tweet media one
3
2
13
@ngxson
Xuan-Son Nguyen
3 months
If you are a ML engineer, please, don't do this. Audio is not vision.
Tweet media one
4
1
14
@ngxson
Xuan-Son Nguyen
1 month
If open-weight models are just a bunch of matrices, then why the f do we need to regulate them 🀑
6
0
14
@ngxson
Xuan-Son Nguyen
2 months
@im_roy_lee @InterviewCoder I imagine a future where the candidate's webcam and voice stream are AI-generated; someone will be able to pass the interview without even participating in it πŸ˜†
0
0
14
@ngxson
Xuan-Son Nguyen
28 days
Which **vision** model do you want llama.cpp to support next? We already support llava, minicpm-v, glm-edge, qwen2vl, qwen2.5vl, gemma 3, pixtral, smolVLM.
3
0
14
@ngxson
Xuan-Son Nguyen
18 days
@ClementDelangue @ggerganov Thanks @ClementDelangue for resharing about this! 😻. Here is the link to the documentation if you want to give it a try:
0
3
14
@ngxson
Xuan-Son Nguyen
3 months
Huge thanks to Hugging Face and Google for supporting me with the llama.cpp implementation ❀️. More info:
Tweet media one
1
4
14
@ngxson
Xuan-Son Nguyen
5 months
Can a 1B model **think** πŸ€” πŸ€”? Check this out --> ollama run hf(.)co/ngxson/MiniThinky-v2-1B-Llama-3.2-Q8_0-GGUF
Tweet media one
1
1
13
@ngxson
Xuan-Son Nguyen
18 days
People: worry about AI breaking the society . Me: casually point out how society is already broken.
2
0
13
@ngxson
Xuan-Son Nguyen
4 months
@ghorbani_asghar Just with some clever prompts πŸ˜‚.
1
0
13
@ngxson
Xuan-Son Nguyen
1 month
llama.cpp vision support just got much better! πŸš€ Traditionally, models with complicated chat templates like MiniCPM-V or Gemma 3 required a dedicated binary to run. Now, you can use all supported models via a single "llama-mtmd-cli" πŸ”₯ (Only Qwen2VL is not yet supported)
Tweet media one
1
0
13
@ngxson
Xuan-Son Nguyen
11 days
Unmute this video to hear the sound
3
1
13
@ngxson
Xuan-Son Nguyen
5 months
I was thinking about how to try ChatGPT voice mode without paying a monthly subscription. Here we go!
@_akhaliq
AK
5 months
You can now talk to ChatGPT by calling in the U.S. or by sending a WhatsApp message. You can also talk to chatgpt right now on anychat. here is chatgpt gpt-4o-mini-realtime-preview-2024-12-17 talking to chatgpt advance voice mode
0
1
11
@ngxson
Xuan-Son Nguyen
13 days
Vision models are cool, but they're even cooler for visually impaired people. Many of them rely on modern VLMs to see the world, and that's why I always want to deliver the best UX with a11y!
Tweet media one
1
0
12
@ngxson
Xuan-Son Nguyen
17 days
We extended the list to support yet another family of vision models: InternVL 3 and InternVL 2.5 πŸš€. Pinging my friend who works on a Vietnamese vision model, @dtkhangbkg
Tweet media one
1
2
12
@ngxson
Xuan-Son Nguyen
29 days
Speed run Qwen3 GGUFπŸƒ.
1
1
12
@ngxson
Xuan-Son Nguyen
1 month
This video is no surprise - Gemini 2.5 Pro has recently been the only one that can write good C++ code in my recent PRs on llama.cpp. ChatGPT and Claude do the job, but require more prompting, and the coding style doesn't match the rest of the project. Shout out to.
@bycloudai
bycloud
1 month
Gemini 2.5 Pro is just the best choice for AI right now .
Tweet media one
2
1
12
@ngxson
Xuan-Son Nguyen
1 month
Another day at @huggingface office 🎡
Tweet media one
0
0
12
@ngxson
Xuan-Son Nguyen
2 months
Did I mention that we're also working on true multimodality (not just image support)?
@ggerganov
Georgi Gerganov
2 months
Docker is embracing the ggml/llama.cpp on-device future. Who is going to be next?
Tweet media one
2
1
11
@ngxson
Xuan-Son Nguyen
2 months
On Monday the 24th, I'm proud to be giving a talk at @SotaFamily's webinar. My main talk will last an hour, a deep dive into the current state of on-device LLMs, exploring their advantages, performance trade-offs, and limitations. The session will conclude with an open Q&A,
Tweet media one
1
0
11
@ngxson
Xuan-Son Nguyen
2 months
While writing my new blog post about vision stuff, I was surprised to see Copilot correctly suggest @ggerganov even though my current document has no mention of him! Seems like llama.cpp has become "general" knowledge 😁
Tweet media one
0
0
11
@ngxson
Xuan-Son Nguyen
7 months
End of one-week team building with @huggingface team. So many unforgettable moments. Time & energy well spent πŸ€—
0
0
11
@ngxson
Xuan-Son Nguyen
4 months
@Akumunokokoro Haven't tested, but I think ChatGPT / Claude may be able to produce an acceptable result. The problems are: (1) I would need to prompt it more times, (2) input lengths are extremely limited on these platforms. So, without DeepSeek, I would prefer doing it myself.
1
0
11
@ngxson
Xuan-Son Nguyen
18 days
Shout out to @ggerganov and the team for helping me finalize this feature! Thanks to @huggingface for providing all the necessary hardware during the development process πŸ€—
0
0
11
@ngxson
Xuan-Son Nguyen
12 days
Hugging Face Inference Endpoints now officially support deploying **vision** models via llama.cpp πŸ‘€ πŸ‘€
Tweet media one
1
2
11
@ngxson
Xuan-Son Nguyen
2 months
(2/3) Collaboration is the key! Special thanks to @art_zucker, who wrote the transformers implementation, which allowed me to generate random weights while also serving as a reference point for the cpp code. Also big kudos to @ggerganov, who spent time testing and reviewing my PR!
Tweet media one
1
0
11
@ngxson
Xuan-Son Nguyen
24 days
Spent the whole day and still haven't got llama 4 vision to work. At this point, I'm pretty sure that @MistralAI's Mistral Small is the best model (with vision support) that you can find. And btw, llama.cpp has some optimizations that ollama doesn't have πŸ™‚.
2
0
10
@ngxson
Xuan-Son Nguyen
2 months
Everything is open-source if you speak assembly 😜. Tag your friends if this function looks familiar to you hehe 😊
Tweet media one
1
0
10
@ngxson
Xuan-Son Nguyen
1 month
We need more providers under the "Cloud" section πŸ™.
@ggml_org
ggml
1 month
Tweet media one
2
0
10
@ngxson
Xuan-Son Nguyen
28 days
@reach_vb @novita_labs We have inference provider support even before model card is up πŸ˜†πŸ˜†πŸ˜†πŸ˜†πŸ˜†.
0
0
10
@ngxson
Xuan-Son Nguyen
2 months
CC @ggerganov you may want to give this a try! I coded a super small JSON parser to make this work, kinda fun πŸ˜†.
1
0
10