Explore tweets tagged as #SmolDocling
IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model. IBM's Granite-Docling-258M is an open-source (Apache-2.0) compact vision-language model for document conversion, succeeding SmolDocling with a Granite 165M backbone and a SigLIP2 vision encoder
BOOM! IBM just released an updated SmolDocling - tiny 258M param SoTA VLM - Apache 2.0 licensed! 🔥 Capable of doing OCR, Visual QA, Translation and much more - try it out now!
Fresh: SmolDocling 🔥 state-of-the-art open-source lightning-fast OCR 📑 reads a single document in 0.35 seconds (on an A100) using 0.5GB VRAM. It's a 256M model that beats every model (even ones 27x larger, including Qwen2.5VL 🤯)
🚀 We're thrilled to launch four new OCR datasets with 20M images: DoclingMatix, SynthFormulaNet, SynthCodeNet, and SynthChartNet. We used them to train SmolDocling, our ultra‑compact (256M) full-page document conversion VLM with performance rivaling models up to 27× larger.
A 256M open-source vision LM for complete document OCR just beat models 27× bigger. SmolDocling converts full documents into structured metadata using <500MB VRAM on consumer GPUs.
Transform any document into LLM-ready data! Introducing SmolDocling, a lightweight, state-of-the-art, lightning-fast OCR model. SmolDocling is ultra-compact (256M parameters), with performance matching much larger models. 100% open source.
Runs on consumer graphics cards, a boon for RAG! The smallest, strongest document OCR model is now open source! ⚡️ SmolDocling is only 256M 🧙♀️⚡️ Runs on consumer GPUs with < 500MB VRAM and processes a page in 0.35 seconds. SOTA in document conversion, beating other models by up to 27x 🔥 (kind of wild ... Open-source link 👉 https://t.co/92DiBuTfD4
IBM and Hugging Face Researchers Release SmolDocling: A 256M Open-Source Vision Language Model for Complete Document OCR Researchers from IBM and Hugging Face have recently addressed these challenges by releasing SmolDocling, a 256M open-source vision-language model (VLM)
Document AI use cases: checkmate! ♟️👑 Introducing SmolDocling. SmolDocling is ideal for enterprise use cases: 🤏 256M parameters, cheap and easy to run locally 🏆 Performs better than 20x larger models! 💨 Fast inference using vLLM: avg of 0.35 secs per page on an A100 GPU. ⚖️
VLMs connect a vision encoder to a language model using a linear layer—meaning each visual feature becomes one token. To reduce the number of tokens, SmolVLM and SmolDocling use pixel shuffle—but how does it work? 🔄 Pixel shuffle rearranges the encoded image, trading spatial
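The pixel-shuffle rearrangement mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of the space-to-depth idea, not SmolVLM's actual code; the function name and the 32×32×768 feature-grid dimensions are assumptions for the example:

```python
import numpy as np

def pixel_unshuffle(features, r):
    """Space-to-depth: merge each r x r patch of visual features into one
    token, cutting the token count by r**2 while widening the channel dim."""
    h, w, c = features.shape
    assert h % r == 0 and w % r == 0, "grid must divide evenly by r"
    # (h, w, c) -> (h//r, r, w//r, r, c): split each spatial axis into blocks
    x = features.reshape(h // r, r, w // r, r, c)
    # bring the two block axes next to the channel axis, then flatten them in
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h // r, w // r, r * r * c)

# e.g. a 32x32 grid of vision-encoder features, 768 channels each
feats = np.random.rand(32, 32, 768)
tokens = pixel_unshuffle(feats, r=2)
print(tokens.shape)  # (16, 16, 3072): 4x fewer tokens, 4x wider features
```

Every feature value survives the rearrangement; only the trade between spatial resolution and channel depth changes, which is why the language model sees far fewer (but richer) visual tokens.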
If you want to build a local OCR app without having to use any API, you can use SmolDocling, a very small Vision-Language model that does end-to-end OCR and runs locally without internet. SmolDocling is an open-source, specialized 256M model, and you can try it here: https://t.co/PnHYDckh6l
Can a small visual language model read documents as effectively as models 27 times its size? Next Friday, IDI will host Michele Dolfi and Peter Staar from @IBMResearch Zurich to discuss their work on SmolDocling, an “ultra-compact” model for diverse OCR tasks.
The HF team announced SmolDocling, a 256M-parameter VLM. Running at 0.35 seconds per page on an A100 GPU, the model reportedly outperforms rivals 20x its size. What SmolDocling can do: - OCR from images
🚀We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR!📄✨ It's lightning fast, processing a page in 0.35 sec on a consumer GPU using < 500MB VRAM⚡ SOTA in document conversion, beating every competing model we tested, up to 27x larger🤯 But how? 🧶⬇️
Ultra-compact doc conversion model! SmolDocling is a tiny but mighty multimodal model for end-to-end document conversion: OCR, layout, code, formulas, charts, tables, and more... Fully compatible with Docling, but way more compact. 100% open-source.
Another day where smol models punch above their weight 🚀 SmolDocling: a 256M vision LM for document OCR that processes a page in 0.35 sec on a consumer GPU, while beating models 27× larger.
So many open releases at @huggingface past week 🤯 recapping all here ⤵️ 👀 Multimodal > Mistral released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS) > with @IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS) >
SmolDocling just got a HUGE improvement, meet GraniteDocling!🚀 Improved performance in all the ways that matter: multilingual, more reliable, but still tiny at 258M params!🤏 It's lightning fast, processing a page in 0.35 sec on a consumer GPU using < 500MB VRAM⚡
SmolDocling uses DocTags to keep text and structure separate! It makes it way easier for image-to-sequence models to parse docs without getting confused. Unlike direct HTML/Markdown, it keeps layout details intact, uses fewer tokens, and exports cleanly to HTML, Markdown, or
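As a rough illustration of the DocTags idea described above, a converted page might serialize along these lines. This is a hedged sketch: the tag names follow the DocTags vocabulary as documented by the Docling project, but the `<loc_*>` bounding-box tokens and the text content are invented for the example:

```
<doctag>
  <section_header_level_1><loc_58><loc_44><loc_426><loc_64>1. Introduction</section_header_level_1>
  <text><loc_58><loc_72><loc_442><loc_154>Body text of the page goes here...</text>
</doctag>
```

Each element carries both its role (section header vs. body text) and its quantized location on the page, so layout survives all the way through export to HTML or Markdown instead of being flattened away.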