Explore tweets tagged as #SmolDocling
@Marktechpost
Marktechpost AI Dev News ⚡
1 month
IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model. IBM’s Granite-Docling-258M is an open-source (Apache-2.0) compact vision-language model for document conversion, succeeding SmolDocling with a Granite 165M backbone and SigLIP2 vision
@reach_vb
Vaibhav (VB) Srivastav
1 month
BOOM! IBM just released an updated SmolDocling - tiny 258M param SoTA VLM - Apache 2.0 licensed! 🔥 Capable of doing OCR, Visual QA, Translation and much more - try it out now!
@mervenoyann
merve
7 months
Fresh: SmolDocling 🔥 state-of-the-art open-source lightning-fast OCR 📑 reads a single document in 0.35 seconds (on an A100) using 0.5GB VRAM. It's a 256M model that beats every model (even ones 27x larger, including Qwen2.5VL 🤯)
@andimarafioti
Andi Marafioti
3 months
🚀 We're thrilled to launch four new OCR datasets with 20M images: DoclingMatix, SynthFormulaNet, SynthCodeNet, and SynthChartNet. We used them to train SmolDocling, our ultra‑compact (256M) full-page document conversion VLM with performance rivaling models up to 27× larger.
@LiorOnAI
Lior Alexander
7 months
A 256M open-source vision LM for complete document OCR just beat models 27× bigger. SmolDocling converts full documents into structured metadata using <500MB VRAM on consumer GPUs.
@Sumanth_077
Sumanth
7 months
Transform any document into LLM-ready data! Introducing SmolDocling, a lightweight, state-of-the-art, lightning-fast OCR model. SmolDocling is ultra-compact (256M parameters), with performance matching much larger models. 100% open source.
@tuturetom
Tom Huang
7 months
Runs on consumer GPUs, great news for RAG! The smallest, strongest document OCR model is now open source! ⚡️ SmolDocling is only 256M 🧙‍♀️⚡️ Runs on consumer GPUs with < 500MB VRAM, processes a page in 0.35 seconds, and hits SOTA in document conversion, beating models up to 27x larger 🔥 (a bit wild ...) Open-source link 👉 https://t.co/92DiBuTfD4
@Marktechpost
Marktechpost AI Dev News ⚡
7 months
IBM and Hugging Face Researchers Release SmolDocling: A 256M Open-Source Vision Language Model for Complete Document OCR. Researchers from IBM and Hugging Face have recently addressed these challenges by releasing SmolDocling, a 256M open-source vision-language model (VLM)
@jeffboudier
Jeff Boudier 🤗
7 months
Document AI use cases: checkmate! ♟️👑 Introducing SmolDocling SmolDocling is ideal for enterprise use cases: 🤏 256M parameters - cheap and easy to run locally 🏆 Performs better than 20x larger models! 💨 Fast inference using vLLM – avg of 0.35 secs per page on an A100 GPU. ⚖️
@andimarafioti
Andi Marafioti
7 months
VLMs connect a vision encoder to a language model using a linear layer—meaning each visual feature becomes one token. To reduce the number of tokens, SmolVLM and SmolDocling use pixel shuffle—but how does it work? 🔄 Pixel shuffle rearranges the encoded image, trading spatial
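The pixel-shuffle trick described above can be sketched in a few lines of NumPy: it is a space-to-depth rearrangement that merges each r×r block of visual features into a single token with r² times the channels, cutting the token count by r². The grid size (32×32), channel width (768), and ratio (r=2) below are illustrative, not SmolDocling's exact configuration.

```python
import numpy as np

def pixel_shuffle(features, r=2):
    """Space-to-depth: fold each r x r block of features into one token
    with r*r times the channels, reducing the token count by r*r."""
    H, W, C = features.shape
    assert H % r == 0 and W % r == 0
    # (H, W, C) -> (H/r, r, W/r, r, C) -> (H/r, W/r, r, r, C) -> (H/r, W/r, r*r*C)
    x = features.reshape(H // r, r, W // r, r, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(H // r, W // r, r * r * C)

feats = np.random.rand(32, 32, 768)   # 32x32 grid of encoded visual features
out = pixel_shuffle(feats, r=2)
print(feats.shape[0] * feats.shape[1], "->", out.shape[0] * out.shape[1])  # 1024 -> 256
```

No information is lost: the same numbers survive, just packed into fewer, wider tokens before they hit the language model.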
@YoussefHosni95
يوسف حسني
7 months
If you want to build a local OCR app without having to use any API, you can use SmolDocling, a very small vision-language model that does end-to-end OCR and runs locally without internet. SmolDocling is an open-source, specialized 256M model, and you can try it here: https://t.co/PnHYDckh6l
@instdin
Institutional Data Initiative
2 months
Can a small visual language model read documents as effectively as models 27 times its size? Next Friday, IDI will host Michele Dolfi and Peter Staar from @IBMResearch Zurich to discuss their work on SmolDocling, an “ultra-compact” model for diverse OCR tasks.
@gryhkn
giray
7 months
The HF team announced SmolDocling, a 256M-parameter VLM. The model runs at 0.35 seconds per page on an A100 GPU and reportedly outperforms competitors 20x its size. What SmolDocling can do: - from images with OCR
@andimarafioti
Andi Marafioti
7 months
🚀We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR!📄✨ It's lightning fast, processing a page in 0.35 sec on a consumer GPU using < 500MB VRAM⚡ SOTA in document conversion, beating every competing model we tested, up to 27x larger🤯 But how? 🧶⬇️
@Arindam_1729
Arindam Majumder 𝕏
2 months
Ultra-compact doc conversion model! SmolDocling is a tiny but mighty multimodal model for end-to-end document conversion: OCR, layout, code, formulas, charts, tables, and more... Fully compatible with Docling, but way more compact. 100% open-source.
@LoubnaBenAllal1
Loubna Ben Allal
7 months
Another day where smol models punch above their weight 🚀 SmolDocling: a 256M vision LM for document OCR that processes a page in 0.35 sec on a consumer GPU, while beating models 27× larger.
@mervenoyann
merve
7 months
So many open releases at @huggingface past week 🤯 recapping all here ⤵️ 👀 Multimodal > Mistral released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS) > with @IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS) >
@trojrobert
Robert John | Don’t Fear AI
7 months
Local OCR is here—no more sending docs to APIs! SmolDocling (256M) is a tiny but powerful vision LM for OCR: ✅ End-to-end doc conversion ✅ Fast (0.35s/page, <500MB VRAM) ✅ Accurate (beats models 27× larger) Small, optimized AI > bloated models. Love this! 🔥 #AI #OCR #ML
@andimarafioti
Andi Marafioti
1 month
SmolDocling just got a HUGE improvement, meet GraniteDocling!🚀 Improved performance in all the ways that matter: multilingual, more reliable, but still tiny at 258M params!🤏 It's lightning fast, processing a page in 0.35 sec on a consumer GPU using < 500MB VRAM⚡
@Arindam_1729
Arindam Majumder 𝕏
2 months
SmolDocling uses DocTags to keep text and structure separate! It makes it way easier for image-to-sequence models to parse docs without getting confused. Unlike direct HTML/Markdown, it keeps layout details intact, uses fewer tokens, and exports cleanly to HTML, Markdown, or
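The DocTags idea above can be made concrete with a schematic fragment. The tag names follow the published DocTags vocabulary (elements carry their bounding box as four `<loc_*>` tokens on a normalized grid before the content), but the location values and text here are purely illustrative:

```xml
<doctag>
  <section_header_level_1><loc_57><loc_40><loc_443><loc_62>1. Introduction</section_header_level_1>
  <text><loc_57><loc_70><loc_443><loc_160>Documents are a core medium for ...</text>
</doctag>
```

Because structure, layout, and text each get their own tokens, a downstream converter can render the same sequence as HTML or Markdown without the model ever having to emit either format directly.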