#pymupdf X Hashtag | Muskviewer

Explore tweets tagged as #pymupdf

meng shao

@shao__meng

10 months

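Over the past few days I've been choosing a knowledge-base data-processing tool for our company product's AI assistant. I took another look at five tools (Marker, MinerU, Docling, Markitdown, and Llamaparse) and, together with several Deep Search products, put together some comparisons as a reference for user integration. I'm sharing them here too; if you have other, better tools to recommend, please reply to me. Thanks in advance! 1. Marker technical architecture · Based on PyMuPDF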

18

139

603

abhinav

@AbhinavXJ

5 months

Project 1: Extracting headings from PDFs. I've been trying to build a Python script/model that parses any type of PDF and outputs all the headings, including their h1, h2, h3 tags. I tried multiple parsing libraries: pymupdf, pdfplumber, pdfminer.six (1/n)

7

0

45

おっさん51

@noa_capm

1 year

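Good morning ☁️ I learned that a library called PyMuPDF lets you extract just the images from a PDF file, so I tried it, and it turned out to be quite easy 🙌 Using this should also broaden the range of PDF-related analysis I can do 😆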

2

25

280

paneerchilli65

@paneerchilli65

1 year

Day 56 of ML:
• Made my first RAG app, "Notes Querier"
It is a fully local RAG app, made using:
- langchain
- ollama for the embedding model & LLM
- faiss for the vector DB
- pymupdf for PDF processing
- gradio for the interface
I built it to read theoretical CS subjects.

4

0

75

Yoshi-aki Shimada

@yoshi_and_aki

3 years

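I tried running Professor Rekimoto's summarize_arxv locally. Fetching the Japanese summary through the GPT-3.5 API and writing out the marp md seem to be working, but cropping images out of paper.pdf seems to be failing. Is the PyMuPDF package a bit iffy?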

1

3

5

小互

@imxiaohu

1 year

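GPTPDF is a tool that uses GPT-4o to parse PDFs into Markdown. In only 293 lines of code, it can parse almost any PDF file nearly perfectly, including layout, math formulas, tables, images, and charts, at an average cost of $0.013 per page. How it works: using the PyMuPDF library, it first parses the PDF to find all non-text regions and marks them, then uses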

6

84

238

Pierre Vannier

@pierre_vannier

1 year

I was losing my receipts and a lot of time gathering them and extracting information. So I built this AI-based receipt information extraction and saving tool with @tiangolo Typer, @mirascopeai, @OpenAI, @pydantic, PyMuPDF and @textualizeio Rich console. Along with my

1

4

12

Kushagra Sharma

@skushagra9

1 year

Developed an application to streamline knowledge extraction from PDFs: Extracts text using PyMuPDF. Generates embeddings with sentence-transformers. Stores embeddings in a vector database (FAISS) for efficient querying. Retrieves relevant context and provides answers via OpenAI

1

0

4

AIGCLINK

@aigclink

1 year

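An open-source tool that uses AI to parse PDFs: gptpdf. In only 293 lines of code, it parses layout, math formulas, tables, images, charts, and more nearly perfectly. Method: 1. Use the PyMuPDF library to parse the PDF, find all non-text regions, and mark them. 2. Use a large vision model (such as GPT-4o) to parse the result and produce a markdown file. github: https://t.co/ib4Z503rdD Example results: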

20

249

892

Excelおさるくん

@Excel_Osarukun

1 year

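This month I'm writing code to extract table data from PDFs in a Python study-group community ☺️ Data that I couldn't get with PyMuPDF came out cleanly when I used Tabula 📸️ For table data that isn't enclosed in ruled lines, I recommend using the Tabula library 💡 #Tabula #PyMuPDF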

0

2

26

hanif

@devhanif

9 months

PDF compression tool: I needed to compress a 900 MB document without sacrificing too much quality, and the online tools did not do it justice. So I built a custom compression tool using the pymupdf library in Python. Final file output was 65 MB with a lot of the quality

7

17

62

Tom Dörr

@tom_doerr

1 year

"PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents."

3

5

44

Ashish Gupta

@AshishGupt22092

2 days

Testing libraries today. PyMuPDF: C++ bindings, no hints. Electron: module resolution chaos. Tried making a quick prototype. Three hours in, still fixing configs. Progress: zero.

0

2

Daft

@daftengine

5 months

Most pipelines force you into rigid OCR-or-nothing workflows. Daft’s UDF architecture + PyMuPDF + Pydantic data structures = document processing that scales from prototype to production without architectural rewrites.

1

0

6

Tatsuya Shirakawa

@s_tat1204

2 years

When it comes to text/image extraction speed from PDFs, PyMuPDF turns out to be overwhelmingly fast. https://t.co/Yg8nckIWLs

1

0

33

Daisuke Kajiwara

@kajidai

8 hours

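This is the Day 22 article of GENDA Advent Calendar 2025 #3. A table-extraction AI for a wide variety of business forms. It explains in detail a technical approach that balances accuracy and cost through a staged fallback between PyMuPDF and GPT-4o. GENDA internship report (Irie): How I built an OCR agent that extracts table data https://t.co/YaxRZkz3vE #Qiitaアドカレ #Qiita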

0
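The "staged fallback" idea in the tweet above can be sketched roughly as follows. This is only an illustration of the general pattern, not the linked article's code: the extractor callables, the `looks_usable` quality check, and the stage names are all hypothetical, and a real pipeline would plug in an actual PyMuPDF-based extractor as the cheap stage and a GPT-4o call as the costly one.

```python
from typing import Callable, List, Optional, Tuple

Table = List[List[str]]
# Hypothetical extractor signature: raw page bytes in, table (or None) out.
Extractor = Callable[[bytes], Optional[Table]]


def looks_usable(table: Optional[Table], min_rows: int = 2) -> bool:
    """Assumed quality check: enough rows, a consistent column count,
    and no row that is entirely empty."""
    if not table or len(table) < min_rows:
        return False
    width = len(table[0])
    if width == 0:
        return False
    return all(
        len(row) == width and any(cell.strip() for cell in row)
        for row in table
    )


def extract_with_fallback(
    page: bytes, stages: List[Tuple[str, Extractor]]
) -> Tuple[str, Optional[Table]]:
    """Try each stage in order (cheapest first) and stop at the first
    result that passes the quality check."""
    for name, extractor in stages:
        try:
            table = extractor(page)
        except Exception:
            # A failing stage simply hands over to the next one.
            continue
        if looks_usable(table):
            return name, table
    return "none", None
```

Ordering the stages from cheapest to most expensive means the costly model is only invoked for pages the cheap extractor cannot handle, which is where the cost saving would come from.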

GitHubDaily

@GitHub_Daily

7 months

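When working with PDF documents, extracting the text, tables, or images inside is a real hassle: different libraries have different usage, and you often end up writing a pile of repetitive code. ParseStudio is a Python library that wraps the various parsers behind a unified interface, so a few lines of code are enough to parse a PDF. It also integrates Docling, PyMuPDF, and Llama Parse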

3

54

170

goldengrape

@goldengrape

3 years

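How to use ChatPDF: most Python libraries have documentation on readthedocs, with a "read the docs" link in the bottom-left corner. Open it and copy the "PDF" link, for example: https://t.co/621iQpNOi7 Then feed it to ChatPDF, and you can have it write code with reference to that library. This is somewhat more reliable than having ChatGPT write the code directly. 1/n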

19

99

357

artifex

@artifex

1 month

New: PyMuPDF-Layout. Our hybrid approach to PDF structure extraction that’s 10× faster than traditional AI parsers and doesn’t need GPUs. Blog:

1

4