Explore tweets tagged as #M3DocRAG
@jmin__cho
Jaemin Cho
11 months
Check out M3DocRAG -- multimodal RAG for question answering on Multi-Modal & Multi-Page & Multi-Documents (+ a new open-domain benchmark + strong results on 3 benchmarks)!
⚡️Key Highlights:
➡️ M3DocRAG flexibly accommodates various settings:
- closed & open-domain document contexts
- single & multi-hop questions
- evidence modalities (text, chart, figure, etc.)
@jmin__cho
Jaemin Cho
11 months
M3DocRAG consists of 3 stages:
1) Extract visual embeddings (e.g., w/ ColPali) from each page image.
2) Retrieve top-K pages (+ approximate indexing for faster search in the open-domain setting).
3) Generate an answer with a multimodal LM (e.g., Qwen2-VL) given the retrieved K pages.
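For readers who want the shape of those three stages in code, here is a minimal sketch. The helpers `embed_page`, `embed_query`, and `generate_answer` are hypothetical stand-ins for a ColPali-style visual embedder and a Qwen2-VL-style multimodal LM; the late-interaction (MaxSim) scoring mirrors ColPali's ColBERT-style retrieval and is an illustrative assumption, not the paper's exact code.

```python
# Minimal sketch of the three M3DocRAG stages described above.
# `embed_page`, `embed_query`, and `generate_answer` are hypothetical
# stand-ins for a ColPali-style visual embedder and a Qwen2-VL-style
# multimodal LM; only the retrieval logic is spelled out.
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """ColBERT/ColPali-style late interaction: for each query token,
    take its best-matching page patch, then sum over query tokens."""
    sim = query_emb @ page_emb.T          # (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())

def retrieve_top_k(query_emb, page_embs, k=4):
    """Stage 2: score every page and keep the K most relevant."""
    scores = [maxsim_score(query_emb, p) for p in page_embs]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

def answer(question, page_images, embed_page, embed_query, generate_answer, k=4):
    page_embs = [embed_page(img) for img in page_images]              # Stage 1
    top = retrieve_top_k(embed_query(question), page_embs, k)         # Stage 2
    return generate_answer(question, [page_images[i] for i in top])   # Stage 3
```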
@jmin__cho
Jaemin Cho
11 months
In MP-DocVQA (closed-domain DocVQA with up to 20 pages), while the text RAG pipeline falls short of existing approaches, all multimodal pipelines outperform their text-based counterparts. Notably, our M3DocRAG delivers state-of-the-art results.
@jmin__cho
Jaemin Cho
11 months
In M3DocVQA (open-domain DocVQA over 3K documents), our M3DocRAG (ColPali + Qwen2-VL 7B) significantly outperforms text RAG (ColBERT v2 + Llama 3.1 8B) across all evidence modalities, question hops, and page counts. The gap is larger when the evidence involves images.
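At this scale, exhaustively MaxSim-scoring every page of 3K documents is expensive, which is why the first post in this thread mentions approximate indexing for the open-domain setting. Below is a minimal sketch of one standard way to do that, reusing `page_embs`, `query_emb`, and `maxsim_score` from the sketch above: coarse ANN search with a FAISS IVF index over mean-pooled page vectors, followed by exact re-ranking of the shortlist. The pooling scheme and the IVFFlat index are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# Hedged sketch of approximate indexing for the open-domain setting:
# coarse ANN search over pooled page vectors with a FAISS IVF index,
# then exact MaxSim re-ranking of the shortlist. Mean-pooling the
# multi-vector embeddings and the index type are assumptions.
import faiss
import numpy as np

d, nlist = 128, 256                          # embedding dim, number of IVF cells
page_vecs = np.vstack([p.mean(axis=0) for p in page_embs]).astype("float32")

quantizer = faiss.IndexFlatIP(d)             # inner-product coarse quantizer
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(page_vecs)                       # learn IVF cells (needs >= nlist vectors)
index.add(page_vecs)                         # one pooled vector per page

index.nprobe = 8                             # IVF cells to visit per query
query_vec = query_emb.mean(axis=0, keepdims=True).astype("float32")
_, cand = index.search(query_vec, 100)       # coarse top-100 candidate pages

# Exact late-interaction re-ranking of the shortlist.
top_k = sorted((i for i in cand[0] if i >= 0),
               key=lambda i: maxsim_score(query_emb, page_embs[i]),
               reverse=True)[:4]
```

Re-ranking the ANN shortlist with exact MaxSim keeps most of the accuracy of exhaustive search while scoring only ~100 pages per query.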
@jmin__cho
Jaemin Cho
11 months
In MMLongBench-Doc (closed-domain DocVQA with up to 120 pages), our M3DocRAG (ColPali + Qwen2-VL 7B) achieves the best scores in most settings, demonstrating that retrieving relevant pages is more effective than handling many pages by concatenating low-resolution images.
@Marktechpost
Marktechpost AI Dev News ⚡
11 months
Researchers from Bloomberg and UNC Chapel Hill Introduce M3DocRAG: A Novel Multi-Modal RAG Framework that Flexibly Accommodates Various Document Contexts. Researchers from UNC Chapel Hill and Bloomberg have introduced M3DocRAG, a groundbreaking framework designed to enhance AI's…
@kalyan_kpl
Kalyan KS
11 months
M3DocRAG: Multi-modal RAG System. This paper presents M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts (closed-domain and open-domain), question hops (single-hop and multi-hop), and evidence modalities (text, chart, figure, etc.).
@jmin__cho
Jaemin Cho
11 months
Lastly, we qualitatively show that M3DocRAG can successfully handle various scenarios, such as when relevant information exists across multiple pages and when answer evidence only exists in images.
@ionmosnoi
Moșnoi Ion
11 months
M3DocRAG is a novel framework that enhances retrieval-augmented generation (RAG) by integrating multi-modal elements, including text, charts, and figures. This approach not only accommodates various document contexts but also improves performance in answering complex questions.
@ai_bites
AI Bites | YouTube Channel
11 months
M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts (closed-domain and open-domain), question hops (single-hop and multi-hop), and evidence modalities (text, chart, figure, etc.). M3DocRAG finds relevant documents and answers questions using a multi-modal retriever and a multimodal LM.
@MichelIvan92347
Agent B
11 months
A very interesting paper: M3DocRAG = ColPali on multi-page & multi-document inputs + page retrieval. Bravo to the team for the hard work 👏 From an IR perspective, these scenarios are really interesting imo.
@ledgeai
Ledge.ai | AIトレンドの鉱脈
11 months
M3DOCRAG: a multimodal-AI RAG framework that understands information across multiple documents and pages, from a research team at Bloomberg and the University of North Carolina.
@ai_hakase_
ハカセ アイ(Ai-Hakase)🐾最新トレンドAIのためのX 🐾
11 months
🎈 M3DocRAG: a multimodal RAG system! ✎. https://t.co/HOJIY1dkZi M3DocRAG is said to be a question-answering system that can handle documents in a variety of formats, from single-page to multi-page documents! It can reportedly handle a variety of elements, including text, tables, and images!
@arxivsanitybot
ml-sanity bot
11 months
https://t.co/7gnmTdsBcF Authors unveil M3DocRAG, an innovative multi-modal framework for answering questions across diverse documents, handling multiple pages and visual data. It outperforms baselines on benchmarks like M3DocVQA, showcasing its impressive capabilities.
@TheTuringPost
TuringPost
11 months
The freshest AI/ML research of the week, part 2
▪️ WEBRL: Training LLM Web Agents
▪️ DynaSaur: Large Language Agents
▪️ THANOS: Skill-Of-Mind-Infused Agents
▪️ DELIFT
▪️ HtmlRAG
▪️ M3DOCRAG
▪️ Needle Threading
▪️ Survey Of Cultural Awareness In LMs
▪️ OPENCODER
▪️ Polynomial
@kuldeep_s_s
Kuldeep Singh Sidhu
11 months
Exciting breakthrough in Document AI! Researchers from UNC Chapel Hill and @business have developed M3DocRAG, a revolutionary framework for multi-modal document understanding. The innovation lies in its ability to handle complex document scenarios that traditional systems struggle with.
@TheTuringPost
TuringPost
11 months
6. M3DOCRAG: Multi-Modal Retrieval For Document Understanding
Introduces a multimodal RAG framework to handle multi-page and multi-document QA tasks with visual data. https://t.co/Fuf2Xu4sRN
@mbuffa02
The_Science_G
11 months
Researchers from Bloomberg and UNC Chapel Hill Introduce M3DocRAG: A Novel Multi-Modal RAG Framework that Flexibly Accommodates Various Document Contexts
@vmnarendran
Mohan Narendran
11 months
Great that Bloomberg is involved in this https://t.co/qwZLHnUoyO
@Linus_MK
ライナス
11 months
Posted to Hatena Blog: [Paper notes] M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding - 子供の落書き帳 Renaissance https://t.co/RgkVedei3M #はてなブログ