Explore tweets tagged as #M3DocRAG
Check out M3DocRAG -- multimodal RAG for question answering on Multi-Modal & Multi-Page & Multi-Documents (+ a new open-domain benchmark + strong results on 3 benchmarks)! ⚡️Key Highlights: ➡️ M3DocRAG flexibly accommodates various settings: - closed & open-domain document
M3DocRAG consists of 3 stages: 1) Extract visual embeddings (e.g., w/ ColPali) from each page image. 2) Retrieve top-K pages (+ approximate indexing for faster search in the open-domain setting). 3) Generate an answer with a multimodal LM (e.g., Qwen2-VL) given the retrieved K pages.
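The two retrieval stages above can be sketched in a few lines. Below is a minimal NumPy illustration of ColPali-style late-interaction (MaxSim) scoring and exact top-K page selection; the random arrays stand in for real ColPali page/query embeddings, and the function names are my own, not from the paper's code. The open-domain setting would replace the exact scan with an approximate index (e.g., FAISS).

```python
import numpy as np

def maxsim_score(query_emb, page_emb):
    # Late interaction: for each query token, take the similarity to its
    # best-matching page patch, then sum over query tokens.
    sims = query_emb @ page_emb.T          # (n_query_tokens, n_page_patches)
    return sims.max(axis=1).sum()

def retrieve_top_k(query_emb, page_embs, k=2):
    # Stage 2: score every page and keep the k highest-scoring ones.
    scores = [maxsim_score(query_emb, p) for p in page_embs]
    order = np.argsort(scores)[::-1][:k]
    return order.tolist(), [scores[i] for i in order]

# Toy demo with random "embeddings" standing in for ColPali outputs.
rng = np.random.default_rng(0)
query = rng.standard_normal((8, 128))                       # 8 query tokens
pages = [rng.standard_normal((32, 128)) for _ in range(5)]  # 5 page images
top_pages, top_scores = retrieve_top_k(query, pages, k=2)

# Stage 3 (not shown): feed the top-K page images plus the question to a
# multimodal LM such as Qwen2-VL to generate the answer.
```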
In MP-DocVQA (closed-domain DocVQA with up to 20 pages), while the text RAG pipeline falls short of existing approaches, all multimodal pipelines outperform their text-based counterparts. Notably, our M3DocRAG delivers state-of-the-art results.
In M3DocVQA (open-domain DocVQA on 3K docs), our M3DocRAG (ColPali + Qwen2-VL 7B) significantly outperforms text RAG (ColBERT v2 + Llama 3.1 8B), across all different evidence modalities / question hops / # pages. The gap is bigger when the evidence involves images.
In MMLongBench-Doc (closed-domain DocVQA with up to 120 pages), our M3DocRAG (ColPali + Qwen2-VL 7B) achieves the best scores in most settings. This demonstrates the effectiveness of multimodal retrieval over handling many pages by concatenating low-resolution images.
Researchers from Bloomberg and UNC Chapel Hill Introduce M3DocRAG: A Novel Multi-Modal RAG Framework that Flexibly Accommodates Various Document Context Researchers from UNC Chapel Hill and Bloomberg have introduced M3DocRAG, a groundbreaking framework designed to enhance AI’s
M3DocRAG: Multi-modal RAG System. This paper presents M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts (closed-domain and open-domain), question hops (single-hop and multi-hop), and evidence modalities (text, chart, figure, etc.).
Lastly, we qualitatively show that M3DocRAG can successfully handle various scenarios, such as when relevant information exists across multiple pages and when answer evidence only exists in images.
M3DocRAG, a novel framework that enhances retrieval-augmented generation (RAG) by integrating multi-modal elements, including text, charts, and figures. This approach not only accommodates various document contexts but also improves performance in answering complex questions
M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts (closed-domain and open-domain), question hops (single-hop and multi-hop), and evidence modalities (text, chart, figure, etc.). M3DocRAG finds relevant documents and answers questions
A very interesting paper: M3DocRAG = ColPali on multi-page & multi-document settings + page retrieval. Bravo to the team for the hard work 👏 From an IR perspective, these scenarios are really interesting imo =
"M3DOCRAG": a multimodal-AI-ready RAG framework that understands across multiple documents and pages, from a research team at Bloomberg and the University of North Carolina
🎈 M3DocRAG: a multimodal RAG system! ✎. https://t.co/HOJIY1dkZi M3DocRAG is reportedly a question-answering system that can handle documents in a variety of formats, from single-page to multi-page! It can apparently handle various elements such as text, tables, and images!
https://t.co/7gnmTdsBcF Authors unveil M3DocRAG, an innovative multi-modal framework for answering questions across diverse documents, handling multiple pages and visual data. It outperforms baselines on benchmarks like M3DocVQA, showcasing its impressive capabilities.
The freshest AI/ML researches of the week, part 2 ▪️ WEBRL: Training LLM Web Agents ▪️ DynaSaur: Large Language Agents ▪️ THANOS: Skill-Of-Mind-Infused Agents ▪️ DELIFT ▪️ HtmlRAG ▪️ M3DOCRAG ▪️ Needle Threading ▪️ Survey Of Cultural Awareness In LMs ▪️ OPENCODER ▪️ Polynomial
Exciting breakthrough in Document AI! Researchers from UNC Chapel Hill and @business have developed M3DocRAG, a revolutionary framework for multi-modal document understanding. The innovation lies in its ability to handle complex document scenarios that traditional systems
6. M3DOCRAG: Multi-Modal Retrieval For Document Understanding Introduces a multimodal RAG framework to handle multi-page and document QA tasks with visual data. https://t.co/Fuf2Xu4sRN
Researchers from Bloomberg and UNC Chapel Hill Introduce M3DocRAG: A Novel Multi-Modal RAG Framework that Flexibly Accommodates Various Document Context
Posted on Hatena Blog: [Paper notes] M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding - 子供の落書き帳 Renaissance https://t.co/RgkVedei3M #はてなブログ