Mixedbread

@mixedbreadai

Followers
831
Following
187
Media
9
Statuses
49

Your fav. AI bakers! We're hiring!

San Francisco, CA
Joined March 2024
@mixedbreadai
Mixedbread
2 months
Is OCR quality the silent killer of your RAG performance? 🧵👇 Our deep dive into 8500+ enterprise documents reveals how OCR errors create a "hidden ceiling" for RAG, and how multimodal approaches offer a path forward.
2
28
125
@mixedbreadai
Mixedbread
2 months
The takeaway: OCR caps your RAG. Multimodal retrieval offers a path to higher accuracy by overcoming text extraction challenges. Combining multimodal retrieval with quality OCR text is the best way forward for now.
0
0
12
@mixedbreadai
Mixedbread
2 months
What does this mean in real-world settings? Find out more here:
1
1
6
@mixedbreadai
Mixedbread
2 months
5/🧵 Finding 4: Vision-only generation is promising but not ready for multi-doc RAG. Feeding page images directly to the LLM for generation yielded the lowest accuracy (worse than high-quality OCR text). Multimodal excels at finding docs visually today; generating from multiple
1
1
7
@mixedbreadai
Mixedbread
2 months
4/🧵 Finding 3: Better retrieval directly means better answers. Using standard OCR text for generation resulted in ~26% fewer correct answers than with perfect text. But using multimodal retrieval (while still feeding OCR text to the LLM) recovered 70% of that lost accuracy.
1
0
4
@mixedbreadai
Mixedbread
2 months
3/🧵 Finding 2: Multimodal retrieval breaks the OCR ceiling. By "seeing" page images, multimodal retrieval outperformed all text methods, including perfect ground-truth text. It achieved an average NDCG@5 nearly 12% higher than perfect OCR.
2
0
5
@mixedbreadai
Mixedbread
2 months
2/🧵 Finding 1: OCR creates a real performance ceiling for text-based RAG retrieval. Even the best OCR solutions fall ~4.5% short of ground-truth text (NDCG@5) on complex enterprise documents. It's a bottleneck you can't ignore. (Interesting side note: BM25 outperformed
3
3
15
@mixedbreadai
Mixedbread
2 months
Full blog post:
1
6
17
@mixedbreadai
Mixedbread
3 months
RT @juliuslipp: new bottles out now!!! come get yours today!
0
2
0
@mixedbreadai
Mixedbread
4 months
RT @juliuslipp: together with @aaxsh18, @xmlee97 and @ rui. the only thing missing is you 🫰
0
3
0
@mixedbreadai
Mixedbread
4 months
Read more in our blog post: And check models on @huggingface:
1
0
14
@mixedbreadai
Mixedbread
4 months
Great Performance vs. Latency Trade-Off
1
1
6
@mixedbreadai
Mixedbread
4 months
Multistep Training for Better Understanding
1
0
5
@mixedbreadai
Mixedbread
4 months
ToolRet Benchmark. mxbai-rerank-v2 also achieves SOTA on the large-scale ToolRet test, showing advanced code & tool usage understanding. Ideal for MCP or any scenario that needs the best-fitting tools.
1
0
4
@mixedbreadai
Mixedbread
4 months
SOTA on BEIR Benchmark. Despite being multilingual, mxbai-rerank-v2 still outperforms both open & closed-source English models—showcasing strong language understanding and domain generalisability.
1
0
5
@mixedbreadai
Mixedbread
4 months
What’s New?
• 100+ languages, extended context (8–32K).
• GRPO: RL for better understanding.
• Up to 8× faster.
• Use cases: MCP, Code, SQL, JSON, Documents.
• Flexible & Open: Self-host or use our API.
1
0
8
@mixedbreadai
Mixedbread
4 months
Baked-in Brilliance: Reranking Meets RL 🍞. Meet mxbai-rerank-v2, our second-gen rerankers built on Qwen2.5 (thanks, @Alibaba_Qwen) & refined with GRPO from @deepseek_ai. They outperform open & closed-source models while staying fully open. 👇
10
33
216
@mixedbreadai
Mixedbread
7 months
RT @juliuslipp: 5% of all HF downloads go to mxbai-embed :o
0
9
0
@mixedbreadai
Mixedbread
8 months
RT @juliuslipp: just revamped the hiring page of Mixedbread a bit. we're looking for amazing people to join us doing:.- full stack (next.j….
0
7
0
@mixedbreadai
Mixedbread
10 months
RT @notrab: I posted a new tutorial on how to process embeddings with @mixedbreadai using @redpandadata and @tursodatabase ✨. Link in reply….
0
8
0