Jina AI (@JinaAI_)
Your Search Foundation, Supercharged!
Sunnyvale, CA · Joined March 2020
Followers: 16K · Following: 2K · Media: 746 · Statuses: 2K

Jina AI (@JinaAI_) · 10 days ago
Interested in presenting? Submit your work here: https://t.co/SkC0lnySgb
Date: Friday, November 7, 2025
Time: 10:30 AM - 12:00 PM (China Time)
Location: A209
The session is open to everyone—just walk in! If you're interested in contributing (like giving a talk), we'd love to …
Link card (docs.google.com): "What: A BoF (Birds of a Feather) in-person session on Embeddings, Rerankers, and Search LMs for Better Search, co-organized by the EMNLP PC and Jina AI. If you've never attended a BoF before, think of it like a …"
Replies: 0 · Reposts: 0 · Likes: 0

Jina AI (@JinaAI_) · 10 days ago
And time flies—this is already our 3rd EMNLP BoF on retrieval models, following Singapore 2023 and Miami 2024! If you've never attended a BoF before, think of it as a mini-workshop where everyone can jump in and share their work. Our BoF is an in-person session that brings together …
Replies: 2 · Reposts: 0 · Likes: 1

Jina AI (@JinaAI_) · 10 days ago
In 2 weeks, we're presenting at #EMNLP2025 and hosting a BoF on Embeddings, Rerankers, and Small LMs for Better Search, again! Come check out our research on training data for multi-hop reasoning, multimodal embeddings, and where retrieval models are headed in 2025/26. Say hi to our …
Replies: 1 · Reposts: 4 · Likes: 13

Elastic (@elastic) · 21 days ago
We’re excited to announce that we have joined forces with @JinaAI_, a leader in frontier models for multimodal and multilingual search. This acquisition deepens Elastic’s capabilities in retrieval, embeddings, and context engineering to power agentic AI: https://t.co/OAHtJumYuS
Replies: 14 · Reposts: 25 · Likes: 123

Jina AI (@JinaAI_) · 24 days ago
Heard you like GGUFs and MLX. Our newly released listwise reranker, jina-reranker-v3, is now available in dynamic quantized GGUFs and MLX. Check out our 🤗 collection for the weights and arXiv report: https://t.co/hhz2B9Snu9
Replies: 1 · Reposts: 20 · Likes: 128
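
For anyone who wants to try those weights locally, here is a minimal sketch of pulling the GGUF files from the Hugging Face Hub. The repository id and file pattern are assumptions for illustration; the real names live in the 🤗 collection linked above.

```python
# Sketch: download the jina-reranker-v3 GGUF weights from the Hugging Face Hub.
# The repo id and file pattern below are illustrative assumptions; check the
# collection linked in the tweet for the actual repository names.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="jinaai/jina-reranker-v3-GGUF",  # hypothetical repo id
    allow_patterns=["*.gguf"],               # fetch only the GGUF files
)
print("GGUF weights downloaded to:", local_dir)
```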

Jina AI (@JinaAI_) · 27 days ago
In jina-reranker-v3, the query appears twice in the input prompt - once at the beginning for task instructions and once at the end for final attention processing. This dual placement enables the final query position to attend to all preceding documents through causal attention.
Replies: 1 · Reposts: 4 · Likes: 36
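
To make the dual query placement concrete, here is a hedged sketch of how such a listwise prompt could be assembled. The template, separators, and the exact use of <|doc_emb|> are illustrative assumptions, not jina-reranker-v3's actual prompt format.

```python
# Sketch of a listwise reranking prompt with the query placed twice: once up
# front as the task instruction, and once at the end so the final query
# position can attend to every preceding document via causal attention.
# The template is illustrative only, not the model's real format.

def build_listwise_prompt(query: str, documents: list[str]) -> str:
    parts = [f"Instruction: rank the documents by relevance to the query.\nQuery: {query}\n"]
    for i, doc in enumerate(documents):
        # <|doc_emb|> marks where a per-document representation would be read out.
        parts.append(f"Document {i}: {doc} <|doc_emb|>")
    parts.append(f"Query: {query}")  # second occurrence, placed last
    return "\n".join(parts)

print(build_listwise_prompt(
    "what is late interaction in retrieval?",
    ["ColBERT stores token-level embeddings ...", "BM25 is a lexical scoring function ..."],
))
```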

Jina AI (@JinaAI_) · 27 days ago
Last but not late: jina-reranker-v3 is here! A new 0.6B-parameter listwise reranker that puts the query and all candidate documents in one context window and achieves SOTA on BEIR. We call this new query-document interaction "last but not late" - it's "last" because <|doc_emb|> is placed as …
Replies: 2 · Reposts: 17 · Likes: 155

Jina AI (@JinaAI_) · 2 months ago
Since most of the llama.cpp community focuses on LLMs and text generation, we wanted to share what we've learned about using llama.cpp for multimodal embeddings in practice. Many of the issues we encountered are tricky for end-users to debug on their own.
Replies: 0 · Reposts: 1 · Likes: 15
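
As a starting point for experimenting with GGUF embeddings locally, here is a hedged sketch using the llama-cpp-python bindings. The model path is a placeholder, and it only shows the plain text-embedding path; the multimodal pipeline discussed in this thread requires llama.cpp's multimodal tooling and is not covered here.

```python
# Sketch: text embeddings from a GGUF model via llama-cpp-python.
# Placeholder model path; the image/multimodal path is not shown.
from llama_cpp import Llama
import numpy as np

llm = Llama(model_path="models/embedding-model.gguf", embedding=True)

def embed(text: str) -> np.ndarray:
    v = np.asarray(llm.embed(text), dtype=np.float32)
    # Some models return per-token vectors; mean-pool those into one vector.
    return v.mean(axis=0) if v.ndim == 2 else v

a = embed("jina ai builds search foundation models")
b = embed("multimodal retrieval with embeddings")
print("cosine:", float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```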

Jina AI (@JinaAI_) · 2 months ago
After these fixes, we tested the llama.cpp model against our PyTorch reference V4 model on ViDoRe tasks using the MTEB benchmark. As you can see in the results table, the GGUF version and its quantized variants perform nearly identically to the reference model on …
Replies: 1 · Reposts: 1 · Likes: 8
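
Independent of any benchmark harness, a quick way to sanity-check a GGUF export against a PyTorch reference is to compare the two backends' embeddings on the same inputs. The sketch below assumes you already have two embed(text) callables returning vectors; it is not the MTEB/ViDoRe setup itself.

```python
# Sketch: agreement check between two embedding backends (e.g. GGUF vs. the
# PyTorch reference). `embed_gguf` and `embed_ref` are assumed callables that
# each map a text to a 1-D vector; plug in your own model wrappers.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mean_agreement(embed_gguf, embed_ref, texts: list[str]) -> float:
    sims = [cosine(np.asarray(embed_gguf(t)), np.asarray(embed_ref(t))) for t in texts]
    return float(np.mean(sims))

# Dummy backends so the sketch runs standalone:
rng = np.random.default_rng(0)
dummy = lambda t: rng.standard_normal(128)
print("mean cosine agreement:", mean_agreement(dummy, dummy, ["a", "b", "c"]))
```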

Jina AI (@JinaAI_) · 2 months ago
V4 is a multimodal embedding model, but V4-GGUF wasn't—until now. We've finally cracked how to generate multimodal embeddings using llama.cpp & GGUF. We fixed two main issues. First, in the language model part, we corrected the attention mask in the transformer block so it properly …
Replies: 7 · Reposts: 32 · Likes: 172

Jina AI (@JinaAI_) · 2 months ago
Replies: 0 · Reposts: 1 · Likes: 19

Jina AI (@JinaAI_) · 2 months ago
Traditional code embedding models face a fundamental bottleneck: there simply aren't enough high-quality comment-code pairs for supervised training. By starting with Qwen2.5-Coder pre-trained on 5.5 trillion tokens spanning 92+ programming languages, we inherit deep semantic …
Replies: 4 · Reposts: 0 · Likes: 19

Jina AI (@JinaAI_) · 2 months ago
Today we're releasing jina-code-embeddings, a new suite of code embedding models in two sizes—0.5B and 1.5B parameters—along with 1~4 bit GGUF quantizations for both. Built on the latest code generation LLMs, these models achieve SOTA retrieval performance despite their compact size.
Replies: 9 · Reposts: 51 · Likes: 313
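
For a feel of how such code embedding models are typically used for comment-to-code retrieval, here is a hedged sketch with sentence-transformers. The model id, the trust_remote_code flag, and the lack of task-specific prompts are assumptions for illustration; the model card has the actual usage.

```python
# Sketch: comment-to-code retrieval with a code embedding model.
# The model id below is a hypothetical placeholder; check the released
# collection for the real repository name and recommended settings.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("jinaai/jina-code-embeddings-0.5b",  # hypothetical id
                            trust_remote_code=True)

query = "read a JSON file and return it as a dict"
snippets = [
    "def load_json(path):\n    import json\n    with open(path) as f:\n        return json.load(f)",
    "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)",
]

q = model.encode([query], normalize_embeddings=True)
d = model.encode(snippets, normalize_embeddings=True)
scores = (q @ d.T)[0]                    # cosine similarities (vectors are normalized)
print(snippets[int(np.argmax(scores))])  # expected: the JSON loader
```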

Jina AI (@JinaAI_) · 2 months ago
Got a Mac with an M-chip? You can now train Gemma3 270m locally as a multilingual embedding or reranker model using our mlx-retrieval project. It lets you train Gemma3 270m locally at 4000 tokens/s on an M3 Ultra - that's actually usable speed. We've implemented some standard …
Replies: 7 · Reposts: 65 · Likes: 423
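
The tweet is cut off after "some standard", but training an embedding or reranker model of this kind usually revolves around a contrastive objective. As a hedged illustration of that idea (not mlx-retrieval's actual code), here is a minimal in-batch InfoNCE loss in PyTorch.

```python
# Sketch: in-batch InfoNCE contrastive loss, a standard objective for training
# text embedding models. Illustrative only; not mlx-retrieval's implementation.
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor, temperature: float = 0.05):
    # query_emb, doc_emb: (batch, dim); row i of doc_emb is the positive for row i.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                  # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)          # other rows act as in-batch negatives

# Tiny usage example with random tensors:
print(float(info_nce(torch.randn(8, 256), torch.randn(8, 256))))
```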

Jina AI (@JinaAI_) · 3 months ago
So 4000 tokens/sec for a 3B-parameter embedding model on L4 GPU is probably as fast as you'll get with llama.cpp. Or is it? Learn more from https://t.co/RFR8hVBwyF about our findings and fixes.
Link card (jina.ai): "4000 tokens/sec for a 3B-parameter embedding model on L4 GPU is probably as fast as you'll get with llama.cpp. Or is it?"
Replies: 2 · Reposts: 2 · Likes: 13

Jina AI (@JinaAI_) · 3 months ago
We also fix the quantization type to IQ3_S and examine how physical batch size (-ub) and context size (-c) affect speed and VRAM. The results on L4 GPU show that -ub=512 with -c=2048 provides the optimal configuration, delivering 4,143 tokens/sec while using 2,025 MB VRAM. The …
Replies: 1 · Reposts: 0 · Likes: 7
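
To try a sweep like this yourself, here is a hedged sketch that times embedding throughput while varying the physical batch size and context length through the llama-cpp-python bindings (n_ubatch and n_ctx roughly map to llama.cpp's -ub and -c). The model path is a placeholder and the parameter names may differ across binding versions.

```python
# Sketch: rough tokens/sec measurement for different ubatch/context settings.
# Placeholder model path; n_ubatch/n_ctx mirror llama.cpp's -ub/-c flags, but
# availability of these kwargs depends on your llama-cpp-python version.
import time
from llama_cpp import Llama

texts = ["some benchmark sentence " * 32] * 64   # synthetic workload

for n_ubatch, n_ctx in [(256, 2048), (512, 2048), (512, 4096)]:
    llm = Llama(model_path="models/jina-embeddings-v4.gguf",  # placeholder path
                embedding=True, n_ctx=n_ctx, n_batch=n_ctx,
                n_ubatch=n_ubatch, verbose=False)
    n_tokens = sum(len(llm.tokenize(t.encode())) for t in texts)
    start = time.perf_counter()
    for t in texts:
        llm.embed(t)
    elapsed = time.perf_counter() - start
    print(f"-ub={n_ubatch} -c={n_ctx}: {n_tokens / elapsed:,.0f} tokens/sec")
```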

Jina AI (@JinaAI_) · 3 months ago
We want to understand the following questions about those GGUFs:
- How good is our quantization compared to the original v4 Float16? At what point does it degrade so much that we'd be better off just using v3 embeddings?
- How fast can each quantization run on L4, and what's the …
Replies: 1 · Reposts: 1 · Likes: 10

Jina AI (@JinaAI_) · 3 months ago
This is both good & bad news: on one hand, we can leverage llama.cpp's efficient implementation (e.g. ubatch_size and KV cache) to serve embedding/reranker models; but the reality is, llama.cpp's embedding implementations were mostly developed for older encoder-only architectures and …
Replies: 1 · Reposts: 1 · Likes: 10