
Tony Wu
@tonywu_71
Followers
1K
Following
1K
Media
29
Statuses
304
Multimodal, RAG, Agents | ColPali co-first author | @centralesupelec 🇫🇷 x @Cambridge_Uni 🇬🇧 | Core Researcher at @hcompany_ai 🧑🏻‍💻
Paris, France
Joined February 2022
🚀 ColQwen2 just dropped in Transformers! 🤗 Say goodbye to brittle OCR pipelines: now you can retrieve documents directly in the visual space with just a few lines of code. Perfect for your visual RAG workflows. Smarter, simpler, faster. Let's dive in! 👇 (1/N 🧵)
9
94
578
RT @bclavie: single-vector limitations have been known for a while but they're increasingly apparent in the RAG era. @orionweller shows th….
0
13
0
Thank you for the book @huggingface, time for GPUs to go brrrr 😍 (been cooking hard with @hparams at @hcompany_ai, stay tuned!)
0
1
21
Similarity maps also work for the HF-native ColQwen2 model! 🤗 I have created a cookbook to quickly try this out:
github.com
Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 - tonywu71/colpali-cookbooks
I'm finally getting around to playing around more with ColQwen2. Below you can see the similarity map of a PDF page to the token "MA" from the query "How does DeepSeek-V2 compare against the LLaMA family of LLMs?". It highlights almost all occurrences of LLaMA in the chart just
0
5
20
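To make the similarity-map idea above concrete, here is a minimal sketch in plain Python. It assumes a page is represented as a grid of patch embeddings and that the map for one query token is simply its dot product with every patch, reshaped to the grid. The names `similarity_map`, `token`, and `patches` and the toy 2x2 grid are illustrative assumptions, not the cookbook's or the library's API.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def similarity_map(
    token_emb: Vector, patch_embs: List[Vector], n_rows: int, n_cols: int
) -> List[List[float]]:
    """Score one query-token embedding against every image-patch embedding
    and lay the scores out on the page's patch grid (row-major order)."""
    if len(patch_embs) != n_rows * n_cols:
        raise ValueError("number of patches must equal n_rows * n_cols")
    scores = [dot(token_emb, patch) for patch in patch_embs]
    return [scores[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Toy example (hypothetical values): one query token, a 2x2 grid of patches.
token = [1.0, 0.0]
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
heatmap = similarity_map(token, patches, n_rows=2, n_cols=2)
```

Patches with high scores are the regions that would light up for that token; in practice the embeddings would come from the model rather than being written by hand.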
RT @carrigmat: GPT OSS is out. It's OpenAI's first open-weights model release since GPT-2, and some of the technical innovations have huge….
0
37
0
RT @ManuelFaysse: Introducing ColQwen-Omni, a 3B omnimodal retriever that extends the ColPali concept of multimodal retrieval with late int….
0
103
0
RT @bclavie: New blog post & new library are out now! The BP is about MaxSim, why it's *orders of magnitude* much more demanding than norm….
0
28
0
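For readers unfamiliar with MaxSim, the late-interaction score used by ColPali-style retrievers, a minimal sketch follows. It assumes the usual definition: for each query-token embedding, take the maximum dot product over all document-token (or patch) embeddings, then sum over query tokens. The function name `maxsim_score` and the toy vectors are illustrative assumptions and are not taken from the blog post or library mentioned above.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_embs: List[Vector], doc_embs: List[Vector]) -> float:
    """Sum, over query tokens, of the best-matching document-token similarity.
    This needs len(query_embs) * len(doc_embs) dot products per document,
    versus a single dot product for a single-vector retriever."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


# Toy example (hypothetical values): two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[0.5, 0.5], [1.0, 0.0]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here `doc_a` ranks above `doc_b` because each query token finds an exact match in it. The nested loop over every query-token and document-token pair is where the extra cost relative to single-vector scoring comes from.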
RT @LoubnaBenAllal1: Introducing SmolLM3: a strong, smol reasoner! > SoTA 3B model > dual mode reasoning (think/no_think) > long context,….
0
207
0
RT @laurentsifre: 🚀 Very excited to open-source SurferH-CLI — the first fully local, fully open-source SOTA agent for browser automation.….
0
4
0
RT @Dorialexander: Vision and overall management. There will be no lack of compute grants (people are really over investing in data center….
0
3
0
RT @andimarafioti: Can AI visualize solutions? 🧠👁️ Humans sketch things out in their minds to solve problems. What if Vision-Language Model….
0
41
0
RT @ManuelFaysse: Amazing work on evals! Now, time for big techs to work on improving their VLMs for multi-page document understanding (ins….
0
2
0
RT @AymericRoucher: Today we release ScreenSuite, the most comprehensive evaluation suite for GUI agents (aka Computer Use agents). We pack….
0
13
0
RT @ManuelFaysse: Reducing ColPali / ColQwen index size is super valuable in many use cases, and I know many people who tried and couldn't b….
0
17
0
RT @Dorialexander: Announcing the release of the official Common Corpus paper: a 20 page report detailing how we collected, processed and p….
0
96
0
RT @Dorialexander: Been calling it for some time, though I'm very skeptical about the super long context part: small models have constraine….
0
7
0
Nice read from my friend Max about an elegant way to solve the lack of context in textual information retrieval!
🕺Super happy to release our latest work with @ManuelFaysse: in our paper "Context Is Gold to Find the Gold Passage", we share all our findings on how to train embedding models to meaningfully include doc-wide context into chunks - leading to convincing results! 🧑‍🍳 🧵1/N
0
0
2
RT @Dorialexander: Mid-training coming to web visual parsing: mix of crawl, synthetic web data and simulated web traces at scale. https://t….
0
1
0
🚀 Another open-source drop! Our team at @hcompany_ai is open-sourcing Holo-1 👀, our action-oriented VLM for web navigation, and dropping a new benchmark, WebClick 🌐, to push the field forward! More details in our technical report: 📄
2️⃣ Holo-1: We are Open-Sourcing our Visual-Language Model powering Surfer H. We’ve beefed up Surfer H’s web automation with Holo-1, our 3B- and 7B-parameter Action Models. It now achieves industry-leading UI localization and navigation accuracy while staying compact and
1
8
55