tonywu_71 Profile Banner
Tony Wu Profile
Tony Wu

@tonywu_71

Followers
1K
Following
1K
Media
29
Statuses
304

Multimodal, RAG, Agents | ColPali co-first author | @centralesupelec 🇫🇷 x @Cambridge_Uni 🇬🇧 | Core Researcher at @hcompany_ai 🧑🏻‍💻

Paris, France
Joined February 2022
Don't wanna be here? Send us removal request.
@tonywu_71
Tony Wu
3 months
🚀 ColQwen2 just dropped in Transformers! 🤗. Say goodbye to brittle OCR pipelines — now you can retrieve documents directly in the visual space with just a few lines of code. Perfect for your visual RAG workflows. Smarter, simpler, faster. Let's dive in! 👇 (1/N 🧵)
Tweet media one
9
94
578
@tonywu_71
Tony Wu
42 minutes
RT @bclavie: single-vector limitations have been known for a while but they're increasingly apparent in the RAG era. @orionweller shows th….
0
13
0
@grok
Grok
4 days
Join millions who have switched to Grok.
174
333
3K
@tonywu_71
Tony Wu
2 days
Thank you for the book @huggingface, time for GPUs to go brrrr 😍. (been cooking hard with @hparams at @hcompany_ai, stay tuned!)
Tweet media one
0
1
21
@tonywu_71
Tony Wu
13 days
Similarity maps also works for the Hf-native ColQwen2 model! 🤗 I have created a cookbook to quickly try this out:.
github.com
Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 - tonywu71/colpali-cookbooks
@helloiamleonie
Leonie
13 days
I'm finally getting around to playing around more with ColQwen2. Below you can see the similarity map of a PDF page to the token "MA" from the query "How does DeepSeek-V2 compare against the LLaMA family of LLMs?". It highlights almost all occurrences of LLaMA in the chart just
Tweet media one
0
5
20
@tonywu_71
Tony Wu
26 days
RT @carrigmat: GPT OSS is out. It's OpenAI's first open-weights model release since GPT-2, and some of the technical innovations have huge….
0
37
0
@tonywu_71
Tony Wu
1 month
RT @ManuelFaysse: Introducing ColQwen-Omni, a 3B omnimodal retriever that extends the ColPali concept of multimodal retrieval with late int….
0
103
0
@tonywu_71
Tony Wu
2 months
RT @bclavie: New blog post & new library are out now!. The BP is about MaxSim, why it's *orders of magnitude* much more demanding than norm….
0
28
0
@tonywu_71
Tony Wu
2 months
Sick trick, I didn't know hackers could leverage the tokenization of unicode surrogate pairs to hide content within copy-pastable strings!.
@simonw
Simon Willison
2 months
I got Claude to build me an artifact to help decode this sneaky prompt attack
Tweet media one
0
0
3
@tonywu_71
Tony Wu
2 months
So aesthetic I'm jealous.
@LoubnaBenAllal1
Loubna Ben Allal
2 months
Everything you need to know is in our engineering blueprint
Tweet media one
0
0
6
@tonywu_71
Tony Wu
2 months
RT @LoubnaBenAllal1: Introducing SmolLM3: a strong, smol reasoner!. > SoTA 3B model.> dual mode reasoning (think/no_think).> long context,….
0
207
0
@tonywu_71
Tony Wu
2 months
RT @laurentsifre: 🚀 Very excited to open-source SurferH-CLI — the first fully local, fully open-source SOTA agent for browser automation.….
0
4
0
@tonywu_71
Tony Wu
2 months
RT @Dorialexander: Vision and overall management. There will be no lack of compute grants (people are really over investing in data center….
0
3
0
@tonywu_71
Tony Wu
2 months
RT @andimarafioti: Can AI visualize solutions? 🧠👁️.Humans sketch things out in their minds to solve problems. What if Vision-Language Model….
0
41
0
@tonywu_71
Tony Wu
3 months
RT @ManuelFaysse: Amazing work on evals! Now, time for big techs to work on improving their VLMs for multi-page document understanding (ins….
0
2
0
@tonywu_71
Tony Wu
3 months
RT @AymericRoucher: Today we release ScreenSuite, the most comprehensive evaluation suite for GUI agents (aka Computer Use agents). We pack….
0
13
0
@tonywu_71
Tony Wu
3 months
RT @ManuelFaysse: Reducing ColPali / ColQwen index size is super valuable in many use case, and I know many people who tried and couldn't b….
0
17
0
@tonywu_71
Tony Wu
3 months
RT @Dorialexander: Announcing the release of the official Common Corpus paper: a 20 page report detailing how we collected, processed and p….
0
96
0
@tonywu_71
Tony Wu
3 months
RT @Dorialexander: Been calling it for some time, though I'm very skeptical about the super long context part: small models have constraine….
0
7
0
@tonywu_71
Tony Wu
3 months
Nice read from my friend Max about an elegant way to solve the lack of context in textual information retrieval!.
@mlpc123
Max Conti
3 months
🕺Super happy to release our latest work with @ManuelFaysse: in our paper "Context Is Gold to Find the Gold Passage", we share all our findings on how to train embedding models to meaningfully include doc-wide context into chunks - leading to convincing results! 🧑‍🍳 🧵1/N
Tweet media one
0
0
2
@tonywu_71
Tony Wu
3 months
RT @Dorialexander: Mid-training coming to web visual parsing: mix of crawl, synthetic web data and simulated web traces at scale. https://t….
0
1
0
@tonywu_71
Tony Wu
3 months
🚀 Another open-source drop! Our team at @hcompany_ai is open-sourcing Holo-1 👀, our action-oriented VLM for web navigation — and dropping a new benchmark, WebClick 🌐, to push the field forward!. More details in our technical report: 📄
@hcompany_ai
H
3 months
2️⃣ Holo-1: We are Open-Sourcing our Visual-Language Model powering Surfer H . We’ve beefed up Surfer H’s web automation with Holo-1, our 3B & 7B-parameters Action Models. It now achieves industry-leading UI localization and navigation accuracy while staying compact and
Tweet media one
1
8
55