Tony Wu @tonywu_71 X Profile

Tony Wu

@tonywu_71

Followers

1K

Following

1K

Media

29

Statuses

304

Multimodal, RAG, Agents | ColPali co-first author | @centralesupelec 🇫🇷 x @Cambridge_Uni 🇬🇧 | Core Researcher at @hcompany_ai 🧑🏻‍💻

Paris, France

Joined February 2022

Tony Wu

@tonywu_71

3 months

🚀 ColQwen2 just dropped in Transformers! 🤗. Say goodbye to brittle OCR pipelines — now you can retrieve documents directly in the visual space with just a few lines of code. Perfect for your visual RAG workflows. Smarter, simpler, faster. Let's dive in! 👇 (1/N 🧵)

9

94

578

Tony Wu

@tonywu_71

42 minutes

RT @bclavie: single-vector limitations have been known for a while but they're increasingly apparent in the RAG era. @orionweller shows th…

0

13

0

Tony Wu

@tonywu_71

2 days

Thank you for the book @huggingface, time for GPUs to go brrrr 😍. (been cooking hard with @hparams at @hcompany_ai, stay tuned!)

0

1

21

Tony Wu

@tonywu_71

13 days

Similarity maps also work for the HF-native ColQwen2 model! 🤗 I have created a cookbook to quickly try this out:

github.com

Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 - tonywu71/colpali-cookbooks

Leonie

@helloiamleonie

13 days

I'm finally getting around to playing around more with ColQwen2. Below you can see the similarity map of a PDF page to the token "MA" from the query "How does DeepSeek-V2 compare against the LLaMA family of LLMs?". It highlights almost all occurrences of LLaMA in the chart just

0

5

20

Tony Wu

@tonywu_71

26 days

RT @carrigmat: GPT OSS is out. It's OpenAI's first open-weights model release since GPT-2, and some of the technical innovations have huge…

0

37

0

Tony Wu

@tonywu_71

1 month

RT @ManuelFaysse: Introducing ColQwen-Omni, a 3B omnimodal retriever that extends the ColPali concept of multimodal retrieval with late int…

0

103

0

Tony Wu

@tonywu_71

2 months

RT @bclavie: New blog post & new library are out now! The BP is about MaxSim, why it's *orders of magnitude* much more demanding than norm…

0

28

0

Tony Wu

@tonywu_71

2 months

Sick trick, I didn't know hackers could leverage the tokenization of unicode surrogate pairs to hide content within copy-pastable strings!

Simon Willison

@simonw

2 months

I got Claude to build me an artifact to help decode this sneaky prompt attack

0

3

Tony Wu

@tonywu_71

2 months

So aesthetic I'm jealous.

Loubna Ben Allal

@LoubnaBenAllal1

2 months

Everything you need to know is in our engineering blueprint

0

6

Tony Wu

@tonywu_71

2 months

RT @LoubnaBenAllal1: Introducing SmolLM3: a strong, smol reasoner! > SoTA 3B model > dual mode reasoning (think/no_think) > long context,…

0

207

0

Tony Wu

@tonywu_71

2 months

RT @laurentsifre: 🚀 Very excited to open-source SurferH-CLI — the first fully local, fully open-source SOTA agent for browser automation.…

0

4

0

Tony Wu

@tonywu_71

2 months

RT @Dorialexander: Vision and overall management. There will be no lack of compute grants (people are really over investing in data center…

0

3

0

Tony Wu

@tonywu_71

2 months

RT @andimarafioti: Can AI visualize solutions? 🧠👁️ Humans sketch things out in their minds to solve problems. What if Vision-Language Model…

0

41

0

Tony Wu

@tonywu_71

3 months

RT @ManuelFaysse: Amazing work on evals! Now, time for big techs to work on improving their VLMs for multi-page document understanding (ins…

0

2

0

Tony Wu

@tonywu_71

3 months

RT @AymericRoucher: Today we release ScreenSuite, the most comprehensive evaluation suite for GUI agents (aka Computer Use agents). We pack…

0

13

0

Tony Wu

@tonywu_71

3 months

RT @ManuelFaysse: Reducing ColPali / ColQwen index size is super valuable in many use case, and I know many people who tried and couldn't b…

0

17

0

Tony Wu

@tonywu_71

3 months

RT @Dorialexander: Announcing the release of the official Common Corpus paper: a 20 page report detailing how we collected, processed and p…

0

96

0

Tony Wu

@tonywu_71

3 months

RT @Dorialexander: Been calling it for some time, though I'm very skeptical about the super long context part: small models have constraine…

0

7

0

Tony Wu

@tonywu_71

3 months

Nice read from my friend Max about an elegant way to solve the lack of context in textual information retrieval!

Max Conti

@mlpc123

3 months

🕺Super happy to release our latest work with @ManuelFaysse: in our paper "Context Is Gold to Find the Gold Passage", we share all our findings on how to train embedding models to meaningfully include doc-wide context into chunks - leading to convincing results! 🧑‍🍳 🧵1/N

0

2

Tony Wu

@tonywu_71

3 months

RT @Dorialexander: Mid-training coming to web visual parsing: mix of crawl, synthetic web data and simulated web traces at scale. https://t…

0

1

0

Tony Wu

@tonywu_71

3 months

🚀 Another open-source drop! Our team at @hcompany_ai is open-sourcing Holo-1 👀, our action-oriented VLM for web navigation, and dropping a new benchmark, WebClick 🌐, to push the field forward! More details in our technical report: 📄

H

@hcompany_ai

3 months

2️⃣ Holo-1: We are Open-Sourcing our Visual-Language Model powering Surfer H. We’ve beefed up Surfer H’s web automation with Holo-1, our 3B & 7B-parameters Action Models. It now achieves industry-leading UI localization and navigation accuracy while staying compact and

1

8

55