Thomas Thoresen @thomas_thoresen X Profile

Thomas Thoresen

@thomas_thoresen

Followers: 276

Following: 2K

Media: 51

Statuses: 319

Father, athlete, coder. Working on @vespaengine

Joined May 2022

abhishek

@abhi1thakur

2 days

Explaining BM25 in one post.
BM25: How relevant is this document to this query?
3 parts:
> IDF: how rare a query token is across all documents.
> TF: how frequently the query token occurs in the current document.
> Length normalization: how long the current doc is compared to the average document length.
Parameters:
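
A minimal Python sketch of the three parts above, using the classic Okapi BM25 formula over pre-tokenized documents. The function name `bm25_scores`, the Lucene-style `+1` inside the IDF logarithm, and the defaults `k1=1.2`, `b=0.75` (common choices, not taken from the truncated post) are assumptions. Roughly, `k1` controls how quickly repeated terms stop adding score, and `b` controls how strongly documents longer than average are penalized.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many docs does each term appear at least once?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within THIS document
        # Length normalization: > 1 for docs longer than average, < 1 for shorter ones.
        length_norm = 1 - b + b * len(doc) / avgdl
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # IDF: rare terms weigh more (the +1 keeps it positive, as in Lucene).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Saturating TF, scaled by the length normalization.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [["vespa", "search", "engine"], ["bm25", "ranking", "in", "vespa"], ["cooking", "recipes"]]
# The second doc (both query terms) scores highest; the third (no matches) scores 0.0.
print(bm25_scores(["bm25", "vespa"], docs))
```

Real engines tokenize and normalize text before scoring; here the documents are already lists of lowercase tokens to keep the formula in focus.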

Thomas Thoresen

@thomas_thoresen

6 days

Great move. I hate coding outside the IDE. Modal notebooks wen @bernhardsson? solveit wen @jeremyphoward? 🙏

Thomas Thoresen

@thomas_thoresen

6 days

Some valuable skills to be taught in this series

abhishek

@abhi1thakur

6 days

First video of the series is up! You’ll learn how to: • Create a Vespa application package • Enable BM25 scoring • Spin up Vespa inside Docker • Feed documents into Vespa from Hugging Face • Run BM25-ranked search queries

abhishek

@abhi1thakur

7 days

In the next few weeks, you will learn how to build production-grade search step by step: starting from basic BM25, moving to embeddings and hybrid search, and ending with a fully fledged RAG application that works on millions (and billions) of documents. Keep an eye on my YouTube channel!

Thomas Thoresen

@thomas_thoresen

14 days

Start small, increase incrementally, show up. Learn, adjust, repeat. Apply to anything.

Thomas Thoresen

@thomas_thoresen

16 days

tfw someone you've followed for almost a decade, whose book you bought, and who is a 4x Kaggle GM joins your company

abhishek

@abhi1thakur

16 days

Let's make search great again 🚀 I'm joining Vespa.ai, and my role is to help you build world-class search, RAG, and recommendation systems. In addition, I will be: 👨🏽‍💻 Taking part in open-source communities and contributions 👀 Creating tutorials to learn how to build search

Thomas Thoresen

@thomas_thoresen

20 days

@p0 @vespaengine

Thomas Thoresen

@thomas_thoresen

20 days

@p0 @vespaengine Many due to returning HTML instead of txt/md
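
The failure mode mentioned here (an llms.txt URL that serves an HTML page instead of plain text or Markdown) can be flagged with a small heuristic. A minimal sketch, assuming the response has already been fetched; `check_llms_txt` and its return labels are hypothetical. The title check follows the llms.txt proposal, which asks for a Markdown H1 with the project name as the one required section.

```python
def check_llms_txt(content_type: str, body: str) -> str:
    """Classify a fetched llms.txt response as 'html', 'missing-title', or 'ok'."""
    # Compare the media type only, ignoring parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    # Some servers send an HTML page with a text/plain header, so sniff the body too.
    head = body.lstrip().lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    # Assumption: a valid file opens with a Markdown H1 such as "# Project name".
    first_line = next((line for line in body.splitlines() if line.strip()), "")
    if not first_line.startswith("# "):
        return "missing-title"
    return "ok"


print(check_llms_txt("text/html; charset=utf-8", "<!DOCTYPE html><html></html>"))
print(check_llms_txt("text/markdown", "# Vespa\n\n> Docs"))
```

Checking both the header and the body is a deliberate choice: either one alone misses servers that mislabel their responses.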

Thomas Thoresen

@thomas_thoresen

20 days

Surprisingly many invalid ones

Thomas Thoresen

@thomas_thoresen

20 days

Check out https://t.co/BA8ENtFOZU by @p0: autogenerated MCP for repos with llms.txt (@vespaengine has one, of course)

Jeremy Howard

@jeremyphoward

20 days

I spoke with @clattner_llvm about AI and software craftsmanship. "The question is, when things settle out, where do you as a programmer stand? Have you lost years of your own development because you’ve been spending it the wrong way?” A must read: https://t.co/AClwM7H6OB

fast.ai

Chris Lattner on software craftsmanship and AI

Jeremy Howard

@jeremyphoward

22 days

Lotta folks making this bet, but most don't even realise they're making it. It will be interesting to see how many companies in 2 years' time have gone under as a result of tech debt bankruptcy.

Josh McGrath

@j_mcgraph

23 days

OpenAI research is so AGI-pilled we bet our whole codebase that we'll hit superhuman coding before tech debt bankruptcy

Thomas Thoresen

@thomas_thoresen

29 days

👀 🤗

Thomas Thoresen

@thomas_thoresen

1 month

This, coming from a 4x Kaggle GM 🤩

abhishek

@abhi1thakur

1 month

I've been looking for how search and RAG can be done at large scale on actual data, and there are just toy examples everywhere I look. Not just some PDFs or a website with everything in context, but actual search, retrieval, ranking, re-ranking, etc. Then I found this goldmine.

ThePrimeagen

@ThePrimeagen

1 month

It is currently easier to install Arch Linux than either OSX or Windows. With Hyprland, one Polish kid has made a better-looking desktop experience than multi-trillion-dollar Apple or Microsoft could. It really might be the year of the Linux desktop.

Jeff Huber

@jeffreyhuber

2 months

I'm so tired of the 30-page AI-slop brain rot around RAG. Here, I solved it for you: this simple graphic tells you everything you need to know.

Didier Lopes

@didier_lopes

2 months

I wrote 6 months ago that RAG might be dead. That was after an aha moment with Gemini’s 1M context window - running 200+ page docs through it and being impressed with the accuracy. But I didn’t have skin in the game. @nicbstme does. He just published an excellent piece: "The

Thomas Thoresen

@thomas_thoresen

2 months

Now, this got me 10x more excited than the latest LLM releases! A visual retriever that runs on CPU and outperforms ColPali 🤩

paul

@pteiletche

2 months

Introducing ModernVBERT: a vision-language encoder that matches the performance of models 10× its size on visual document retrieval tasks! 👁️ Read more in the thread👇 (1/N)

Jon Bratseth

@jonbratseth

2 months

Lots of hard problems in web search, but luckily at least the "super fancy db" you need for the index is available for everyone at https://t.co/QfFhnHgki7.

Aravind Srinivas

@AravSrinivas

2 months

Why it's hard to build a web index, objectively harder than building a GPT-4.1. Argument: there are just fewer people - literally two (G and M) - who have done it well.

Thomas Thoresen

@thomas_thoresen

2 months

Introducing layered ranking for RAG applications | Vespa Blog

blog.vespa.ai

Introducing layered ranking: The missing piece for context engineering at scale.

Thomas Thoresen

@thomas_thoresen

2 months

Much talk about context rot in my timeline. The solution: layered ranking and chunk selection.
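
One way to read "layered ranking and chunk selection": rank whole documents first, then pass only the best-scoring chunks of the top documents into the LLM context instead of every chunk. This is a minimal sketch under loud assumptions. The `Doc` type, `layered_select`, and the term-overlap scorer are hypothetical stand-ins (a real system would use BM25, embeddings, or a learned model at each layer), and this is not Vespa's implementation.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    chunks: list[str]


def overlap(query_terms: set[str], text: str) -> int:
    """Hypothetical chunk scorer: number of distinct query terms in the chunk."""
    return len(query_terms & set(text.lower().split()))


def layered_select(query: str, docs: list[Doc], top_docs: int = 2,
                   chunks_per_doc: int = 1) -> list[tuple[str, str]]:
    """Rank documents, then keep only the best chunks of each top document."""
    q = set(query.lower().split())
    # Layer 1: document ranking (assumption: doc score = sum of its chunk scores).
    ranked = sorted(docs, key=lambda d: sum(overlap(q, c) for c in d.chunks), reverse=True)
    context = []
    for doc in ranked[:top_docs]:
        # Layer 2: chunk selection inside each surviving document.
        best = sorted(doc.chunks, key=lambda c: overlap(q, c), reverse=True)[:chunks_per_doc]
        context.extend((doc.doc_id, chunk) for chunk in best)
    return context


docs = [
    Doc("a", ["vespa supports bm25 ranking", "unrelated intro text"]),
    Doc("b", ["cooking pasta at home", "boil water first"]),
    Doc("c", ["layered ranking selects chunks", "chunk selection reduces context rot"]),
]
for doc_id, chunk in layered_select("layered ranking chunk selection", docs):
    print(doc_id, chunk)
```

The point of the second layer is that context size grows with `top_docs * chunks_per_doc` rather than with the total number of chunks in the retrieved documents.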
