Thomas Thoresen
@thomas_thoresen
Followers
276
Following
2K
Media
51
Statuses
319
Father, athlete, coder. Working on @vespaengine
Joined May 2022
Explaining BM25 in one post. BM25: How relevant is this document to this query? 3 parts: > IDF: how rare a query token is across all documents. > TF: how frequent the query token is in the current document. > Length normalization: how long the current doc is compared to all docs. Parameters:
7
42
276
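The three parts named in the BM25 post above can be sketched in a few lines of Python. This is a minimal illustration, not Vespa's implementation: the function name `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for the example (the post's own parameter list is cut off), and the IDF uses the common non-negative "log(1 + ...)" variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every doc in `docs` against `query` with a basic BM25.

    k1 controls how quickly term frequency saturates; b controls how
    strongly document length is normalized. Both defaults are assumed
    here, not taken from the post.
    """
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many docs does each token appear?
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)  # TF: token counts in the current document
        # Length normalization: current doc length vs. average length.
        norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF: rarer tokens across all docs get a higher weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores
```

Calling `bm25_scores("search engine", ["the cat sat on the mat", "vespa is a search engine"])` returns one score per document; documents that contain none of the query tokens score zero, and of two documents with the same term counts the shorter one scores higher.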
great move - I hate coding outside the IDE modal notebooks wen @bernhardsson solveit wen @jeremyphoward 🙏
0
0
0
In the next few weeks, you will learn how to build production-grade search step by step: starting from basic BM25, then embeddings and hybrid search, up to a fully fledged RAG application that works on millions (and billions) of documents. Keep an eye on my YouTube channel!
8
10
106
Start small, increase incrementally, show up. Learn, adjust, repeat. Apply to anything.
0
0
2
tfw someone you've been following for almost a decade, whose book you bought, and who is a 4x Kaggle GM joins your company
Let's make search great again 🚀 I'm joining Vespa ai and my role is to help you build world-class search, RAG and recommendation systems. In addition, I will be: 👨🏽💻 Taking part in open-source communities and contributions 👀 Creating tutorials to learn how to build search
6
1
26
Check out https://t.co/BA8ENtFOZU by @p0 Autogenerated mcp for repos with llms.txt (@vespaengine has one of course)
3
1
7
I spoke with @clattner_llvm about AI and software craftsmanship. "The question is, when things settle out, where do you as a programmer stand? Have you lost years of your own development because you’ve been spending it the wrong way?” A must read: https://t.co/AClwM7H6OB
fast.ai
Chris Lattner on software craftsmanship and AI
16
80
668
Lotta folks making this bet -- but most don't even realise they're making it. Will be interesting to see how many companies in 2 years time have gone under as a result of tech debt bankruptcy.
OpenAI research is so AGI pilled we bet our whole codebase that we’ll hit superhuman coding before tech debt bankruptcy
11
12
140
This coming from a 4x Kaggle GM 🤩
I've been looking for how search and RAG can be done on large scale and actual data, and there's just toy examples everywhere I look. Not just some pdfs or a website with everything in context, but actual search, retrieval, ranking, re-ranking, etc. Then I found this goldmine.
4
0
7
it is currently easier to install Arch Linux than either OSX or Windows. Hyprland, one Polish kid, has made a better looking desktop experience than multi-trillion dollar Apple or Microsoft could. it really might be the year of the Linux desktop
354
421
8K
im so tired of the 30-page ai-slop brain-rot around RAG here - i solved it for you. this simple graphic tells you everything you need to know.
I wrote 6 months ago that RAG might be dead. That was after an aha moment with Gemini’s 1M context window - running 200+ page docs through it and being impressed with the accuracy. But I didn’t have skin in the game. @nicbstme does. He just published an excellent piece: "The
36
76
937
Now, this got me 10x more excited than the latest LLM releases! A visual retriever that runs on CPU, beating ColPali in performance 🤩
Introducing ModernVBERT: a vision-language encoder that matches the performance of models 10× its size on visual document retrieval tasks! 👁️ Read more in the thread👇 (1/N)
0
0
5
Lots of hard problems in web search, but luckily at least the "super fancy db" you need for the index is available for everyone at https://t.co/QfFhnHgki7.
Why it's hard to build a web index, objectively harder than building a GPT-4.1. Argument: there are just fewer people - literally two (G and M) - who have done it well.
0
2
10
Introducing layered ranking for RAG applications | Vespa Blog
blog.vespa.ai
Introducing layered ranking: The missing piece for context engineering at scale.
0
0
0
Much talk about context rot in timeline. The solution: layered ranking and chunk selection.
1
1
1