Thomas Thoresen

@thomas_thoresen

Followers: 276
Following: 2K
Media: 51
Statuses: 319

Father, athlete, coder. Working on @vespaengine

Joined May 2022
@abhi1thakur
abhishek
2 days
Explaining BM25 in one post. BM25: How relevant is this document to this query? 3 parts:
> IDF: how rare a query token is across all documents.
> TF: how frequent the query token is in the current document.
> Length normalization: how long the current doc is compared to all docs.
Parameters:
7
42
276
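A minimal sketch of the three parts named in the post above, using the common Okapi BM25 formulation. Term frequency is counted inside the document being scored, which is the usual definition. The function name `bm25_scores`, the toy documents, and the defaults `k1=1.2` and `b=0.75` are illustrative assumptions, not taken from the post, whose parameter list is cut off.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    k1 and b are assumed defaults, not values from the post.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for d in docs for tok in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        # Length normalization: this doc's length relative to the average.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for tok in query:
            if tok not in tf:
                continue
            # IDF: rarer tokens across the collection get a larger weight.
            idf = math.log(1 + (n_docs - df[tok] + 0.5) / (df[tok] + 0.5))
            score += idf * tf[tok] * (k1 + 1) / (tf[tok] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "vespa ranks documents with bm25".split(),
    "bm25 is a ranking function".split(),
    "the cat sat on the mat".split(),
]
scores = bm25_scores("bm25 ranking".split(), docs)
```

In this toy corpus the second document matches both query tokens and scores highest, while the third matches none and scores zero.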
@thomas_thoresen
Thomas Thoresen
6 days
great move - I hate coding outside the IDE. modal notebooks wen @bernhardsson? solveit wen @jeremyphoward? 🙏
0
0
0
@thomas_thoresen
Thomas Thoresen
6 days
Some valuable skills to be taught in this series
@abhi1thakur
abhishek
6 days
First video of the series is up! You’ll learn how to:
• Create a Vespa application package
• Enable BM25 scoring
• Spin up Vespa inside Docker
• Feed documents into Vespa from Hugging Face
• Run BM25-ranked search queries
0
1
3
@abhi1thakur
abhishek
7 days
In the next few weeks, you will learn how to build production-grade search step by step. Starting from basic BM25, moving on to embeddings and hybrid search, and ending with a fully fledged RAG application that works on millions (and billions) of documents. Keep an eye on my YouTube channel!
8
10
106
@thomas_thoresen
Thomas Thoresen
14 days
Start small, increase incrementally, show up. Learn, adjust, repeat. Apply to anything.
0
0
2
@thomas_thoresen
Thomas Thoresen
16 days
tfw someone you've been following for almost a decade, whose book you bought, and who is a 4x Kaggle GM joins your company
@abhi1thakur
abhishek
16 days
Let's make search great again 🚀 I'm joining Vespa ai and my role is to help you build world-class search, RAG and recommendation systems. In addition, I will be:
👨🏽‍💻 Taking part in open-source communities and contributions
👀 Creating tutorials to learn how to build search
6
1
26
@thomas_thoresen
Thomas Thoresen
20 days
0
0
0
@thomas_thoresen
Thomas Thoresen
20 days
@p0 @vespaengine Many due to returning html, not txt/md
0
0
1
@thomas_thoresen
Thomas Thoresen
20 days
Surprisingly many invalid ones
0
0
0
@thomas_thoresen
Thomas Thoresen
20 days
Check out https://t.co/BA8ENtFOZU by @p0. Autogenerated MCP for repos with llms.txt (@vespaengine has one, of course)
3
1
7
@jeremyphoward
Jeremy Howard
20 days
I spoke with @clattner_llvm about AI and software craftsmanship. "The question is, when things settle out, where do you as a programmer stand? Have you lost years of your own development because you’ve been spending it the wrong way?" A must read: https://t.co/AClwM7H6OB
fast.ai
Chris Lattner on software craftsmanship and AI
16
80
668
@jeremyphoward
Jeremy Howard
22 days
Lotta folks making this bet -- but most don't even realise they're making it. Will be interesting to see how many companies in 2 years' time have gone under as a result of tech debt bankruptcy.
@j_mcgraph
Josh McGrath
23 days
OpenAI research is so AGI pilled we bet our whole codebase that we’ll hit superhuman coding before tech debt bankruptcy
11
12
140
@thomas_thoresen
Thomas Thoresen
29 days
👀 🤗
0
0
1
@thomas_thoresen
Thomas Thoresen
1 month
This coming from a 4x Kaggle GM 🤩
@abhi1thakur
abhishek
1 month
I've been looking for how search and RAG can be done on large scale and actual data, and there's just toy examples everywhere I look. Not just some pdfs or a website with everything in context, but actual search, retrieval, ranking, re-ranking, etc. Then I found this goldmine.
4
0
7
@ThePrimeagen
ThePrimeagen
1 month
it is currently easier to install arch Linux than either OSX or Windows. Hyprland, one polish kid, has made a better looking desktop experience than multi-trillion dollar Apple or Microsoft could. it really might be the year of the Linux desktop
354
421
8K
@jeffreyhuber
Jeff Huber
2 months
im so tired of the 30-page ai-slop brain-rot around RAG. here - i solved it for you. this simple graphic tells you everything you need to know.
@didier_lopes
Didier Lopes
2 months
I wrote 6 months ago that RAG might be dead. That was after an aha moment with Gemini’s 1M context window - running 200+ page docs through it and being impressed with the accuracy. But I didn’t have skin in the game. @nicbstme does. He just published an excellent piece: "The
36
76
937
@thomas_thoresen
Thomas Thoresen
2 months
Now, this got me 10x more excited than the latest LLM releases! A visual retriever that runs on CPU, beating ColPali in performance 🤩
@pteiletche
paul
2 months
Introducing ModernVBERT: a vision-language encoder that matches the performance of models 10× its size on visual document retrieval tasks! 👁️ Read more in the thread👇 (1/N)
0
0
5
@jonbratseth
Jon Bratseth
2 months
Lots of hard problems in web search, but luckily at least the "super fancy db" you need for the index is available for everyone at https://t.co/QfFhnHgki7.
@AravSrinivas
Aravind Srinivas
2 months
Why it's hard to build a web index, objectively harder than building a GPT-4.1. Argument: there are just fewer people - literally two (G and M) - who have done it well.
0
2
10
@thomas_thoresen
Thomas Thoresen
2 months
Introducing layered ranking for RAG applications | Vespa Blog
blog.vespa.ai
Introducing layered ranking: The missing piece for context engineering at scale.
0
0
0
@thomas_thoresen
Thomas Thoresen
2 months
Much talk about context rot in timeline. The solution: layered ranking and chunk selection.
1
1
1