SID (@SID_AI)
solving retrieval one model at a time | @ycombinator
San Francisco, CA · Joined December 2022
827 Followers · 544 Following · 4 Media · 77 Statuses
we just released our first model: SID-1. it's designed to be extremely good at only one task: retrieval. it has 1.8x better recall than embedding search alone (even with reranking) and beats "agentic" retrieval implemented using all frontier LLMs, including the really large and
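For reference, recall here is presumably the standard retrieval recall@k metric; a minimal sketch (function and document names are mine, not from the release):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

# Example: only one of the two relevant docs shows up in the top 2.
recall_at_k(["d7", "d2", "d9", "d1"], relevant=["d1", "d2"], k=2)  # 0.5
```

A "1.8x better recall" claim would then mean this number, averaged over a query set, is 1.8x the embedding-search baseline.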
Label noise really matters in RL. SID-1's task requires reporting the documents most likely to contain the answer to a question. When the ground-truth data contains errors, the model will start overreporting in hopes of catching spurious targets. For one public dataset where
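The overreporting incentive is easy to see in a toy simulation (all numbers here are illustrative, not from the dataset in question): against noisy labels, padding the report with extra documents strictly increases expected measured recall.

```python
import random

random.seed(0)
N_DOCS = 100
TRUE = {0, 1, 2}   # genuinely relevant docs
N_SPURIOUS = 2     # label-noise docs mixed into the ground truth

def measured_recall(reported):
    """Recall against labels that mix true targets with spurious ones."""
    spurious = set(random.sample(sorted(set(range(N_DOCS)) - TRUE), N_SPURIOUS))
    labels = TRUE | spurious
    return len(set(reported) & labels) / len(labels)

trials = 5_000
exact = sum(measured_recall(TRUE) for _ in range(trials)) / trials
padded = sum(measured_recall(range(20)) for _ in range(trials)) / trials
# exact is always 3/5; padded is higher, because extra docs catch spurious labels.
```

A model optimized against the noisy metric therefore learns to report more documents than it believes are relevant.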
Most RL frameworks are fundamentally unstable. We wasted more H100 hours debugging this than any other issue for our multi-turn, multi-env RL run (below). When using OpenAI-style messages for env interactions, parsing and retokenizing leads to subtly different tokens. This
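The failure mode can be reproduced with a toy greedy tokenizer (a minimal sketch, not any particular framework's tokenizer): the policy samples tokens, the framework decodes them into a message, and re-tokenizing that message yields different token IDs than the model actually generated, so training logprobs are computed on the wrong sequence.

```python
VOCAB = {"ab": 0, "a": 1, "b": 2}
INV = {v: k for k, v in VOCAB.items()}

def tokenize(text):
    """Greedy longest-match tokenization, as in BPE inference."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(VOCAB[text[i:j]])
                i = j
                break
    return tokens

def detokenize(tokens):
    return "".join(INV[t] for t in tokens)

sampled = [VOCAB["a"], VOCAB["b"]]         # policy emitted two tokens: [1, 2]
roundtrip = tokenize(detokenize(sampled))  # greedy match merges them: [0]
assert roundtrip != sampled                # same text, different token IDs
```

Training on `roundtrip` while the rollout actually produced `sampled` silently corrupts the per-token logprobs the update relies on.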
Good RL environments are much richer than you think. We train for 100 epochs and see eval reward increase steadily. This is partly because our RL setting allows obfuscating the answer between epochs, which largely mitigates memorization (as confirmed by inspecting train rollouts).
We believe retrieval is the ideal playground for self-play RL on LLMs. SID-1 was trained with "pseudo self-play": new questions were generated to cover gaps in model behavior as training progressed. We think we're not far from closing that loop: generating hard, verifiable
SID-1: Tech Report. It has way more detail than is prudent. https://t.co/GOdo8HFBh3
sid.ai
Current retrieval practice relies on a single-step pipeline where multiple small, specialized models prompt search tools and rerank documents. In agentic retrieval, a single model performs all these...
computer use and code gen progress is outpacing general intelligence improvements. why? you can easily create synthetic data for both. let me explain: if you have *more* high-quality data on a task, a model trained on that data will be better at it. currently, that data is
three letter AI companies that start with S are really having their day
AMDAHL'S ARGUMENT FOR AI
The productivity speedup AI apps can provide is limited by how much human-in-the-loop work is required. Humans are ~1-3 tokens per second. They can't really be sped up (unless you're @neuralink). So if your application requires a human completion for
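The argument is literally Amdahl's law with the human as the serial fraction; a quick sketch (parameter names are mine):

```python
def overall_speedup(automated_fraction, ai_speedup):
    """Amdahl's law: end-to-end speedup when only part of the work is accelerated."""
    f = automated_fraction
    return 1.0 / ((1.0 - f) + f / ai_speedup)

# If 20% of the workflow still needs a human, even an infinitely fast model
# caps the end-to-end speedup at 1 / 0.2 = 5x.
overall_speedup(0.8, 1e9)  # ~5.0
overall_speedup(0.8, 10)   # ~3.57: a realistic model speedup buys even less
```

The takeaway: shrinking the human-in-the-loop fraction matters more than making the model part faster.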
we call this a "Context Engine" internally. you want a service to give the model definitions and extra information about the request at hand, without forcing users into providing it manually every time. humans are really good at inferring context from past interactions, the
Vector databases are not going away. Large context windows and RAG co-exist, and the way they interact is actually MEMORY. Increasingly you're going to need structured representations of knowledge/info that you build into your workflow and insert into the context window, drawn
I still firmly believe the race to functional AGI is about adding personalized context about you, not better models. Hopefully that’s a focus for devday.
FYI you can add personal apps & services to any LLM (Llama 2, @OpenAI, etc.) with @try_sid using a single API. If you want to see OpenAI + apps & services (through SID), you can try the ChatGPT plugin we stealth-launched last week (already has 500+ users). Unlimited Drive &
Six months in and we’re bringing you the best Bard yet: Bard now: - integrates with your personal apps & services - is the only language model to actively admit how confident it is in its response - admits when it made a mistake - expanded image input to 40+ languages - is
Exactly what we're building @try_sid! But not just for one bot - but for EVERY AI tool out there! 😉
Has anyone made a good AI tool yet where I can load in everything I've ever written, every podcast I've recorded, every note I've taken from a book, and then have a ChatGPT-esque interface where I can query my past self? I'd probably pay hundreds of dollars a month for this.
Here are 8 key considerations for building *production-grade* LLM apps over your data (RAG) 💡 (see 🧵): 1️⃣ Chunks used for retrieval shouldn’t necessarily be the same as chunks used for LLM synthesis (@md_rumpf) 2️⃣ Embeddings should live in a different latent space than what
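Point 1 ("small-to-big" retrieval) can be sketched in a few lines: index small chunks for embedding search, but hand the LLM the larger parent passage each chunk came from. All names and sizes here are illustrative, not from the thread:

```python
def build_small_to_big_index(documents, chunk_size=200, window=1000):
    """Map each small retrieval chunk to the larger parent passage around it."""
    index = []
    for doc_id, doc in enumerate(documents):
        for start in range(0, len(doc), chunk_size):
            margin = (window - chunk_size) // 2
            parent_start = max(0, start - margin)
            index.append({
                "doc_id": doc_id,
                "retrieval_text": doc[start:start + chunk_size],            # embed this
                "synthesis_text": doc[parent_start:parent_start + window],  # feed the LLM this
            })
    return index
```

Retrieval scores the short `retrieval_text` entries, so the embedding space stays sharp; the prompt then gets the matching `synthesis_text`, so the LLM sees full context.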