SID

@SID_AI

Followers: 827
Following: 544
Media: 4
Statuses: 77

solving retrieval one model at a time | @ycombinator

San Francisco, CA
Joined December 2022
@SID_AI
SID
24 days
we just released our first model: SID-1. it's designed to be extremely good at only one task: retrieval. it has 1.8x better recall than embedding search alone (even with reranking) and beats "agentic" retrieval implemented using all frontier LLMs, including the really large and
18
37
372
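The "1.8x better recall" claim above is presumably measured as recall@k against labeled relevant documents; the post doesn't share the eval harness, so the following is only a minimal sketch of that metric with made-up queries, document ids, and systems.

```python
# Minimal recall@k sketch. The metric is standard; the data and the two
# hypothetical systems ("embedding" vs. "model") are made up for illustration.

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

queries = {
    "q1": {"relevant": {"d3", "d7"}, "embedding": ["d1", "d3", "d9"], "model": ["d3", "d7", "d2"]},
    "q2": {"relevant": {"d5"},       "embedding": ["d4", "d6", "d8"], "model": ["d5", "d4", "d1"]},
}
for system in ("embedding", "model"):
    avg = sum(recall_at_k(q[system], q["relevant"], k=3) for q in queries.values()) / len(queries)
    print(f"{system:>9} recall@3 = {avg:.2f}")
```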
@maxrumpf
Max Rumpf
10 days
We improve both pass@1 AND pass@n during training. The issue is that lots of claimants: 1) train on domains with heavy mid-/post-training in the base models (math), and 2) don't train for very long. In many of these small-scale experiments, gains come from re-learning the format
@srush_nlp
Sasha Rush
11 days
There is significant discussion in the academic literature about RL making models better at pass@1 and *worse* at pass@N (or related claims). We run a lot of RL runs at Cursor and don't see this issue systematically. Not doubting it occurs, but something else might be going on.
7
9
99
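The pass@1 / pass@n exchange above uses the standard pass@k metric; here is a minimal sketch of the usual unbiased estimator (from the HumanEval paper), with made-up sample counts.

```python
# pass@k = 1 - C(n-c, k) / C(n, k), given n samples per problem with c correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for n samples with c correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 100 rollouts per problem, 30 correct.
print(pass_at_k(n=100, c=30, k=1))    # 0.30
print(pass_at_k(n=100, c=30, k=10))   # ~0.98
```

RL collapsing diversity would show up here as pass@1 rising while pass@n at large n falls; the thread above argues that, with long enough training on domains the base model hasn't been heavily post-trained on, both can rise.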
@maxrumpf
Max Rumpf
10 days
Label noise really matters in RL. SID-1's task requires reporting the documents most likely to contain the answer to a question. When the ground-truth data contains errors, the model will start over-reporting in hopes of catching spurious targets. For one public dataset where
[Quoting @SID_AI's SID-1 launch post, shown in full above]
0
1
23
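A toy illustration of the over-reporting failure mode described above (not SID's actual reward): when the ground-truth labels contain spurious targets, a recall-style reward pays the model for reporting extra documents.

```python
# Illustrative only: the gold labels, documents, and answers are made up.

def recall(pred: set[str], gold: set[str]) -> float:
    return len(pred & gold) / len(gold) if gold else 0.0

gold_clean = {"d1"}                  # what the labels *should* be
gold_noisy = {"d1", "d9"}            # d9 is a spurious (mislabeled) target
precise = {"d1"}                     # report only the document that answers the question
shotgun = {"d1", "d5", "d9"}         # over-report, hoping to catch spurious targets

# Against noisy labels the shotgun answer scores higher, so RL drifts toward it...
print(recall(precise, gold_noisy), recall(shotgun, gold_noisy))   # 0.5 vs 1.0
# ...even though against clean labels both recall everything relevant, and the
# shotgun answer is strictly worse on precision.
print(recall(precise, gold_clean), recall(shotgun, gold_clean))   # 1.0 vs 1.0
```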
@maxrumpf
Max Rumpf
13 days
Most RL frameworks are fundamentally unstable. We wasted more H100 hours on debugging this than any other issue for our multi-turn, multi-env RL run (below). When using OpenAI-style messages for env interactions, parsing and retokenizing leads to subtly different tokens. This
[Quoting @SID_AI's SID-1 launch post, shown in full above]
18
56
544
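The retokenization issue described above can be guarded against with a round-trip check in the training loop; a hedged sketch follows (the guard is generic, not SID's code, and whether a given string triggers it depends on the tokenizer and chat template).

```python
# Sketch: the ids you compute the RL loss on must be the ids the policy sampled.
# Decoding sampled ids into OpenAI-style messages and re-encoding them can yield
# a subtly different token sequence (non-canonical splits, whitespace or special
# tokens altered by the chat template), which silently skews the log-probs.

def assert_tokens_roundtrip(tokenizer, sampled_ids: list[int]) -> None:
    text = tokenizer.decode(sampled_ids)
    retokenized = tokenizer.encode(text, add_special_tokens=False)
    if retokenized != sampled_ids:
        raise ValueError(f"retokenization drift: {sampled_ids} -> {retokenized}")

# Usage with a Hugging Face tokenizer (model name is illustrative):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("gpt2")
#   assert_tokens_roundtrip(tok, rollout_token_ids)
```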
@maxrumpf
Max Rumpf
16 days
Good RL environments are much richer than you think. We evaluate training for 100 epochs and see eval reward increase steadily. Partly, this is because our RL setting allows obfuscating the answer between epochs, largely mitigating memorization (when inspecting train rollouts).
[Quoting @SID_AI's SID-1 launch post, shown in full above]
6
18
146
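One way the "obfuscating the answer between epochs" idea above could be realized in a retrieval environment is to remap document ids every epoch, so a memorized id string from the previous epoch earns no reward. This is a hedged, illustrative sketch, not SID's environment.

```python
import random

def remap_corpus(corpus: dict[str, str], seed: int):
    """Return the corpus under freshly shuffled ids, plus the old->new mapping."""
    old_ids = list(corpus)
    new_ids = old_ids[:]
    random.Random(seed).shuffle(new_ids)
    mapping = dict(zip(old_ids, new_ids))
    return {mapping[k]: v for k, v in corpus.items()}, mapping

corpus = {"d1": "passage about optimizers", "d2": "passage about tokenizers"}
gold = {"which passage covers tokenizers?": {"d2"}}

for epoch in range(3):
    epoch_corpus, mapping = remap_corpus(corpus, seed=epoch)
    epoch_gold = {q: {mapping[d] for d in docs} for q, docs in gold.items()}
    # The policy only sees epoch_corpus / epoch_gold, so reciting last epoch's
    # ids is useless; it has to actually read the documents to earn reward.
```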
@maxrumpf
Max Rumpf
17 days
We believe retrieval is the ideal playground for self-play RL on LLMs. SID-1 was trained with "pseudo self-play": new questions were generated to cover gaps in model behavior as training progressed. We think we're not far away from closing that loop: generating hard, verifiable
[Quoting @SID_AI's SID-1 launch post, shown in full above]
6
11
71
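A hedged sketch of the "pseudo self-play" loop described above: as training progresses, new questions are generated to target behaviors where the current policy scores poorly. Every helper below is a hypothetical stand-in; the real question generator would be an LLM and the update step an RL trainer.

```python
def evaluate(policy, questions):
    """Stand-in evaluation: per-question reward from the current policy."""
    return {q: policy(q) for q in questions}

def generate_harder_variants(question):
    """Stand-in for an LLM that writes new questions targeting a weakness."""
    return [f"{question} (with an added distractor document)"]

def train_one_round(policy, questions):
    """Stand-in for the RL update on the expanded question set."""
    return policy

questions = ["Which doc defines recall@k?", "Which doc reports the ablation?"]
policy = lambda q: 1.0 if "ablation" in q else 0.2   # toy scorer

for _ in range(3):
    rewards = evaluate(policy, questions)
    gaps = [q for q, r in rewards.items() if r < 0.5]      # coverage gaps
    for q in gaps:
        for variant in generate_harder_variants(q):
            if variant not in questions:
                questions.append(variant)                   # cover the gaps
    policy = train_one_round(policy, questions)
```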
@maxrumpf
Max Rumpf
1 year
mediocrity is contagious and terminal
1
1
6
@maxrumpf
Max Rumpf
1 year
computer use and code gen progress is outpacing general intelligence improvements. why? you can easily create synthetic data for both. let me explain: if you have *more* high-quality data on a task, a model trained on that data will be better at it. currently, that data is
0
4
16
@maxrumpf
Max Rumpf
2 years
three letter AI companies that start with S are really having their day
1
1
4
@maxrumpf
Max Rumpf
2 years
give me an nvidia-smi -L long enough and i shall move the world
0
1
3
@maxrumpf
Max Rumpf
2 years
AMDAHL'S ARGUMENT FOR AI: The productivity speedup AI apps can provide is limited by how much human-in-the-loop work is required. Humans are ~1-3 tokens per second. They can't really be sped up – unless you're @neuralink. So if your application requires a human completion for
7
17
72
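The argument above is Amdahl's law with the human-in-the-loop work as the serial fraction; a small worked example with illustrative numbers:

```python
# Amdahl's law: if the AI accelerates a fraction p of the workflow by a factor s,
# the overall speedup is 1 / ((1 - p) + p / s). The human fraction (1 - p)
# bounds the speedup no matter how fast the model gets.

def overall_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# Illustrative numbers: the AI handles 80% of the work.
print(overall_speedup(p=0.8, s=10))     # ~3.6x
print(overall_speedup(p=0.8, s=1e9))    # ~5.0x: capped by the 20% human share
```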
@maxrumpf
Max Rumpf
2 years
we call this a "Context Engine" internally. you want a service to give the model definitions and extra information about the request at hand, without forcing users into providing it manually every time. humans are really good at inferring context from past interactions, the
@garrytan
Garry Tan
2 years
Vector databases are not going away. Large context windows and RAG co-exist, and the way they interact is actually MEMORY. Increasingly you're going to need structured representations of knowledge/info that you build into your workflow and insert into the context window, drawn
1
2
9
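A hedged sketch of the "Context Engine" idea described above: a service that looks up definitions and user-specific facts relevant to a request and injects them into the prompt, instead of asking the user to restate them each time. Everything here (the store, keys, and lookup) is a hypothetical stand-in, not SID's implementation.

```python
# Toy keyword-matching context store; a real system would retrieve from the
# user's apps, documents, and past interactions.
CONTEXT_STORE = {
    "quarterly report": "The user's 'quarterly report' is the Q3 finance deck in Drive.",
    "standup": "Standup notes live in the #eng-standup channel.",
}

def context_engine(request: str) -> list[str]:
    """Return stored context snippets whose keys appear in the request."""
    return [fact for key, fact in CONTEXT_STORE.items() if key in request.lower()]

def build_prompt(request: str) -> str:
    context = "\n".join(context_engine(request)) or "(no extra context found)"
    return f"Context:\n{context}\n\nUser request:\n{request}"

print(build_prompt("Summarize the quarterly report for me"))
```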
@maxrumpf
Max Rumpf
2 years
"Uncertainty routing" is the real news in the Gemini announcement. Without it, GPT-4 still beats Gemini in CoT@32 on MMLU! For people building apps: GPT-4 is still better in zero-shot. (charts from the Deepmind Gemini Technical Report)
0
5
22
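For reference, "uncertainty routing" as described in the Gemini technical report: sample k chain-of-thought answers, take the majority vote only when the chains agree beyond a tuned threshold, and otherwise fall back to the greedy answer. A hedged sketch with stand-in samplers (the real threshold is tuned per benchmark):

```python
from collections import Counter
import random

def uncertainty_routed_answer(sample_cot_answer, greedy_answer, k=32, threshold=0.6):
    """Majority-vote over k sampled chains if consensus is high, else greedy."""
    answers = [sample_cot_answer() for _ in range(k)]
    top, count = Counter(answers).most_common(1)[0]
    if count / k >= threshold:
        return top              # the chains agree: trust the consensus
    return greedy_answer()      # low consensus: fall back to the greedy answer

# Toy usage with stand-in samplers:
print(uncertainty_routed_answer(lambda: random.choice("AAAB"), lambda: "A"))
```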
@maxrumpf
Max Rumpf
2 years
.@karpathy love the pricing for the vision API! great way to reduce our @openai bill at @try_sid
5
2
24
@thegarrettscott
Garrett Scott 🕳
2 years
I still firmly believe the race to functional AGI is about adding personalized context about you, not better models. Hopefully that’s a focus for devday.
6
2
35
@maxrumpf
Max Rumpf
2 years
.@daltonc with the shades. yc reunion with @sama was fun yesterday.
3
9
109
@maxrumpf
Max Rumpf
2 years
FYI you can add personal apps & services to any LLM (Llama 2, @OpenAI etc.) with @try_sid using a single API. If you want to see OpenAI + apps & services (through SID), you can try the ChatGPT plugin we stealth launched last week (already has +500 users). Unlimited Drive &
@JackK
Jack Krawczyk
2 years
Six months in and we're bringing you the best Bard yet. Bard now:
- integrates with your personal apps & services
- is the only language model to actively admit how confident it is in its response
- admits when it made a mistake
- expanded image input to 40+ languages
- is
1
1
9
@lotteseifert
Lotte Seifert
2 years
Exactly what we're building @try_sid! Not just for one bot, but for EVERY AI tool out there! 😉
@nateliason
Nat Eliason
2 years
Has anyone made a good AI tool yet where I can load in everything I've ever written, every podcast I've recorded, every note I've taken from a book, and then have a ChatGPT-esque interface where I can query my past self? I'd probably pay hundreds of dollars a month for this.
1
1
11
@jerryjliu0
Jerry Liu
2 years
Here are 8 key considerations for building *production-grade* LLM apps over your data (RAG) 💡 (see 🧵):
1️⃣ Chunks used for retrieval shouldn't necessarily be the same as chunks used for LLM synthesis (@md_rumpf)
2️⃣ Embeddings should live in a different latent space than what
16
111
596
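Consideration 1️⃣ above (retrieval chunks vs. synthesis chunks) is often implemented as "small-to-big" retrieval: rank small, embedding-friendly chunks, then hand the LLM their larger parent sections. A hedged sketch with made-up data and a word-overlap stand-in for embedding similarity:

```python
parent_sections = {
    "sec1": "Full text of section 1 ...",
    "sec2": "Full text of section 2 ...",
}
retrieval_chunks = [
    {"text": "a short, embedding-friendly sentence from section 1", "parent": "sec1"},
    {"text": "another small chunk, this one from section 2", "parent": "sec2"},
]

def score(query: str, chunk_text: str) -> float:
    """Stand-in for embedding similarity: count shared words."""
    return len(set(query.lower().split()) & set(chunk_text.lower().split()))

def retrieve_for_synthesis(query: str, k: int = 1) -> list[str]:
    ranked = sorted(retrieval_chunks, key=lambda c: score(query, c["text"]), reverse=True)
    parents = list(dict.fromkeys(c["parent"] for c in ranked[:k]))   # dedupe, keep order
    return [parent_sections[p] for p in parents]                     # hand back the big sections

print(retrieve_for_synthesis("which small chunk mentions section 2?"))
```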