SID

@SID_AI

Followers: 827
Following: 544
Media: 4
Statuses: 77

solving retrieval one model at a time | @ycombinator

San Francisco, CA
Joined December 2022
@SID_AI
SID
24 days
we just released our first model: SID-1. it's designed to be extremely good at only one task: retrieval. it has 1.8x better recall than embedding search alone (even with reranking) and beats "agentic" retrieval implemented using all frontier LLMs, including the really large and
18
37
372
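The "1.8x better recall" claim above is presumably measured as recall@k against labeled relevant documents; the post doesn't share the eval harness, so the following is only a minimal sketch of that metric with made-up queries, document ids, and systems.

```python
# Minimal recall@k sketch. The metric is standard; the data and the two
# hypothetical systems ("embedding" vs. "model") are made up for illustration.

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

queries = {
    "q1": {"relevant": {"d3", "d7"}, "embedding": ["d1", "d3", "d9"], "model": ["d3", "d7", "d2"]},
    "q2": {"relevant": {"d5"},       "embedding": ["d4", "d6", "d8"], "model": ["d5", "d4", "d1"]},
}
for system in ("embedding", "model"):
    avg = sum(recall_at_k(q[system], q["relevant"], k=3) for q in queries.values()) / len(queries)
    print(f"{system:>9} recall@3 = {avg:.2f}")
```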
@maxrumpf
Max Rumpf
10 days
We improve both pass@1 AND pass@n during training. The issue is that lots of claimants: 1) train on domains with heavy mid-/post-training in the base models (math), and 2) don't train for very long. In many of these small-scale experiments, gains come from re-learning the format
@srush_nlp
Sasha Rush
11 days
There is significant discussion in the academic literature about RL making models better at pass@1 and *worse* at pass@N (or related claims). We run a lot of RL runs at Cursor and don't see this issue systematically. Not doubting it occurs, but something else might be going on.
7
9
99
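The pass@1 / pass@n exchange above uses the standard pass@k metric; here is a minimal sketch of the usual unbiased estimator (from the HumanEval paper), with made-up sample counts.

```python
# pass@k = 1 - C(n-c, k) / C(n, k), given n samples per problem with c correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for n samples with c correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 100 rollouts per problem, 30 correct.
print(pass_at_k(n=100, c=30, k=1))    # 0.30
print(pass_at_k(n=100, c=30, k=10))   # ~0.98
```

RL collapsing diversity would show up here as pass@1 rising while pass@n at large n falls; the thread above argues that, with long enough training on domains the base model hasn't been heavily post-trained on, both can rise.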
@maxrumpf
Max Rumpf
10 days
Label noise really matters in RL. SID-1's task requires reporting the documents most likely to contain the answer to a question. When the ground-truth data contains errors, the model will start over-reporting in hopes of catching spurious targets. For one public dataset where
[Quoting @SID_AI's SID-1 launch post, shown in full above]
0
1
23
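A toy illustration of the over-reporting failure mode described above (not SID's actual reward): when the ground-truth labels contain spurious targets, a recall-style reward pays the model for reporting extra documents.

```python
# Illustrative only: the gold labels, documents, and answers are made up.

def recall(pred: set[str], gold: set[str]) -> float:
    return len(pred & gold) / len(gold) if gold else 0.0

gold_clean = {"d1"}                  # what the labels *should* be
gold_noisy = {"d1", "d9"}            # d9 is a spurious (mislabeled) target
precise = {"d1"}                     # report only the document that answers the question
shotgun = {"d1", "d5", "d9"}         # over-report, hoping to catch spurious targets

# Against noisy labels the shotgun answer scores higher, so RL drifts toward it...
print(recall(precise, gold_noisy), recall(shotgun, gold_noisy))   # 0.5 vs 1.0
# ...even though against clean labels both recall everything relevant, and the
# shotgun answer is strictly worse on precision.
print(recall(precise, gold_clean), recall(shotgun, gold_clean))   # 1.0 vs 1.0
```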
@maxrumpf
Max Rumpf
13 days
Most RL frameworks are fundamentally unstable. We wasted more H100 hours on debugging this than any other issue for our multi-turn, multi-env RL run (below). When using OpenAI-style messages for env interactions, parsing and retokenizing leads to subtly different tokens. This
[Quoting @SID_AI's SID-1 launch post, shown in full above]
18
56
544
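The retokenization issue described above can be guarded against with a round-trip check in the training loop; a hedged sketch follows (the guard is generic, not SID's code, and whether a given string triggers it depends on the tokenizer and chat template).

```python
# Sketch: the ids you compute the RL loss on must be the ids the policy sampled.
# Decoding sampled ids into OpenAI-style messages and re-encoding them can yield
# a subtly different token sequence (non-canonical splits, whitespace or special
# tokens altered by the chat template), which silently skews the log-probs.

def assert_tokens_roundtrip(tokenizer, sampled_ids: list[int]) -> None:
    text = tokenizer.decode(sampled_ids)
    retokenized = tokenizer.encode(text, add_special_tokens=False)
    if retokenized != sampled_ids:
        raise ValueError(f"retokenization drift: {sampled_ids} -> {retokenized}")

# Usage with a Hugging Face tokenizer (model name is illustrative):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("gpt2")
#   assert_tokens_roundtrip(tok, rollout_token_ids)
```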
@maxrumpf
Max Rumpf
16 days
Good RL environments are much richer than you think. We evaluate training for 100 epochs and see eval reward increase steadily. Partly, this is because our RL setting allows obfuscating the answer between epochs, largely mitigating memorization (when inspecting train rollouts).
[Quoting @SID_AI's SID-1 launch post, shown in full above]
6
18
146
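One way the "obfuscating the answer between epochs" idea above could be realized in a retrieval environment is to remap document ids every epoch, so a memorized id string from the previous epoch earns no reward. This is a hedged, illustrative sketch, not SID's environment.

```python
import random

def remap_corpus(corpus: dict[str, str], seed: int):
    """Return the corpus under freshly shuffled ids, plus the old->new mapping."""
    old_ids = list(corpus)
    new_ids = old_ids[:]
    random.Random(seed).shuffle(new_ids)
    mapping = dict(zip(old_ids, new_ids))
    return {mapping[k]: v for k, v in corpus.items()}, mapping

corpus = {"d1": "passage about optimizers", "d2": "passage about tokenizers"}
gold = {"which passage covers tokenizers?": {"d2"}}

for epoch in range(3):
    epoch_corpus, mapping = remap_corpus(corpus, seed=epoch)
    epoch_gold = {q: {mapping[d] for d in docs} for q, docs in gold.items()}
    # The policy only sees epoch_corpus / epoch_gold, so reciting last epoch's
    # ids is useless; it has to actually read the documents to earn reward.
```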
@maxrumpf
Max Rumpf
17 days
We believe retrieval is the ideal playground for self-play RL on LLMs. SID-1 was trained with "pseudo self-play": new questions were generated to cover gaps in model behavior as training progressed. We think we're not far away from closing that loop: generating hard, verifiable
[Quoting @SID_AI's SID-1 launch post, shown in full above]
6
11
71
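A hedged sketch of the "pseudo self-play" loop described above: as training progresses, new questions are generated to target behaviors where the current policy scores poorly. Every helper below is a hypothetical stand-in; the real question generator would be an LLM and the update step an RL trainer.

```python
def evaluate(policy, questions):
    """Stand-in evaluation: per-question reward from the current policy."""
    return {q: policy(q) for q in questions}

def generate_harder_variants(question):
    """Stand-in for an LLM that writes new questions targeting a weakness."""
    return [f"{question} (with an added distractor document)"]

def train_one_round(policy, questions):
    """Stand-in for the RL update on the expanded question set."""
    return policy

questions = ["Which doc defines recall@k?", "Which doc reports the ablation?"]
policy = lambda q: 1.0 if "ablation" in q else 0.2   # toy scorer

for _ in range(3):
    rewards = evaluate(policy, questions)
    gaps = [q for q, r in rewards.items() if r < 0.5]      # coverage gaps
    for q in gaps:
        for variant in generate_harder_variants(q):
            if variant not in questions:
                questions.append(variant)                   # cover the gaps
    policy = train_one_round(policy, questions)
```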
@maxrumpf
Max Rumpf
1 year
mediocrity is contagious and terminal
1
1
6
@maxrumpf
Max Rumpf
1 year
computer use and code gen progress is outpacing general intelligence improvements. why? you can easily create synthetic data for both. let me explain: if you have *more* high-quality data on a task, a model trained on that data will be better at it. currently, that data is
0
4
16
@maxrumpf
Max Rumpf
2 years
three letter AI companies that start with S are really having their day
1
1
4
@maxrumpf
Max Rumpf
2 years
give me an nvidia-smi -L long enough and i shall move the world
0
1
3
@maxrumpf
Max Rumpf
2 years
AMDAHL'S ARGUMENT FOR AI: The productivity speedup AI apps can provide is limited by how much human-in-the-loop work is required. Humans are ~1-3 tokens per second. They can't really be sped up – unless you're @neuralink. So if your application requires a human completion for
7
17
72
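The argument above is Amdahl's law with the human-in-the-loop work as the serial fraction; a small worked example with illustrative numbers:

```python
# Amdahl's law: if the AI accelerates a fraction p of the workflow by a factor s,
# the overall speedup is 1 / ((1 - p) + p / s). The human fraction (1 - p)
# bounds the speedup no matter how fast the model gets.

def overall_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# Illustrative numbers: the AI handles 80% of the work.
print(overall_speedup(p=0.8, s=10))     # ~3.6x
print(overall_speedup(p=0.8, s=1e9))    # ~5.0x: capped by the 20% human share
```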
@maxrumpf
Max Rumpf
2 years
we call this a "Context Engine" internally. you want a service to give the model definitions and extra information about the request at hand, without forcing users into providing it manually every time. humans are really good at inferring context from past interactions, the
@garrytan
Garry Tan
2 years
Vector databases are not going away. Large context windows and RAG co-exist, and the way they interact is actually MEMORY. Increasingly you're going to need structured representations of knowledge/info that you build into your workflow and insert into the context window, drawn
1
2
9
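A hedged sketch of the "Context Engine" idea described above: a service that looks up definitions and user-specific facts relevant to a request and injects them into the prompt, instead of asking the user to restate them each time. Everything here (the store, keys, and lookup) is a hypothetical stand-in, not SID's implementation.

```python
# Toy keyword-matching context store; a real system would retrieve from the
# user's apps, documents, and past interactions.
CONTEXT_STORE = {
    "quarterly report": "The user's 'quarterly report' is the Q3 finance deck in Drive.",
    "standup": "Standup notes live in the #eng-standup channel.",
}

def context_engine(request: str) -> list[str]:
    """Return stored context snippets whose keys appear in the request."""
    return [fact for key, fact in CONTEXT_STORE.items() if key in request.lower()]

def build_prompt(request: str) -> str:
    context = "\n".join(context_engine(request)) or "(no extra context found)"
    return f"Context:\n{context}\n\nUser request:\n{request}"

print(build_prompt("Summarize the quarterly report for me"))
```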
@maxrumpf
Max Rumpf
2 years
"Uncertainty routing" is the real news in the Gemini announcement. Without it, GPT-4 still beats Gemini in CoT@32 on MMLU! For people building apps: GPT-4 is still better in zero-shot. (charts from the Deepmind Gemini Technical Report)
0
5
22
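For reference, "uncertainty routing" as described in the Gemini technical report: sample k chain-of-thought answers, take the majority vote only when the chains agree beyond a tuned threshold, and otherwise fall back to the greedy answer. A hedged sketch with stand-in samplers (the real threshold is tuned per benchmark):

```python
from collections import Counter
import random

def uncertainty_routed_answer(sample_cot_answer, greedy_answer, k=32, threshold=0.6):
    """Majority-vote over k sampled chains if consensus is high, else greedy."""
    answers = [sample_cot_answer() for _ in range(k)]
    top, count = Counter(answers).most_common(1)[0]
    if count / k >= threshold:
        return top              # the chains agree: trust the consensus
    return greedy_answer()      # low consensus: fall back to the greedy answer

# Toy usage with stand-in samplers:
print(uncertainty_routed_answer(lambda: random.choice("AAAB"), lambda: "A"))
```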
@maxrumpf
Max Rumpf
2 years
.@karpathy love the pricing for the vision API! great way to reduce our @openai bill at @try_sid
5
2
24
@thegarrettscott
Garrett Scott 🕳
2 years
I still firmly believe the race to functional AGI is about adding personalized context about you, not better models. Hopefully that’s a focus for devday.
6
2
35
@maxrumpf
Max Rumpf
2 years
.@daltonc with the shades. yc reunion with @sama was fun yesterday.
3
9
109
@maxrumpf
Max Rumpf
2 years
FYI you can add personal apps & services to any LLM (Llama 2, @OpenAI etc.) with @try_sid using a single API. If you want to see OpenAI + apps & services (through SID), you can try the ChatGPT plugin we stealth launched last week (already has +500 users). Unlimited Drive &
@JackK
Jack Krawczyk
2 years
Six months in and we're bringing you the best Bard yet. Bard now:
- integrates with your personal apps & services
- is the only language model to actively admit how confident it is in its response
- admits when it made a mistake
- expanded image input to 40+ languages
- is
1
1
9
@lotteseifert
Lotte Seifert
2 years
Exactly what we're building @try_sid! Not just for one bot, but for EVERY AI tool out there! 😉
@nateliason
Nat Eliason
2 years
Has anyone made a good AI tool yet where I can load in everything I've ever written, every podcast I've recorded, every note I've taken from a book, and then have a ChatGPT-esque interface where I can query my past self? I'd probably pay hundreds of dollars a month for this.
1
1
11
@jerryjliu0
Jerry Liu
2 years
Here are 8 key considerations for building *production-grade* LLM apps over your data (RAG) 💡 (see 🧵):
1️⃣ Chunks used for retrieval shouldn't necessarily be the same as chunks used for LLM synthesis (@md_rumpf)
2️⃣ Embeddings should live in a different latent space than what
16
111
596
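Consideration 1️⃣ above (retrieval chunks vs. synthesis chunks) is often implemented as "small-to-big" retrieval: rank small, embedding-friendly chunks, then hand the LLM their larger parent sections. A hedged sketch with made-up data and a word-overlap stand-in for embedding similarity:

```python
parent_sections = {
    "sec1": "Full text of section 1 ...",
    "sec2": "Full text of section 2 ...",
}
retrieval_chunks = [
    {"text": "a short, embedding-friendly sentence from section 1", "parent": "sec1"},
    {"text": "another small chunk, this one from section 2", "parent": "sec2"},
]

def score(query: str, chunk_text: str) -> float:
    """Stand-in for embedding similarity: count shared words."""
    return len(set(query.lower().split()) & set(chunk_text.lower().split()))

def retrieve_for_synthesis(query: str, k: int = 1) -> list[str]:
    ranked = sorted(retrieval_chunks, key=lambda c: score(query, c["text"]), reverse=True)
    parents = list(dict.fromkeys(c["parent"] for c in ranked[:k]))   # dedupe, keep order
    return [parent_sections[p] for p in parents]                     # hand back the big sections

print(retrieve_for_synthesis("which small chunk mentions section 2?"))
```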