Ouail Kitouni @WKitouni X Profile

Ouail Kitouni

@WKitouni

Followers

65

Following

56

Media

16

Statuses

110

Member of technical staff @Anthropic prev @MIT @Meta @MSFTResearch

San Francisco, CA

Joined August 2019

Ouail Kitouni

@WKitouni

1 year

RT @alexalbert__: Friday feature drop: Highlight text or code within an Artifact and quickly have Claude improve or explain the selection.…

0

67

0

Ouail Kitouni

@WKitouni

1 year

We eating good tonight

0

1

Ouail Kitouni

@WKitouni

1 year

RT @teortaxesTex: Thesis from @ilyasut : "to predict the next word, you have to predict the world". Antithesis from @ylecun : "AR-LLMs suck!…

0

21

0

Ouail Kitouni

@WKitouni

1 year

You can just use a different model to prioritize higher signal tokens and generalize quicker. RHO-LOSS literally just works. (fraction is ratio of top-k tokens kept to total tokens; 1 is equiv to no rho-loss used)

2

1

12
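
A minimal sketch of the token selection described in the post above, assuming per-token losses from the model being trained and from a separate reference model are already available. The function name `rho_loss_select` and its arguments are hypothetical, and scoring tokens by the gap between the two losses is one common reading of RHO-LOSS, not necessarily the exact setup behind the plot.

```python
import math


def rho_loss_select(train_losses, ref_losses, fraction):
    """Keep the top-k tokens by excess loss and average the training loss over them.

    train_losses: per-token losses from the model being trained.
    ref_losses:   per-token losses from the reference model (assumed given).
    fraction:     ratio of kept tokens to total tokens; 1.0 keeps everything,
                  which is equivalent to not using the selection at all.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(train_losses) != len(ref_losses) or not train_losses:
        raise ValueError("loss lists must be non-empty and the same length")

    n = len(train_losses)
    k = max(1, math.ceil(fraction * n))

    # Excess loss: high when the training model is still wrong on a token
    # that the reference model finds learnable.
    excess = [t - r for t, r in zip(train_losses, ref_losses)]

    # Indices of the k tokens with the largest excess loss, in sequence order.
    ranked = sorted(range(n), key=lambda i: excess[i], reverse=True)
    kept = sorted(ranked[:k])

    mean_loss = sum(train_losses[i] for i in kept) / k
    return kept, mean_loss


kept, loss = rho_loss_select([2.0, 0.5, 3.0, 1.0], [1.9, 0.4, 1.0, 0.2], fraction=0.5)
print(kept, loss)
```

In a real training loop the kept indices would act as a mask on the per-token loss before backpropagation; here they are plain Python lists to keep the sketch dependency-free.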

Ouail Kitouni

@WKitouni

1 year

Empire State dragon??

0

Ouail Kitouni

@WKitouni

1 year

Interesting future directions could be dynamic horizon selection (it’s much more difficult to predict the far future than it is to predict the next token). So how do we interpolate from next-token to full-on any-to-any effectively?

0

2

Ouail Kitouni

@WKitouni

1 year

This simple change on top of BERT’s MLM makes the model a masked diffusion model on discrete states which has a nice correspondence with Permutation Language Modeling. PLM was notoriously difficult to train because permuted sequences are much harder to predict.

1

0

1
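
The post above does not spell out the change to MLM. One reading, assumed here, is that the masking rate is drawn uniformly at random for each sequence instead of being fixed at a small value as in BERT, which is how masked diffusion on discrete tokens is often set up. The sketch below illustrates only that input/target construction; `mask_uniform_rate` and the `MASK` string are hypothetical names.

```python
import random

MASK = "[MASK]"  # placeholder mask token, chosen for illustration


def mask_uniform_rate(tokens, rng):
    """Mask a sequence with a rate sampled uniformly from [0, 1).

    Returns (inputs, targets, rate). A target is the original token where
    the input was masked and None elsewhere, so the loss is computed only
    on masked positions.
    """
    rate = rng.random()  # assumption: one masking rate per sequence
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < rate:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets, rate


rng = random.Random(0)
inputs, targets, rate = mask_uniform_rate(["paris", "is", "the", "capital", "of", "france"], rng)
print(rate, inputs, targets)
```

Because the rate varies across the full range, the model sees everything from nearly complete contexts to nearly empty ones, so over training it is asked to predict any subset of tokens from any other subset.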

Ouail Kitouni

@WKitouni

1 year

A simple change to what the model sees as input/target (the specific factorization the objective aims to optimize) resolves the reversal curse and allows a model to learn star-graph navigation (a task difficult to learn without changing the data)!

1

0

Ouail Kitouni

@WKitouni

1 year

What if I told you they could store more if you tweak good ol’ MLM to something more modern like masked diffusion? What if I also told you it could help the model **plan**? E.g., when you ask models to make predictions over longer horizons, they learn pathfinding on graphs.

1

0

Ouail Kitouni

@WKitouni

1 year

[🚨Masked Diffusion vs. GPT🚨]
Don't predict next-token only, predict any-to-any. You'll get:
- Better knowledge storage
- No reversal curse
- Better planning
📄 LLMs are good at storing information but not quite perfectly (hallucinations, reversal, etc.).
🧵

2

3

11

Ouail Kitouni

@WKitouni

1 year

RT @summeryue0: 🚀 Introducing the SEAL Leaderboards! We rank LLMs using private datasets that can’t be gamed. Vetted experts handle the rat…

0

34

0

Ouail Kitouni

@WKitouni

1 year

cc: @ericjmichaud_ @TrentonBricken @ZimingLiu11

0

2

Ouail Kitouni

@WKitouni

1 year

5/ We also observe a correspondence between Principal Components and known terms in nuclear theory:

1

0

1

Ouail Kitouni

@WKitouni

1 year

4/ We observe similar structured representations when training models on nuclear physics data. It turns out the model uses spirals as a geometric interpretation of the nuclear “liquid drop model”.

1

0

1

Ouail Kitouni

@WKitouni

1 year

3/ In previous work, we found transformers learn interpretable algorithms for modular addition. In some cases we even see these extremely human-readable, highly structured representations:

1

0

1

Ouail Kitouni

@WKitouni

1 year

2/ e.g., Can we study NN representations to (re)discover nuclear theory? We trained models on nuclear physics data and found that they learn representations strikingly similar to “human-derived” theory.

1

0

1

Ouail Kitouni

@WKitouni

1 year

1/ A lot of mech interp work lately focuses on understanding how language models work. A slightly different but fun question we wanted to explore in this paper: can interp say anything about models trained on scientific (specifically physics) data?

1

0

1

Ouail Kitouni

@WKitouni

2 years

Repo to reproduce Grokking in a few lines of code (Full batch GD, small MLP, modular addition):

0

1

Ouail Kitouni

@WKitouni

2 years

Understanding the Pareto frontier will be key here. Also see:

0

Ouail Kitouni

@WKitouni

2 years

I think we'll see more such results as we confront a fundamental alignment issue: There's an irreducible tradeoff btwn helpfulness & harmlessness. A good model provides some harmful content for the greater good, while a terrible model is constrained, upholding unnecessary rules.

Anthropic

@AnthropicAI

2 years

New Anthropic Paper: Sleeper Agents. We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through.

1

0

2