
Jascha Sohl-Dickstein
@jaschasd
Followers
23K
Following
1K
Media
75
Statuses
544
Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
San Francisco
Joined August 2009
My first blog post ever! Be harsh, but, you know, constructive. Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law. 🧵
42
187
987
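Not from the post itself, just a toy illustration of the overfitting half of the claim, with made-up numbers: the harder you optimize the proxy objective (training error), the worse the true objective (held-out error) eventually gets.

```python
# Hypothetical sketch of Goodhart's law via overfitting: pushing the proxy
# (training error) ever lower eventually hurts the real goal (test error).
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=20)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(3 * x_test)

for degree in [1, 3, 5, 9, 15]:
    coeffs = np.polyfit(x_train, y_train, degree)   # more capacity = harder optimization of the proxy
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train {train_mse:.4f}  test {test_mse:.4f}")
# Training error keeps falling with degree, but test error eventually rises:
# past a point, doing better on the proxy means doing worse on what you care about.
```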
This is great, hearing Yang's thought process and motivations for his score matching/diffusion research. (I had forgotten that I tried to convince him that score matching was too local to be useful for generative modeling :/).
Very excited to share our interview with @DrYangSong. This is Part 2 of our history of diffusion series — score matching, the SDE/ODE interpretation, consistency models, and more. Enjoy!
8
11
112
Slater is an excellent interviewer. This was a lot of fun to do. I'm even more excited for the upcoming interviews with @DrYangSong and @sedielem!
Very excited to share our interview with @jaschasd on the history of diffusion models — from his original 2015 paper inventing them, to the GAN "ice age", to the resurgence in diffusion starting with DDPM. Enjoy!
1
6
72
This was one of the most research-enabling libraries I used at Google. If you want to try out LLM ideas with a simple, clean, JAX codebase, this is for you.
We recently open-sourced a relatively minimal implementation example of Transformer language model training in JAX, called NanoDO. If you stick to vanilla JAX components, the code is relatively straightforward to read -- the model file is <150 lines. We found it useful as a…
1
6
77
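NanoDO itself isn't reproduced here; the sketch below is only a guess at what "vanilla JAX components" training looks like in miniature: pure functions for the model and loss, jax.grad, one jitted update step. All sizes and parameter names are placeholders.

```python
# Rough sketch (not NanoDO) of vanilla-JAX LM training: a one-block causal
# attention model, a cross-entropy loss, and a single jitted SGD step.
import jax
import jax.numpy as jnp

VOCAB, DIM, SEQ = 256, 64, 32  # toy sizes, chosen arbitrarily

def init_params(key):
    k1, k2, k3, k4 = jax.random.split(key, 4)
    s = 0.02
    return {
        "embed":   s * jax.random.normal(k1, (VOCAB, DIM)),
        "wqkv":    s * jax.random.normal(k2, (DIM, 3 * DIM)),
        "wout":    s * jax.random.normal(k3, (DIM, DIM)),
        "unembed": s * jax.random.normal(k4, (DIM, VOCAB)),
    }

def model(params, tokens):                      # tokens: [batch, seq] int32
    x = params["embed"][tokens]                 # [B, T, D]
    q, k, v = jnp.split(x @ params["wqkv"], 3, axis=-1)
    att = q @ jnp.swapaxes(k, -1, -2) / jnp.sqrt(DIM)
    T = tokens.shape[1]
    mask = jnp.tril(jnp.ones((T, T), dtype=bool))
    att = jnp.where(mask, att, -1e9)            # causal mask
    x = x + (jax.nn.softmax(att, axis=-1) @ v) @ params["wout"]
    return x @ params["unembed"]                # next-token logits

def loss_fn(params, tokens):
    logits = model(params, tokens[:, :-1])
    targets = tokens[:, 1:]
    logp = jax.nn.log_softmax(logits, axis=-1)
    return -jnp.mean(jnp.take_along_axis(logp, targets[..., None], axis=-1))

@jax.jit
def train_step(params, tokens, lr=1e-2):
    loss, grads = jax.value_and_grad(loss_fn)(params, tokens)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss

key = jax.random.PRNGKey(0)
params = init_params(key)
batch = jax.random.randint(key, (8, SEQ), 0, VOCAB)
params, loss = train_step(params, batch)
print(float(loss))
```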
This was a fun project! If you could train an LLM over text arithmetically compressed using a smaller LLM as a probabilistic model of text, it would be really good. Text would be represented with far fewer tokens, and inference would be way faster and cheaper. The hard part is…
Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. With @blester125, @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd.
3
10
103
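A toy sketch of the compression step being described, not the paper's code: arithmetic coding shrinks an interval by each symbol's probability under a model, and the final interval width determines the bit cost. Here the "model" is a hypothetical hand-rolled character model; in the paper, a small LLM would supply the next-token probabilities.

```python
# Toy arithmetic coder driven by a stand-in probability model.
from fractions import Fraction
import math
import string

ALPHABET = string.ascii_lowercase + " "

def model_probs(context):
    # Stand-in model: uniform over characters, with extra mass on repeating the
    # previous character. A real setup would query a small LM for p(next | context).
    base = {c: Fraction(1, 2 * len(ALPHABET)) for c in ALPHABET}
    if context and context[-1] in ALPHABET:
        base[context[-1]] += Fraction(1, 2)
    total = sum(base.values())
    return {c: p / total for c, p in base.items()}

def arithmetic_encode(text):
    low, width = Fraction(0), Fraction(1)
    for i, ch in enumerate(text):
        probs = model_probs(text[:i])
        cum = Fraction(0)
        for c in ALPHABET:                     # locate ch's sub-interval
            if c == ch:
                low += width * cum
                width *= probs[c]
                break
            cum += probs[c]
    bits = math.ceil(-math.log2(width)) + 1    # bits to name a point inside [low, low + width)
    return low, width, bits

text = "hello world"
low, width, bits = arithmetic_encode(text)
print(f"{bits} bits vs {8 * len(text)} bits as raw ASCII")
```

The better the model's probabilities, the narrower the interval shrinks per symbol and the fewer bits (or tokens, after chunking the bitstream) the text needs.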
Want to learn more? Blog post: 3-page paper:
arxiv.org
Some fractals -- for instance those associated with the Mandelbrot and quadratic Julia sets -- are computed by iterating a function, and identifying the boundary between hyperparameters for which...
14
71
677
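A minimal sketch of the kind of iteration the abstract refers to, in the classic Mandelbrot case: iterate z -> z^2 + c and mark which parameters c stay bounded. The fractal is the boundary between the two behaviors; the paper's setting replaces c with training hyperparameters.

```python
# Classic Mandelbrot-style computation: iterate a map and find the boundary
# between parameters that diverge and parameters that stay bounded.
import numpy as np

def escape_time(c, max_iter=50):
    """Iterate z -> z^2 + c; return how many steps until |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c          # the iterated map
        if abs(z) > 2.0:       # once |z| > 2 the iterate provably diverges
            return n
    return max_iter            # still bounded: treat c as non-divergent

# Coarse ASCII rendering of the parameter plane; '#' marks non-divergent c.
for im in np.linspace(1.2, -1.2, 24):
    row = ""
    for re in np.linspace(-2.0, 0.8, 72):
        row += "#" if escape_time(complex(re, im)) == 50 else " "
    print(row)
```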
I'm running an experiment, and holding some public office hours (inspired by seeing @kchonyc do something similar). Come talk with me about anything! Ask for advice on your research or startup or career or I suppose personal life, brainstorm new research ideas, complain about…
6
9
142
An excellent project making evolution strategies much more efficient for computing gradients in dynamical systems.
📝Quiz time: when you have an unrolled computation graph (see figure below), how would you compute the unrolling parameters' gradients? If your answer only contains Backprop, now it’s time to add a new method to your gradient estimation toolbox!
0
4
38
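Not the project's code; a bare-bones sketch of the underlying idea: estimate the gradient of an unrolled system's loss with antithetic evolution strategies instead of backpropagating through the unroll. The toy dynamical system and all constants below are made up.

```python
# Antithetic evolution-strategies gradient estimate for an unrolled system,
# compared against backprop through the same unroll.
import jax
import jax.numpy as jnp

def unrolled_loss(theta, x0=1.0, steps=100):
    # Toy dynamical system unrolled for `steps` iterations; theta is the
    # "unrolling parameter" we want a gradient for.
    x = x0
    for _ in range(steps):
        x = jnp.cos(theta * x)
    return x ** 2

def es_grad(theta, key, sigma=0.01, n_pairs=256):
    # ES: perturb theta by +/- sigma * eps, difference the losses, average.
    eps = jax.random.normal(key, (n_pairs,))
    f_plus = jax.vmap(lambda e: unrolled_loss(theta + sigma * e))(eps)
    f_minus = jax.vmap(lambda e: unrolled_loss(theta - sigma * e))(eps)
    return jnp.mean((f_plus - f_minus) / (2 * sigma) * eps)

theta = jnp.array(0.7)
print("backprop grad:", jax.grad(unrolled_loss)(theta))
print("ES estimate:  ", es_grad(theta, jax.random.PRNGKey(0)))
```

ES only needs forward evaluations, so it sidesteps the exploding/chaotic gradients that backprop through long unrolls can produce; the linked project is about making such estimators much more efficient.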
RT @mlbileschi_pub: 2+2=5? “LLMs are not Robust to Adversarial Arithmetic”, a new paper from our team @GoogleDeepMind with @bucketofkets, @…
0
11
0