Nathan Barry
@nathanrs
Followers 3K · Following 31K · Media 52 · Statuses 297
Man in the Arena Allocator. Prev @Apple, CS + Math @UTAustin, @zfellows
Austin, TX
Joined June 2020
Rewrote tiny-diffusion to be 3x smaller! Went from 951 lines to just 364, all contained in one file. As simple as possible, but not simpler. I also added a tiny GPT implementation as a comparison (312 lines, inspired by @karpathy). The two implementations are ~80% identical.
Playing around with training a tiny 11M parameter character-level text diffusion model! It's a WIP but the code is currently a heavily modified nanochat gpt implementation (to change from autoregressive decoding to diffusion) and trained on the Tiny Shakespeare dataset. The
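The autoregressive-vs-diffusion contrast behind the two implementations can be sketched with a toy decoder. The `toy_denoiser` stand-in below is hypothetical (a real model would be a trained network); only the two sampling loops matter:

```python
import random

VOCAB = list("abcdefgh ")
MASK = "_"
random.seed(0)

def toy_denoiser(seq):
    # Stand-in for a trained network: predicts a token for every position.
    return [random.choice(VOCAB) for _ in seq]

def sample_autoregressive(length):
    # GPT-style: generate left to right, one token per model call.
    seq = []
    for _ in range(length):
        seq.append(toy_denoiser(seq + [MASK])[-1])
    return "".join(seq)

def sample_diffusion(length, steps=4):
    # Masked-diffusion style: start fully masked, unmask a chunk of
    # positions per step, so many tokens are produced per model call.
    seq = [MASK] * length
    masked = list(range(length))
    random.shuffle(masked)
    per_step = max(1, length // steps)
    while masked:
        preds = toy_denoiser(seq)
        for pos in masked[:per_step]:
            seq[pos] = preds[pos]
        masked = masked[per_step:]
    return "".join(seq)

print(sample_autoregressive(8))
print(sample_diffusion(8))
```

The point of the "~80% identical" observation: both loops share the same backbone; only the decoding schedule and attention masking differ.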
Diffusion LLMs are becoming very competitive architectures. But recently, there's also been a lot of progress in flow-based LLMs, which are conceptually similar. Both learn to transport samples from a noise distribution to a data distribution. Image generation used to be
Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting
We just brought flow maps to language modeling for one-step sequence generation 💥 Discrete diffusion is not necessary -- continuous flows over one-hot encodings achieve SoTA performance and ≥8.3× faster generation 🔥 We believe this is a major step forward for discrete
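The shared idea, "transport samples from a noise distribution to a data distribution," can be sketched with the standard linear-interpolation path from flow matching over one-hot token encodings. This is an illustration with a constant, known velocity, not the paper's learned parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 5  # toy vocabulary size

def one_hot(tok):
    e = np.zeros(V)
    e[tok] = 1.0
    return e

# Flow matching transports noise to data along the linear path
# x_t = (1 - t) * noise + t * data, whose velocity is (data - noise).
def x_t(noise, data, t):
    return (1 - t) * noise + t * data

data = one_hot(3)              # a one-hot encoded token
noise = rng.standard_normal(V)

# Multi-step ODE sampling would walk t = 0 -> 1 in small increments;
# a one-step flow map jumps the whole way in a single evaluation.
velocity = data - noise        # constant along the linear path
x1 = x_t(noise, data, 0.0) + 1.0 * velocity
assert np.allclose(x1, data)

# Decoding a continuous sample back to a token: argmax over the vocab axis.
print(int(np.argmax(x1)))  # 3
```

With a learned model the velocity is predicted by a network, but the one-step structure (one function evaluation per sample) is what yields the claimed speedups.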
Our project won both @neo’s prize and the Most Creative Prize at @hackwithtrees. Was fun working with @alexkranias, Pranav, and Lainey!
Built a camera that transforms your photos with diffusion models and prints them instantly on receipt paper
LLaDA 2.1 was released, a 100B-parameter diffusion language model with self-correction capabilities. It can fix previously generated tokens by adopting a mixture of masking/state-absorption and uniform diffusion, similar to GIDD. In a previous post, I mentioned that Google Gemini
What if an LLM could EDIT its own tokens in real-time, not just generate them? 🤯 Introducing LLaDA2.1 — a diffusion model that breaks from autoregressive dominance. It drafts fast, then fixes its own mistakes on the fly with Token-to-Token editing. The result? 892 tokens/sec on
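One way a draft-then-edit scheme like this can work is confidence-based revision: draft all tokens in parallel, then re-predict any position the model is unsure about. A toy sketch; the `toy_model` stand-in, threshold, and round count are hypothetical, not LLaDA 2.1's actual mechanism:

```python
import random

random.seed(1)
VOCAB = list("abcde")

def toy_model(seq):
    # Stand-in network: returns a (token, confidence) pair per position.
    return [(random.choice(VOCAB), random.random()) for _ in seq]

def draft_then_edit(length, threshold=0.5, rounds=3):
    # Draft every token in one parallel pass...
    seq = [tok for tok, _ in toy_model(["_"] * length)]
    # ...then repeatedly re-predict any position whose confidence
    # fell below the threshold (token-to-token editing in place).
    for _ in range(rounds):
        preds = toy_model(seq)
        low = [i for i, (_, conf) in enumerate(preds) if conf < threshold]
        if not low:
            break
        for i in low:
            seq[i] = preds[i][0]
    return "".join(seq)

print(draft_then_edit(8))
```

Because an autoregressive decoder can never revisit a committed token, this edit loop is the structural thing diffusion-style decoding buys you.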
Was doing a deeper literature review and found one of my new favorite paper titles ever: “BERT has a Mouth, and It Must Speak.” It was one of the earliest papers to do something akin to state-absorption diffusion language modeling.
Created tiny-infini-gram, a training-free language model which can generate Shakespeare 250x faster than nanoGPT! Last year, I read about unbounded n-gram language models, which solve the exponential space problem for classical n-grams that made using large n intractable. By
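The core lookup behind an unbounded n-gram can be sketched with naive string search: back off to the longest suffix of the context that appears anywhere in the corpus, then count which character follows it. (Real infini-gram does this lookup with suffix arrays so it stays fast at corpus scale; `infinigram_next` is an illustrative name, not the project's API.)

```python
from collections import Counter

def infinigram_next(corpus, context):
    # Back off from the full context to shorter and shorter suffixes,
    # stopping at the longest one that occurs in the corpus.
    for start in range(len(context)):
        suffix = context[start:]
        counts = Counter()
        i = corpus.find(suffix)
        while i != -1:
            j = i + len(suffix)
            if j < len(corpus):
                counts[corpus[j]] += 1  # count the following character
            i = corpus.find(suffix, i + 1)
        if counts:
            return counts.most_common(1)[0][0]
    return None  # no suffix of the context occurs in the corpus

corpus = "to be or not to be"
print(infinigram_next(corpus, "not to b"))  # 'e'
```

No training step at all: "inference" is just counting continuations, which is why generation can beat a neural model on raw speed.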
Most recent diffusion language model research (that I’ve seen) seems to be using masking as the noising process.
However, it looks like most closed-source models (Google Gemini Diffusion and possibly Inception Labs’ Mercury) use a different noising process, where instead of
Super interesting paper! Diffusion language models using random-token corruption (uniform diffusion) may scale better than masking. In both settings, the model learns to predict the original token from a corrupted token. The difference is that with masking, the corrupted token
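The two corruption processes can be sketched side by side on a toy vocabulary (illustrative corruption probabilities, not the papers' actual noise schedules):

```python
import random

random.seed(0)
VOCAB = list("abcd")
MASK = "_"

def corrupt_masking(seq, p):
    # Absorbing-state noise: each token becomes [MASK] with prob p,
    # so the model always knows exactly which positions are corrupted.
    return [MASK if random.random() < p else t for t in seq]

def corrupt_uniform(seq, p):
    # Uniform noise: each token is swapped for a random vocab token
    # with prob p -- corrupted positions are indistinguishable from
    # clean ones, so the model must consider editing every token.
    return [random.choice(VOCAB) if random.random() < p else t for t in seq]

clean = list("abcdabcd")
print("".join(corrupt_masking(clean, 0.5)))
print("".join(corrupt_uniform(clean, 0.5)))
```

In both settings the training objective is the same (predict the original token), which is why the choice of noising process is the interesting variable.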
Good write-up by my friend @alexkranias on how you can quantize integers into a custom floating-point format to enable logarithmic histogram bucketing. This lets you compress a histogram by 99.99% while adding only a bounded constant relative error of <0.4%.
A new blog post! Thought I'd dive into this cool streaming/approximation algorithms problem I encountered a few months ago. TLDR: we can create our own custom floating point representation for encoding and decoding integers and use this to index into a tiny 2D histogram,
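A minimal sketch of the encoding idea, assuming a truncated exponent-plus-mantissa format: keep only the position of the leading 1 bit (the exponent) and a few bits after it (the mantissa), so bucket count grows logarithmically in the value range while relative error stays bounded by the mantissa precision. The parameter choices here are illustrative, not the blog post's exact format:

```python
def bucket_index(x, mantissa_bits=7):
    # Map a positive integer to a small bucket id: exponent = position
    # of the leading 1 bit, mantissa = the next `mantissa_bits` bits.
    if x < (1 << mantissa_bits):
        return x  # small values are stored exactly
    exp = x.bit_length() - 1
    mant = (x >> (exp - mantissa_bits)) & ((1 << mantissa_bits) - 1)
    return (exp << mantissa_bits) | mant

def bucket_value(idx, mantissa_bits=7):
    # Decode a bucket id back to a representative integer.
    if idx < (1 << mantissa_bits):
        return idx
    exp = idx >> mantissa_bits
    mant = idx & ((1 << mantissa_bits) - 1)
    return (1 << exp) | (mant << (exp - mantissa_bits))

x = 123_456_789
approx = bucket_value(bucket_index(x))
rel_err = abs(approx - x) / x
assert rel_err < 2 ** -7  # truncation error bounded by mantissa precision
print(approx, rel_err)
```

With 7 mantissa bits, every value in a bucket agrees with the representative to within one part in 128, and a histogram over 64-bit counters needs only a few thousand buckets instead of 2^64.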
The hyperfitting paper found a good solution to greedy sampling getting stuck in loops: overfit to a small dataset and suddenly it won't loop, while maintaining quality and diversity. Best paper of 2024 imo. https://t.co/EWKM31epUS
arxiv.org
This paper introduces the counter-intuitive generalization results of overfitting pre-trained large language models (LLMs) on very small datasets. In the setting of open-ended text generation, it...
1/ New paper! "Wait, Wait, Wait… Why Do Reasoning Models Loop?" Under greedy/low-temp decoding, reasoning LLMs get stuck in loops repeating themselves, wasting test-time compute and sometimes never terminating! We study why this🔁 happens and why increasing temp is a band-aid
tiny-diffusion, but Japanese! I wonder how logographic languages (Japanese, Chinese, etc) compare to phonetic/alphabetic languages in generation quality and speed with character-level tokenizers. The main difference is the semantic-value-per-token. Fewer tokens are needed to
diffusion源氏物語 (a diffusion Tale of Genji). Happy New Year! Last year I managed to release a few AI/LLM projects, but after building them it never got past "the download numbers are high, so they seem to be in use somewhere...?" This year I'll raise my own SLM to product-grade quality and ship it in an actual product. Looking forward to another good year.
So, the first major paper of 2026: DeepSeek mHC, Manifold-Constrained Hyper-Connections. This is actually an engineering paper, taking as a starting point ideas already presented in the original Hyper-Connections (HC) paper from ByteDance, which is consequently a prerequisite for