Nathan Barry

@nathanrs

Followers: 3K · Following: 31K · Media: 52 · Statuses: 297

Man in the Arena Allocator. Prev @Apple, CS + Math @UTAustin, @zfellows

Austin, TX
Joined June 2020
@nathanrs
Nathan Barry
2 months
Rewrote tiny-diffusion to be 3x smaller! Went from 951 lines to just 364, all contained in one file. As simple as possible, but not simpler. I also added a tiny GPT implementation as a comparison (312 lines, inspired by @karpathy). The two implementations are ~80% identical.
@nathanrs
Nathan Barry
5 months
Playing around with training a tiny 11M parameter character-level text diffusion model! It's a WIP but the code is currently a heavily modified nanochat gpt implementation (to change from autoregressive decoding to diffusion) and trained on the Tiny Shakespeare dataset. The
25
104
1K
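The tweet above describes converting an autoregressive GPT into a text diffusion model that decodes by iterative unmasking rather than left-to-right. As a minimal sketch of that decoding loop: start from a fully masked character sequence and reveal a random fraction of positions each step. The `dummy_model` here is a hypothetical stand-in for the trained denoiser (a real model would output logits over the vocabulary), and all names are illustrative, not from the actual repo.

```python
import random

MASK = "_"  # mask token for a character-level model

def dummy_model(seq, target="to be or not to be"):
    # Stand-in for a trained denoiser: predicts the target character
    # at each masked position. A real model would output logits.
    return [target[i] if c == MASK else c for i, c in enumerate(seq)]

def diffusion_decode(length, model, steps=4, seed=0):
    """Iteratively unmask: start fully masked, then commit the model's
    predictions at a random subset of positions each step, covering
    every position by the final step."""
    rng = random.Random(seed)
    seq = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)
    per_step = -(-length // steps)  # ceiling division
    for step in range(steps):
        preds = model(seq)
        # Commit predictions only at this step's chosen positions
        for i in order[step * per_step:(step + 1) * per_step]:
            seq[i] = preds[i]
    return "".join(seq)

print(diffusion_decode(18, dummy_model))  # fills the sequence over 4 steps
```

The key contrast with autoregressive decoding: every step predicts all positions in parallel, and the schedule (how many positions to unmask per step) is a free design choice.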
@nathanrs
Nathan Barry
18 days
Diffusion LLMs are becoming very competitive architectures. But recently, there's also been a lot of progress in flow-based LLMs, which are conceptually similar. Both learn to transport samples from a noise distribution to a data distribution. Image generation used to be
@StefanoErmon
Stefano Ermon
18 days
Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting
3
8
129
@nmboffi
Nicholas Boffi
22 days
We just brought flow maps to language modeling for one-step sequence generation 💥 Discrete diffusion is not necessary -- continuous flows over one-hot encodings achieve SoTA performance and ≥8.3× faster generation 🔥 We believe this is a major step forward for discrete
4
45
249
@nathanrs
Nathan Barry
26 days
Our project won both @neo’s prize and the Most Creative Prize at @hackwithtrees. Was fun working with @alexkranias, Pranav, and Lainey!
@nathanrs
Nathan Barry
26 days
Built a camera that transforms your photos with diffusion models and prints them instantly on receipt paper
9
1
58
@nathanrs
Nathan Barry
26 days
More photos
1
0
4
@nathanrs
Nathan Barry
26 days
Built a camera that transforms your photos with diffusion models and prints them instantly on receipt paper
7
1
33
@nathanrs
Nathan Barry
1 month
LLaDA 2.1 was released, a 100B parameter diffusion language model with self-correction capabilities. They are able to fix previous tokens by adopting a mixture of masking/state-absorption and uniform diffusion, similar to GIDD. In a previous post, I mentioned that Google Gemini
@ant_oss
Ant Open Source
1 month
What if an LLM could EDIT its own tokens in real-time, not just generate them? 🤯 Introducing LLaDA2.1 — a diffusion model that breaks from autoregressive dominance. It drafts fast, then fixes its own mistakes on the fly with Token-to-Token editing. The result? 892 tokens/sec on
0
3
34
@nathanrs
Nathan Barry
1 month
Was doing a deeper literature review and found one of my new favorite paper titles ever: “BERT has a Mouth, and It Must Speak.” It was one of the earliest papers to do something akin to state-absorption diffusion language modeling.
0
6
93
@nathanrs
Nathan Barry
2 months
Created tiny-infini-gram, a training-free language model which can generate Shakespeare 250x faster than nanoGPT! Last year, I read about unbounded n-gram language models, which solve the exponential space problem for classical n-grams that made using large n intractable. By
14
26
290
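The tiny-infini-gram tweet above refers to the unbounded n-gram idea: rather than fixing n, back off to the longest suffix of the context that actually occurs in the training corpus and use its observed continuations. A minimal sketch of that backoff, using naive string search (the real infini-gram work uses suffix arrays for this exact query in O(log n)); the function name and corpus are illustrative:

```python
def infinigram_next(corpus, context):
    """Back off to the longest suffix of `context` that occurs in
    `corpus`, and return the counts of characters that follow it.
    Naive linear search; infini-gram does this with a suffix array."""
    for start in range(len(context) + 1):
        suffix = context[start:]
        counts = {}
        i = corpus.find(suffix)
        while i != -1:
            if i + len(suffix) < len(corpus):
                nxt = corpus[i + len(suffix)]
                counts[nxt] = counts.get(nxt, 0) + 1
            i = corpus.find(suffix, i + 1)
        if counts:
            return counts
    return {}

corpus = "to be or not to be, that is the question"
print(infinigram_next(corpus, "to be"))  # next-char counts after "to be"
```

Because the model is just corpus lookups, there is no training step at all, which is where the "training-free" speedup over nanoGPT comes from.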
@nathanrs
Nathan Barry
2 months
Most recent diffusion language model research (that I’ve seen) seems to be using masking as the noising process.

It looks like, however, most closed-source models (Google Gemini Diffusion and possibly Inception Labs’ Mercury) use a different noising process, where instead of
@nathanrs
Nathan Barry
3 months
Super interesting paper! Diffusion language models using random-token corruption (uniform diffusion) may scale better than masking. In both settings, the model learns to predict the original token from a corrupted token. The difference is that with masking, the corrupted token
7
34
332
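The two tweets above contrast the two noising processes: masking (absorbing-state) diffusion replaces tokens with a special mask token, while uniform diffusion replaces them with random vocabulary tokens, so a corrupted position is no longer self-identifying. A minimal character-level sketch of both forward processes; the mask symbol, vocabulary, and corruption probability are illustrative choices, not from any specific paper:

```python
import random

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")
MASK = "?"  # hypothetical mask token

def corrupt_masking(text, p, rng):
    # Absorbing/masking noise: each token becomes MASK with prob p.
    # The model always knows exactly which positions are corrupted.
    return "".join(MASK if rng.random() < p else c for c in text)

def corrupt_uniform(text, p, rng):
    # Uniform noise: each token is replaced by a random vocabulary
    # token with prob p. Corrupted positions are not marked, so the
    # model must also learn which tokens to distrust.
    return "".join(rng.choice(VOCAB) if rng.random() < p else c
                   for c in text)

rng = random.Random(0)
print(corrupt_masking("hello world", 0.5, rng))
print(corrupt_uniform("hello world", 0.5, rng))
```

In both settings the training objective is the same shape (predict the original token at each position), which is why the difference shows up as a scaling question rather than a new loss.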
@nathanrs
Nathan Barry
2 months
Good write-up by my friend @alexkranias on how you can quantize integers into a custom floating-point format to enable logarithmic histogram bucketing. This allows you to compress a histogram by 99.99% while adding only a <0.4% bounded relative error.
@alexkranias
Alex Kranias
2 months
A new blog post! Thought I'd dive into this cool streaming/approximation algorithms problem I encountered a few months ago. TLDR: we can create our own custom floating point representation for encoding and decoding integers and use this to index into a tiny 2D histogram,
1
5
53
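As a sketch of the quantization idea in the thread above: split each integer into an exponent and a truncated mantissa, exactly like a custom minifloat. Bucket widths then grow geometrically with magnitude, the (exponent, mantissa) pair indexes a tiny 2D histogram, and the relative error is bounded by the number of mantissa bits kept. The 5-bit mantissa here is an illustrative choice (the blog's <0.4% bound would need more bits), and the function names are hypothetical:

```python
MANTISSA_BITS = 5  # illustrative; more bits → tighter relative error

def encode(n):
    """Map a positive integer to an (exponent, mantissa) pair:
    a custom floating-point index with log-spaced buckets."""
    exp = max(n.bit_length() - 1 - MANTISSA_BITS, 0)
    return exp, n >> exp

def decode(exp, mant):
    # Reconstruct at the middle of the truncated low-bit range,
    # which bounds the relative error by ~2**-(MANTISSA_BITS + 1).
    return (mant << exp) + ((1 << exp) >> 1)

n = 1_000_000
exp, mant = encode(n)
approx = decode(exp, mant)
print(exp, mant, approx, abs(approx - n) / n)  # small bounded relative error
```

Small integers (fewer bits than the mantissa) round-trip exactly, and the bucket count stays tiny because the mantissa only takes on 2**MANTISSA_BITS distinct values per exponent.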
@jm_alexia
Alexia Jolicoeur-Martineau
2 months
The hyperfitting paper found a good solution to greedy sampling getting stuck in loops. Overfit to a small dataset and suddenly it won't loop, while maintaining quality and diversity. Best paper of 2024 imo. https://t.co/EWKM31epUS
arxiv.org
This paper introduces the counter-intuitive generalization results of overfitting pre-trained large language models (LLMs) on very small datasets. In the setting of open-ended text generation, it...
@DimitrisPapail
Dimitris Papailiopoulos
2 months
1/ New paper! "Wait, Wait, Wait… Why Do Reasoning Models Loop?" Under greedy/low-temp decoding, reasoning LLMs get stuck in loops repeating themselves, wasting test-time compute and sometimes never terminating! We study why this🔁 happens and why increasing temp is a band-aid
8
22
232
@nathanrs
Nathan Barry
2 months
tiny-diffusion, but Japanese! I wonder how logographic languages (Japanese, Chinese, etc) compare to phonetic/alphabetic languages in generation quality and speed with character-level tokenizers. The main difference is the semantic-value-per-token. Fewer tokens are needed to
@webbigdata
webbigdata
2 months
diffusion Tale of Genji. Happy New Year! Last year we managed to release a few AI/LLMs, but after building them it never got past "the download count is high, so it seems to be used somewhere...?" This year I will raise an SLM's quality to a production-ready level and ship it as a product. Best wishes for this year as well.
13
47
406
@Dorialexander
Alexander Doria
2 months
So the first major paper of 2026, DeepSeek mHC: Manifold-Constrained Hyper-Connections. This is actually an engineering paper, taking as its starting point ideas already laid out in the original Hyper-Connections (HC) paper from ByteDance, which is consequently a prerequisite for
18
83
732