Nathan Barry
@nathanrs
Followers 3K · Following 31K · Media 52 · Statuses 297
Man in the Arena Allocator. Prev @Apple, CS + Math @UTAustin, @zfellows
Austin, TX
Joined June 2020
Rewrote tiny-diffusion to be 3x smaller! Went from 951 lines to just 364, all contained in one file. As simple as possible, but not simpler. I also added a tiny GPT implementation as a comparison (312 lines, inspired by @karpathy). The two implementations are ~80% identical.
Playing around with training a tiny 11M parameter character-level text diffusion model! It's a WIP but the code is currently a heavily modified nanochat gpt implementation (to change from autoregressive decoding to diffusion) and trained on the Tiny Shakespeare dataset. The
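The autoregressive-vs-diffusion contrast behind the two implementations can be sketched with a toy decoder. The `toy_denoiser` stand-in below is hypothetical (a real model would be a trained network); only the two sampling loops matter:

```python
import random

VOCAB = list("abcdefgh ")
MASK = "_"
random.seed(0)

def toy_denoiser(seq):
    # Stand-in for a trained network: predicts a token for every position.
    return [random.choice(VOCAB) for _ in seq]

def sample_autoregressive(length):
    # GPT-style: generate left to right, one token per model call.
    seq = []
    for _ in range(length):
        seq.append(toy_denoiser(seq + [MASK])[-1])
    return "".join(seq)

def sample_diffusion(length, steps=4):
    # Masked-diffusion style: start fully masked, unmask a chunk of
    # positions per step, so many tokens are produced per model call.
    seq = [MASK] * length
    masked = list(range(length))
    random.shuffle(masked)
    per_step = max(1, length // steps)
    while masked:
        preds = toy_denoiser(seq)
        for pos in masked[:per_step]:
            seq[pos] = preds[pos]
        masked = masked[per_step:]
    return "".join(seq)

print(sample_autoregressive(8))
print(sample_diffusion(8))
```

The point of the "~80% identical" observation: both loops share the same backbone; only the decoding schedule and attention masking differ.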
Diffusion LLMs are becoming very competitive architectures. But recently, there's also been a lot of progress in flow-based LLMs, which are conceptually similar. Both learn to transport samples from a noise distribution to a data distribution. Image generation used to be
Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting
We just brought flow maps to language modeling for one-step sequence generation 💥 Discrete diffusion is not necessary -- continuous flows over one-hot encodings achieve SoTA performance and ≥8.3× faster generation 🔥 We believe this is a major step forward for discrete
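The shared idea, "transport samples from a noise distribution to a data distribution," can be sketched with the standard linear-interpolation path from flow matching over one-hot token encodings. This is an illustration with a constant, known velocity, not the paper's learned parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 5  # toy vocabulary size

def one_hot(tok):
    e = np.zeros(V)
    e[tok] = 1.0
    return e

# Flow matching transports noise to data along the linear path
# x_t = (1 - t) * noise + t * data, whose velocity is (data - noise).
def x_t(noise, data, t):
    return (1 - t) * noise + t * data

data = one_hot(3)              # a one-hot encoded token
noise = rng.standard_normal(V)

# Multi-step ODE sampling would walk t = 0 -> 1 in small increments;
# a one-step flow map jumps the whole way in a single evaluation.
velocity = data - noise        # constant along the linear path
x1 = x_t(noise, data, 0.0) + 1.0 * velocity
assert np.allclose(x1, data)

# Decoding a continuous sample back to a token: argmax over the vocab axis.
print(int(np.argmax(x1)))  # 3
```

With a learned model the velocity is predicted by a network, but the one-step structure (one function evaluation per sample) is what yields the claimed speedups.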
Our project won both @neo’s prize and the Most Creative Prize at @hackwithtrees. Was fun working with @alexkranias, Pranav, and Lainey!
Built a camera that transforms your photos with diffusion models and prints them instantly on receipt paper
LLaDA 2.1 was released, a 100B-parameter diffusion language model with self-correction capabilities. It can fix previously generated tokens by adopting a mixture of masking/state-absorption and uniform diffusion, similar to GIDD. In a previous post, I mentioned that Google Gemini
What if an LLM could EDIT its own tokens in real-time, not just generate them? 🤯 Introducing LLaDA2.1 — a diffusion model that breaks from autoregressive dominance. It drafts fast, then fixes its own mistakes on the fly with Token-to-Token editing. The result? 892 tokens/sec on
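One way a draft-then-edit scheme like this can work is confidence-based revision: draft all tokens in parallel, then re-predict any position the model is unsure about. A toy sketch; the `toy_model` stand-in, threshold, and round count are hypothetical, not LLaDA 2.1's actual mechanism:

```python
import random

random.seed(1)
VOCAB = list("abcde")

def toy_model(seq):
    # Stand-in network: returns a (token, confidence) pair per position.
    return [(random.choice(VOCAB), random.random()) for _ in seq]

def draft_then_edit(length, threshold=0.5, rounds=3):
    # Draft every token in one parallel pass...
    seq = [tok for tok, _ in toy_model(["_"] * length)]
    # ...then repeatedly re-predict any position whose confidence
    # fell below the threshold (token-to-token editing in place).
    for _ in range(rounds):
        preds = toy_model(seq)
        low = [i for i, (_, conf) in enumerate(preds) if conf < threshold]
        if not low:
            break
        for i in low:
            seq[i] = preds[i][0]
    return "".join(seq)

print(draft_then_edit(8))
```

Because an autoregressive decoder can never revisit a committed token, this edit loop is the structural thing diffusion-style decoding buys you.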
Was doing a deeper literature review and found one of my new favorite paper titles ever: “BERT has a Mouth, and It Must Speak.” It was one of the earliest papers to do something akin to state-absorption diffusion language modeling.
Created tiny-infini-gram, a training-free language model which can generate Shakespeare 250x faster than nanoGPT! Last year, I read about unbounded n-gram language models, which solve the exponential space problem for classical n-grams that made using large n intractable. By
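The core lookup behind an unbounded n-gram can be sketched with naive string search: back off to the longest suffix of the context that appears anywhere in the corpus, then count which character follows it. (Real infini-gram does this lookup with suffix arrays so it stays fast at corpus scale; `infinigram_next` is an illustrative name, not the project's API.)

```python
from collections import Counter

def infinigram_next(corpus, context):
    # Back off from the full context to shorter and shorter suffixes,
    # stopping at the longest one that occurs in the corpus.
    for start in range(len(context)):
        suffix = context[start:]
        counts = Counter()
        i = corpus.find(suffix)
        while i != -1:
            j = i + len(suffix)
            if j < len(corpus):
                counts[corpus[j]] += 1  # count the following character
            i = corpus.find(suffix, i + 1)
        if counts:
            return counts.most_common(1)[0][0]
    return None  # no suffix of the context occurs in the corpus

corpus = "to be or not to be"
print(infinigram_next(corpus, "not to b"))  # 'e'
```

No training step at all: "inference" is just counting continuations, which is why generation can beat a neural model on raw speed.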
Most recent diffusion language model research (that I’ve seen) seems to be using masking as the noising process.
However, it looks like most closed-source models (Google Gemini Diffusion and possibly Inception Labs’ Mercury) use a different noising process, where instead of
Super interesting paper! Diffusion language models using random-token corruption (uniform diffusion) may scale better than masking. In both settings, the model learns to predict the original token from a corrupted token. The difference is that with masking, the corrupted token
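The two corruption processes can be sketched side by side on a toy vocabulary (illustrative corruption probabilities, not the papers' actual noise schedules):

```python
import random

random.seed(0)
VOCAB = list("abcd")
MASK = "_"

def corrupt_masking(seq, p):
    # Absorbing-state noise: each token becomes [MASK] with prob p,
    # so the model always knows exactly which positions are corrupted.
    return [MASK if random.random() < p else t for t in seq]

def corrupt_uniform(seq, p):
    # Uniform noise: each token is swapped for a random vocab token
    # with prob p -- corrupted positions are indistinguishable from
    # clean ones, so the model must consider editing every token.
    return [random.choice(VOCAB) if random.random() < p else t for t in seq]

clean = list("abcdabcd")
print("".join(corrupt_masking(clean, 0.5)))
print("".join(corrupt_uniform(clean, 0.5)))
```

In both settings the training objective is the same (predict the original token), which is why the choice of noising process is the interesting variable.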
Good write-up by my friend @alexkranias on how you can quantize integers into a custom floating-point format to enable logarithmic histogram bucketing. This lets you compress a histogram by 99.99% while adding only a bounded constant relative error of <0.4%.
A new blog post! Thought I'd dive into this cool streaming/approximation algorithms problem I encountered a few months ago. TLDR: we can create our own custom floating point representation for encoding and decoding integers and use this to index into a tiny 2D histogram,
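A minimal sketch of the encoding idea, assuming a truncated exponent-plus-mantissa format: keep only the position of the leading 1 bit (the exponent) and a few bits after it (the mantissa), so bucket count grows logarithmically in the value range while relative error stays bounded by the mantissa precision. The parameter choices here are illustrative, not the blog post's exact format:

```python
def bucket_index(x, mantissa_bits=7):
    # Map a positive integer to a small bucket id: exponent = position
    # of the leading 1 bit, mantissa = the next `mantissa_bits` bits.
    if x < (1 << mantissa_bits):
        return x  # small values are stored exactly
    exp = x.bit_length() - 1
    mant = (x >> (exp - mantissa_bits)) & ((1 << mantissa_bits) - 1)
    return (exp << mantissa_bits) | mant

def bucket_value(idx, mantissa_bits=7):
    # Decode a bucket id back to a representative integer.
    if idx < (1 << mantissa_bits):
        return idx
    exp = idx >> mantissa_bits
    mant = idx & ((1 << mantissa_bits) - 1)
    return (1 << exp) | (mant << (exp - mantissa_bits))

x = 123_456_789
approx = bucket_value(bucket_index(x))
rel_err = abs(approx - x) / x
assert rel_err < 2 ** -7  # truncation error bounded by mantissa precision
print(approx, rel_err)
```

With 7 mantissa bits, every value in a bucket agrees with the representative to within one part in 128, and a histogram over 64-bit counters needs only a few thousand buckets instead of 2^64.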
The hyperfitting paper found a good solution to greedy sampling getting stuck in loops: overfit to a small dataset and suddenly it won't loop, while maintaining quality and diversity. Best paper of 2024 imo. https://t.co/EWKM31epUS
arxiv.org
This paper introduces the counter-intuitive generalization results of overfitting pre-trained large language models (LLMs) on very small datasets. In the setting of open-ended text generation, it...
1/ New paper! "Wait, Wait, Wait… Why Do Reasoning Models Loop?" Under greedy/low-temp decoding, reasoning LLMs get stuck in loops repeating themselves, wasting test-time compute and sometimes never terminating! We study why this🔁 happens and why increasing temp is a band-aid
tiny-diffusion, but Japanese! I wonder how logographic languages (Japanese, Chinese, etc) compare to phonetic/alphabetic languages in generation quality and speed with character-level tokenizers. The main difference is the semantic-value-per-token. Fewer tokens are needed to
diffusion源氏物語 (a diffusion Tale of Genji). Happy New Year! Last year I managed to release a few AI/LLM projects, but after building them it never got past "the download numbers are high, so they seem to be in use somewhere...?" This year I'll raise my own SLM to product-grade quality and ship it in an actual product. Looking forward to another good year.
So, the first major paper of 2026: DeepSeek mHC, Manifold-Constrained Hyper-Connections. This is actually an engineering paper, taking as a starting point ideas already presented in the original Hyper-Connections (HC) paper from ByteDance, which is consequently a prerequisite for