Artidoro Pagnoni (@ArtidoroPagnoni)
1K Followers · 2K Following · 17 Media · 349 Statuses
PhD @uwnlp @AIatMeta. Bending the scaling laws.
Seattle, WA · Joined September 2015
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe
17 · 145 · 728
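As background on how byte-patches replace tokens: in the BLT paper, patch boundaries are placed where a small byte-level LM is uncertain about the next byte, so compute goes where prediction is hard. Below is a minimal sketch of that entropy-threshold patching idea, assuming a hypothetical `next_byte_entropy` scorer standing in for the small byte LM; it is an illustration, not the released implementation.

```python
import math
from typing import Callable, List

def entropy_patches(
    data: bytes,
    next_byte_entropy: Callable[[bytes], float],  # hypothetical small byte-LM scorer
    threshold: float = 2.0,
) -> List[bytes]:
    """Group bytes into patches, starting a new patch whenever the byte LM's
    next-byte entropy exceeds `threshold` (i.e., high uncertainty)."""
    patches: List[bytes] = []
    current = bytearray()
    for i, b in enumerate(data):
        if current and next_byte_entropy(data[:i]) > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

# Toy stand-in scorer: maximal entropy right after whitespace, low elsewhere.
def toy_entropy(prefix: bytes) -> float:
    return math.log2(256) if prefix.endswith(b" ") else 0.5

print(entropy_patches(b"byte latent transformer", toy_entropy))
# [b'byte ', b'latent ', b'transformer']
```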
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
13 · 66 · 346
New method for automatic prompt optimization! TL;DR: We do reinforcement learning to train an LM that can take in a lot of task examples and generate a prompt that describes the task. Trained/tested on 3000+ classification datasets on Hugging Face!
Can we train LLMs to be good prompt engineers? 🚀 We propose Prompt-MII: Meta-Learning Instruction Induction for LLMs. Our models outperform strong baselines like ICL and GEPA with 13x fewer tokens. 🧵
9 · 48 · 359
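To make the instruction-induction idea concrete, here is a rough sketch of how an induced instruction would be used at inference time; `meta_lm` and `task_lm` are hypothetical callables standing in for the trained induction model and a downstream LM, not the paper's actual interfaces.

```python
from typing import Callable, List, Tuple

def induce_instruction(
    examples: List[Tuple[str, str]],    # (input, label) demonstrations
    meta_lm: Callable[[str], str],      # hypothetical trained induction model
) -> str:
    """Feed labeled examples to the induction model and get back a
    natural-language task description (the induced prompt)."""
    demo_block = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    return meta_lm(f"Describe the task shown by these examples:\n{demo_block}")

def classify(
    instruction: str,
    text: str,
    task_lm: Callable[[str], str],      # hypothetical downstream classifier LM
) -> str:
    """Apply the induced instruction instead of resending all demonstrations
    with every query, which is where the token savings over ICL would come from."""
    return task_lm(f"{instruction}\nInput: {text}\nLabel:")
```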
Our team at Meta FAIR is hiring a PhD research intern for 2026. The topics broadly involve multimodal generative AI (e.g., video/image generation in addition to text), with flexible approaches across architecture/data/algorithms. Please apply via the link below, and feel free to
3 · 43 · 257
Transfusion combines autoregressive modeling with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 is the first non-autoregressive model to generate text and images concurrently using a single transformer, unifying Edit Flow (text) with Flow
7 · 83 · 412
BLT helps improve text-speech representation alignment and efficiency on speech!
🚀 Introducing the Latent Speech-Text Transformer (LST) — a speech-text model that organizes speech tokens into latent patches for better text→speech transfer, enabling steeper scaling laws and more efficient multimodal training ⚡️ Paper 📄 https://t.co/4nUsbC1YKF
0 · 0 · 5
For those who missed it, we just released a little LLM-backed game called HR Simulator™. You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell… and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below
3 · 21 · 58
I’ve also been integrating LLMs into my research workflow. I spent most of Tuesday working on a problem I’ve been thinking about for a while with some collaborators. I had a conjecture on a possible way forward, and with some hours of thinking, mixing in conversations with Gemini
14 · 18 · 290
1/ Introducing Isaac 0.1 — our first perceptive-language model. 2B params, open weights. Matches or beats models significantly larger on core perception. We are pushing the efficient frontier for physical AI. https://t.co/dJ1Wjh2ARK
24 · 121 · 603
.@gneubig and I are co-teaching a new class on LM inference this fall! We designed this class to give a broad view on the space, from more classical decoding algorithms to recent methods for LLMs, plus a wide range of efficiency-focused work. website:
phontron.com
A class at Carnegie Mellon University on language model inference algorithms.
5 · 59 · 442
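For a flavor of the "classical decoding algorithms" end of that syllabus, here is a minimal sketch contrasting greedy decoding with temperature sampling; `step_logits` is a hypothetical stand-in for a model forward pass, not an API from the course materials.

```python
import numpy as np

def decode(step_logits, prompt_ids, max_new_tokens=20, temperature=0.0,
           eos_id=None, seed=0):
    """Greedy decoding when temperature == 0, temperature sampling otherwise.
    `step_logits(ids) -> np.ndarray` is a hypothetical callable returning
    next-token logits for the current sequence `ids`."""
    ids = list(prompt_ids)
    rng = np.random.default_rng(seed)
    for _ in range(max_new_tokens):
        logits = np.asarray(step_logits(ids), dtype=float)
        if temperature == 0.0:
            next_id = int(np.argmax(logits))                # greedy: take the mode
        else:
            scaled = logits / temperature
            scaled -= scaled.max()                          # stabilize the softmax
            probs = np.exp(scaled)
            probs /= probs.sum()
            next_id = int(rng.choice(len(probs), p=probs))  # sample from softmax
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```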
New research from FAIR: Active Reading, a framework to learn a given set of material with self-generated learning strategies, for both general and expert domains (such as finance). It absorbs significantly more knowledge than vanilla finetuning and the usual data augmentation strategies
@AIatMeta 🚀 We're excited to release the 1T Active Reading-augmented Wikipedia dataset and open-source the WikiExpert model for the community: Paper: https://t.co/0IGHROyMex Dataset: https://t.co/VEKKSMMvaa Model: https://t.co/QrQqSRfE9l Thanks to my great collaborators – @vinceberges,
0 · 11 · 28
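As a rough picture of what "self-generated learning strategies" means in practice: the source material is expanded into many rewritten views before finetuning. The sketch below uses a few fixed strategy prompts and a hypothetical `llm` callable purely for illustration; in the paper the strategies themselves are generated by the model rather than hard-coded like this.

```python
from typing import Callable, List

# Illustrative strategy prompts only; Active Reading's strategies are
# self-generated by the model, not a fixed template list.
STRATEGIES = [
    "Rewrite the passage in your own words.",
    "Write question-answer pairs covering every fact in the passage.",
    "Summarize the passage and explain why each fact matters.",
]

def active_reading_corpus(
    documents: List[str],
    llm: Callable[[str], str],   # hypothetical generation function
) -> List[str]:
    """Build an augmented finetuning corpus by applying several learning
    strategies to each source document."""
    augmented: List[str] = []
    for doc in documents:
        for strategy in STRATEGIES:
            augmented.append(llm(f"{strategy}\n\nPassage:\n{doc}"))
    return augmented
```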
Introducing two new tokenizer-free LLM checkpoints from our research lab: TFree-HAT 7B. Built on our Hierarchical Autoregressive Transformer (HAT) architecture, these models achieve top-tier German and English performance while processing text at the UTF-8 byte level.
16 · 46 · 444
We observed bias in the QLoRA evals. Really cool to see a proper study on this topic!
LLMs: great at judging… until it’s their own homework. 📚🔥 So we built the math to call them out 🤷‍♀️ To learn more, check out our new paper: Play Favorites: A statistical method to quantify self-bias in LLM-as-a-judge 🎭 📄 Paper:
0 · 0 · 5
Thank you @google for the ML and Systems Junior Faculty Award! This award is for work on sparsity, and I am excited to continue this work focusing on mixture of experts. We might bring big MoEs to small GPUs quite soon! Stay tuned! Read more here:
cs.cmu.edu
SCS faculty member Tim Dettmers has received an inaugural Google ML and Systems Junior Faculty Award.
37 · 25 · 496
Thanks to the incredible team that made this possible! 🙌 @ArtidoroPagnoni @ramakanth1729 @EntilZhaPR @JohnNguyen @ben_mlr @margs_li @violet_zct @liliyu_lili @jaseweston @LukeZettlemoyer @gargighosh @ml_perception @universeinanegg @sriniiyer88 @AIatMeta
0 · 1 · 10
Thrilled to share that our Byte Latent Transformer won an Outstanding Paper Award at ACL 2025! 🏆
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe
16 · 31 · 282
Do you ever wish all LLMs used the same tokenizer? 🧑‍🤝‍🧑 We present an *efficient, lossless* method to convert any LM into a byte-level model at inference time. This fixes weird tokenization artifacts at the prompt boundary and enables ensembles of LMs with mismatched tokenizers! 🧵
3 · 33 · 175
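The core operation behind such a conversion (heavily simplified here) is marginalizing a token-level next-token distribution into a next-byte distribution, which gives models with mismatched tokenizers a shared byte vocabulary to ensemble over. The sketch below ignores partially consumed tokens at the prompt boundary, which the lossless method has to track exactly; the function names are illustrative, not from the paper's code.

```python
from collections import defaultdict
from typing import Dict, Iterable

def next_byte_distribution(
    token_probs: Dict[str, float],   # next-token distribution from any tokenizer LM
) -> Dict[int, float]:
    """Collapse a token-level distribution into a distribution over the next
    single byte by summing probability mass over all tokens whose UTF-8
    encoding starts with that byte (simplified: ignores partial tokens)."""
    byte_probs: Dict[int, float] = defaultdict(float)
    for token, p in token_probs.items():
        encoded = token.encode("utf-8")
        if encoded:
            byte_probs[encoded[0]] += p
    total = sum(byte_probs.values())
    return {b: p / total for b, p in byte_probs.items()} if total else {}

def ensemble_next_byte(dists: Iterable[Dict[int, float]]) -> Dict[int, float]:
    """Average byte-level distributions from models with different tokenizers."""
    dists = list(dists)
    merged: Dict[int, float] = defaultdict(float)
    for d in dists:
        for b, p in d.items():
            merged[b] += p / len(dists)
    return dict(merged)
```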
Thanks for acknowledging Dynamic Token Pooling as a predecessor to H-Net, @_albertgu! We had some decent ideas in that paper (e2e and entropy-based tokenisation), but it surprises me that it took 2 years (an eternity in NLP) to find the right recipe and scale better than BPE
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
1 · 12 · 87
🎉 We’re excited to introduce BLAB: Brutally Long Audio Bench, the first benchmark for evaluating long-form reasoning in audio LMs across 8 challenging tasks, using 833+ hours of Creative Commons audio (average length: 51 minutes).
2 · 50 · 178
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
98 · 754 · 5K