Artidoro Pagnoni (@ArtidoroPagnoni)
1K Followers · 2K Following · 17 Media · 349 Statuses
PhD @uwnlp @AIatMeta. Bending the scaling laws.
Seattle, WA · Joined September 2015
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe
17 · 145 · 728
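As background on how byte-patches replace tokens: in the BLT paper, patch boundaries are placed where a small byte-level LM is uncertain about the next byte, so compute goes where prediction is hard. Below is a minimal sketch of that entropy-threshold patching idea, assuming a hypothetical `next_byte_entropy` scorer standing in for the small byte LM; it is an illustration, not the released implementation.

```python
import math
from typing import Callable, List

def entropy_patches(
    data: bytes,
    next_byte_entropy: Callable[[bytes], float],  # hypothetical small byte-LM scorer
    threshold: float = 2.0,
) -> List[bytes]:
    """Group bytes into patches, starting a new patch whenever the byte LM's
    next-byte entropy exceeds `threshold` (i.e., high uncertainty)."""
    patches: List[bytes] = []
    current = bytearray()
    for i, b in enumerate(data):
        if current and next_byte_entropy(data[:i]) > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

# Toy stand-in scorer: maximal entropy right after whitespace, low elsewhere.
def toy_entropy(prefix: bytes) -> float:
    return math.log2(256) if prefix.endswith(b" ") else 0.5

print(entropy_patches(b"byte latent transformer", toy_entropy))
# [b'byte ', b'latent ', b'transformer']
```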
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
13 · 66 · 346
New method for automatic prompt optimization! TL;DR: We do reinforcement learning to train an LM that can take in a lot of task examples and generate a prompt that describes the task. Trained/tested on 3000+ classification datasets on Hugging Face!
Can we train LLMs to be good prompt engineers? 🚀 We propose Prompt-MII: Meta-Learning Instruction Induction for LLMs. Our models outperform strong baselines like ICL and GEPA with 13x fewer tokens. 🧵
9 · 48 · 359
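To make the instruction-induction idea concrete, here is a rough sketch of how an induced instruction would be used at inference time; `meta_lm` and `task_lm` are hypothetical callables standing in for the trained induction model and a downstream LM, not the paper's actual interfaces.

```python
from typing import Callable, List, Tuple

def induce_instruction(
    examples: List[Tuple[str, str]],    # (input, label) demonstrations
    meta_lm: Callable[[str], str],      # hypothetical trained induction model
) -> str:
    """Feed labeled examples to the induction model and get back a
    natural-language task description (the induced prompt)."""
    demo_block = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    return meta_lm(f"Describe the task shown by these examples:\n{demo_block}")

def classify(
    instruction: str,
    text: str,
    task_lm: Callable[[str], str],      # hypothetical downstream classifier LM
) -> str:
    """Apply the induced instruction instead of resending all demonstrations
    with every query, which is where the token savings over ICL would come from."""
    return task_lm(f"{instruction}\nInput: {text}\nLabel:")
```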
Our team at Meta FAIR is hiring a PhD research intern for 2026. The topics broadly involve multimodal generative AI (e.g., video/image generation in addition to text), with flexible approaches across architecture/data/algorithms. Please apply via the link below, and feel free to
3 · 43 · 257
Transfusion combines autoregressive modeling with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 is the first non-autoregressive model to generate text and images concurrently using a single transformer, unifying Edit Flow (text) with Flow
7 · 83 · 412
BLT helps improve text-speech representation alignment and efficiency on speech!
🚀 Introducing the Latent Speech-Text Transformer (LST) — a speech-text model that organizes speech tokens into latent patches for better text→speech transfer, enabling steeper scaling laws and more efficient multimodal training ⚡️ Paper 📄 https://t.co/4nUsbC1YKF
0 · 0 · 5
For those who missed it, we just released a little LLM-backed game called HR Simulator™. You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell… and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below
3 · 21 · 58
I’ve also been integrating LLMs into my research workflow. I spent most of Tuesday working on a problem I’ve been thinking about for a while with some collaborators. I had a conjecture on a possible way forward, and with some hours of thinking, mixing in conversations with Gemini
14 · 18 · 290
1/ Introducing Isaac 0.1 — our first perceptive-language model. 2B params, open weights. Matches or beats models significantly larger on core perception. We are pushing the efficient frontier for physical AI. https://t.co/dJ1Wjh2ARK
24 · 121 · 603
.@gneubig and I are co-teaching a new class on LM inference this fall! We designed this class to give a broad view on the space, from more classical decoding algorithms to recent methods for LLMs, plus a wide range of efficiency-focused work. website:
phontron.com
A class at Carnegie Mellon University on language model inference algorithms.
5 · 59 · 442
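For a flavor of the "classical decoding algorithms" end of that syllabus, here is a minimal sketch contrasting greedy decoding with temperature sampling; `step_logits` is a hypothetical stand-in for a model forward pass, not an API from the course materials.

```python
import numpy as np

def decode(step_logits, prompt_ids, max_new_tokens=20, temperature=0.0,
           eos_id=None, seed=0):
    """Greedy decoding when temperature == 0, temperature sampling otherwise.
    `step_logits(ids) -> np.ndarray` is a hypothetical callable returning
    next-token logits for the current sequence `ids`."""
    ids = list(prompt_ids)
    rng = np.random.default_rng(seed)
    for _ in range(max_new_tokens):
        logits = np.asarray(step_logits(ids), dtype=float)
        if temperature == 0.0:
            next_id = int(np.argmax(logits))                # greedy: take the mode
        else:
            scaled = logits / temperature
            scaled -= scaled.max()                          # stabilize the softmax
            probs = np.exp(scaled)
            probs /= probs.sum()
            next_id = int(rng.choice(len(probs), p=probs))  # sample from softmax
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```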
New research from FAIR: Active Reading, a framework to learn a given set of material with self-generated learning strategies, for both general and expert domains (such as finance). It absorbs significantly more knowledge than vanilla finetuning and the usual data augmentation strategies
@AIatMeta 🚀 We're excited to release the 1T Active Reading-augmented Wikipedia dataset and open-source the WikiExpert model for the community: Paper: https://t.co/0IGHROyMex Dataset: https://t.co/VEKKSMMvaa Model: https://t.co/QrQqSRfE9l Thanks to my great collaborators – @vinceberges,
0 · 11 · 28
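As a rough picture of what "self-generated learning strategies" means in practice: the source material is expanded into many rewritten views before finetuning. The sketch below uses a few fixed strategy prompts and a hypothetical `llm` callable purely for illustration; in the paper the strategies themselves are generated by the model rather than hard-coded like this.

```python
from typing import Callable, List

# Illustrative strategy prompts only; Active Reading's strategies are
# self-generated by the model, not a fixed template list.
STRATEGIES = [
    "Rewrite the passage in your own words.",
    "Write question-answer pairs covering every fact in the passage.",
    "Summarize the passage and explain why each fact matters.",
]

def active_reading_corpus(
    documents: List[str],
    llm: Callable[[str], str],   # hypothetical generation function
) -> List[str]:
    """Build an augmented finetuning corpus by applying several learning
    strategies to each source document."""
    augmented: List[str] = []
    for doc in documents:
        for strategy in STRATEGIES:
            augmented.append(llm(f"{strategy}\n\nPassage:\n{doc}"))
    return augmented
```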
Introducing two new tokenizer-free LLM checkpoints from our research lab: TFree-HAT 7B. Built on our Hierarchical Autoregressive Transformer (HAT) architecture, these models achieve top-tier German and English performance while processing text at the UTF-8 byte level.
16 · 46 · 444
We observed bias in the QLoRA evals. Really cool to see a proper study on this topic!
LLMs: great at judging… until it’s their own homework. 📚🔥 So we built the math to call them out 🤷‍♀️ To learn more, check out our new paper: Play Favorites: A statistical method to quantify self-bias in LLM-as-a-judge 🎭 📄 Paper:
0 · 0 · 5
Thank you @google for the ML and Systems Junior Faculty Award! This award is for work on sparsity, and I am excited to continue this work focusing on mixture of experts. We might bring big MoEs to small GPUs quite soon! Stay tuned! Read more here:
cs.cmu.edu
SCS faculty member Tim Dettmers has received an inaugural Google ML and Systems Junior Faculty Award.
37 · 25 · 496
Thanks to the incredible team that made this possible! 🙌 @ArtidoroPagnoni @ramakanth1729 @EntilZhaPR @JohnNguyen @ben_mlr @margs_li @violet_zct @liliyu_lili @jaseweston @LukeZettlemoyer @gargighosh @ml_perception @universeinanegg @sriniiyer88 @AIatMeta
0 · 1 · 10
Thrilled to share that our Byte Latent Transformer won an Outstanding Paper Award at ACL 2025! 🏆
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe
16 · 31 · 282
Do you ever wish all LLMs used the same tokenizer? 🧑‍🤝‍🧑 We present an *efficient, lossless* method to convert any LM into a byte-level model at inference time. This fixes weird tokenization artifacts at the prompt boundary and enables ensembles of LMs with mismatched tokenizers! 🧵
3 · 33 · 175
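The core operation behind such a conversion (heavily simplified here) is marginalizing a token-level next-token distribution into a next-byte distribution, which gives models with mismatched tokenizers a shared byte vocabulary to ensemble over. The sketch below ignores partially consumed tokens at the prompt boundary, which the lossless method has to track exactly; the function names are illustrative, not from the paper's code.

```python
from collections import defaultdict
from typing import Dict, Iterable

def next_byte_distribution(
    token_probs: Dict[str, float],   # next-token distribution from any tokenizer LM
) -> Dict[int, float]:
    """Collapse a token-level distribution into a distribution over the next
    single byte by summing probability mass over all tokens whose UTF-8
    encoding starts with that byte (simplified: ignores partial tokens)."""
    byte_probs: Dict[int, float] = defaultdict(float)
    for token, p in token_probs.items():
        encoded = token.encode("utf-8")
        if encoded:
            byte_probs[encoded[0]] += p
    total = sum(byte_probs.values())
    return {b: p / total for b, p in byte_probs.items()} if total else {}

def ensemble_next_byte(dists: Iterable[Dict[int, float]]) -> Dict[int, float]:
    """Average byte-level distributions from models with different tokenizers."""
    dists = list(dists)
    merged: Dict[int, float] = defaultdict(float)
    for d in dists:
        for b, p in d.items():
            merged[b] += p / len(dists)
    return dict(merged)
```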
Thanks for acknowledging Dynamic Token Pooling as a predecessor to H-Net, @_albertgu! We had some decent ideas in that paper (e2e and entropy-based tokenisation), but it surprises me that it took 2 years (an eternity in NLP) to find the right recipe and scale better than BPE
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
1 · 12 · 87
🎉 We’re excited to introduce BLAB: Brutally Long Audio Bench, the first benchmark for evaluating long-form reasoning in audio LMs across 8 challenging tasks, using 833+ hours of Creative Commons audio (average length: 51 minutes).
2 · 50 · 178
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
98 · 754 · 5K