Artidoro Pagnoni Profile
Artidoro Pagnoni

@ArtidoroPagnoni

Followers: 1K
Following: 2K
Media: 17
Statuses: 349

PhD @uwnlp @AIatMeta. Bending the scaling laws.

Seattle, WA
Joined September 2015
@ArtidoroPagnoni
Artidoro Pagnoni
1 year
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe
17
145
728
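The byte-patching idea behind BLT can be made concrete with a short sketch: a small byte-level model scores the entropy of its next-byte prediction, and a new patch begins whenever that entropy crosses a threshold. This is only an illustrative sketch, not the released BLT code; the entropy helper, the 2.0-bit threshold, and the dummy entropy function in the demo are stand-ins.

```python
import math

def shannon_entropy(probs):
    """Entropy (in bits) of a next-byte distribution; the quantity the
    patching threshold below is compared against."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_patches(byte_seq, entropy_fn, threshold=2.0):
    """Group a byte sequence into patches: a new patch starts whenever the
    entropy of the next-byte prediction at that position exceeds `threshold`.
    `entropy_fn(prefix)` stands in for a small byte-level LM."""
    patches, current = [], []
    for i, b in enumerate(byte_seq):
        if current and entropy_fn(byte_seq[:i]) > threshold:
            patches.append(bytes(current))
            current = []
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

# Toy demo: pretend the next-byte entropy spikes right after a space.
dummy_entropy = lambda prefix: 3.0 if prefix.endswith(b" ") else 1.0
print(entropy_patches(b"byte latent transformer", dummy_entropy))
# [b'byte ', b'latent ', b'transformer']
```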
@abertsch72
Amanda Bertsch
17 days
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
13
66
346
@gneubig
Graham Neubig
1 month
New method for automatic prompt optimization! TL;DR: We do reinforcement learning to train an LM that can take in a lot of task examples and generate a prompt that describes the task. Trained/tested on 3000+ classification datasets on Hugging Face!
@XiaoEmily41333
Emily Xiao
1 month
Can we train LLMs to be good prompt engineers? 🚀We propose Prompt-MII: Meta-Learning Instruction Induction for LLMs. Our models outperform strong baselines like ICL and GEPA with 13x fewer tokens. 🧵
9
48
359
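The instruction-induction loop described above can be sketched generically: a meta-model reads a few (input, label) pairs and writes an instruction describing the task, and the reward used for reinforcement learning is the downstream accuracy obtained when a model is prompted with that instruction. This is a hedged illustration under those assumptions; `generate_fn` and `classify_fn` are hypothetical stand-ins for model calls, and the RL update itself is omitted.

```python
import random

def induce_instruction(examples, generate_fn):
    """Ask a meta-model to write a task instruction from a few (input, label)
    pairs. `generate_fn(prompt)` is a hypothetical stand-in for any
    text-generation call."""
    shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    return generate_fn(
        "Write a one-sentence instruction describing the task below.\n" + shots
    )

def reward(instruction, eval_set, classify_fn):
    """Reward used for RL: accuracy of a downstream model when prompted with
    the induced instruction. `classify_fn(instruction, x)` is a stand-in."""
    correct = sum(classify_fn(instruction, x) == y for x, y in eval_set)
    return correct / len(eval_set)

# Toy demo with dummy models: the meta-model returns a fixed instruction and
# the downstream "classifier" guesses labels at random.
examples = [("great movie", "positive"), ("terrible plot", "negative")]
instruction = induce_instruction(examples, lambda p: "Classify the sentiment of the input.")
print(instruction)
print(reward(instruction, examples, lambda ins, x: random.choice(["positive", "negative"])))
```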
@XiaochuangHan
Xiaochuang Han
2 months
Our team at Meta FAIR is hiring a PhD research intern for 2026. The topics broadly involve multimodal generative AI (e.g., video/image generation in addition to text), with flexible approaches across architecture/data/algorithms. Please apply via the link below, and feel free to
3
43
257
@__JohnNguyen__
John Nguyen
2 months
Transfusion combines autoregressive with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow
7
83
412
@ArtidoroPagnoni
Artidoro Pagnoni
2 months
BLT helps improve text-speech representation alignment and efficiency on speech!
@Yen_Ju_Lu
Yen-Ju Lu
2 months
🚀 Introducing the Latent Speech-Text Transformer (LST) — a speech-text model that organizes speech tokens into latent patches for better text→speech transfer, enabling steeper scaling laws and more efficient multimodal training ⚡️ Paper 📄 https://t.co/4nUsbC1YKF
0
0
5
@universeinanegg
Ari Holtzman
2 months
For those who missed it, we just released a little LLM-backed game called HR Simulator™. You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell…and you’re the devil 😈 link and an initial answer to “WHY WOULD YOU DO THIS?” below
3
21
58
@minilek
Jelani Nelson
2 months
I’ve also been integrating LLMs into my research workflow. I spent most of Tuesday working on a problem I’ve been thinking about for a while with some collaborators. I had a conjecture on a possible way forward, and with some hours of thinking, mixing in conversations with Gemini
@SebastienBubeck
Sebastien Bubeck
2 months
Well, this time it's by Terence Tao himself: https://t.co/hFuWFLvoTC
14
18
290
@perceptroninc
Perceptron AI
2 months
1/ Introducing Isaac 0.1 — our first perceptive-language model. 2B params, open weights. Matches or beats models significantly larger on core perception. We are pushing the efficient frontier for physical AI. https://t.co/dJ1Wjh2ARK
24
121
603
@abertsch72
Amanda Bertsch
2 months
.@gneubig and I are co-teaching a new class on LM inference this fall! We designed this class to give a broad view on the space, from more classical decoding algorithms to recent methods for LLMs, plus a wide range of efficiency-focused work. website:
phontron.com
A class at Carnegie Mellon University on language model inference algorithms.
5
59
442
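For a flavor of the "classical decoding algorithms" end of the spectrum the class covers, here is a minimal, generic sketch of greedy vs. temperature sampling over raw logits; it is not material from the course itself.

```python
import math, random

def sample_next(logits, temperature=1.0):
    """Pick the next token id from raw logits. temperature == 0 is greedy
    (argmax); higher temperatures flatten the softmax distribution."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(l - m) for l in scaled]  # numerically stable softmax
    return random.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.0, 0.1]
print(sample_next(logits, temperature=0))    # always index 0
print(sample_next(logits, temperature=1.0))  # usually 0, sometimes 1 or 2
```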
@gargighosh
Gargi Ghosh
3 months
New research from FAIR: Active Reading, a framework to learn a given set of material with self-generated learning strategies, for both general and expert domains (such as Finance). Models absorb significantly more knowledge than with vanilla finetuning and the usual data augmentation strategies
@realJessyLin
Jessy Lin
3 months
@AIatMeta 🚀 We're excited to release the 1T Active Reading-augmented Wikipedia dataset and open-source the WikiExpert model for the community: Paper: https://t.co/0IGHROyMex Dataset: https://t.co/VEKKSMMvaa Model: https://t.co/QrQqSRfE9l Thanks to my great collaborators – @vinceberges,
0
11
28
@Aleph__Alpha
Aleph Alpha
3 months
Introducing two new tokenizer-free LLM checkpoints from our research lab: TFree-HAT 7B. Built on our Hierarchical Autoregressive Transformer (HAT) architecture, these models achieve top-tier German and English performance while processing text at the UTF-8 byte level.
16
46
444
@ArtidoroPagnoni
Artidoro Pagnoni
3 months
We observed bias in the QLoRA evals. Really cool to see a proper study on this topic!
@EvaSpiliop
Eva Spiliopoulou
3 months
LLMs: great at judging… until it’s their own homework. 📚🔥 So we built the math to call them out 🤷‍♀️ To learn more, check out our new paper: Play Favorites: A statistical method to quantify self-bias in LLM-as-a-judge 🎭 📄 Paper:
0
0
5
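As a back-of-the-envelope illustration of what quantifying self-bias can look like, and explicitly not the statistic from the Play Favorites paper: on pairwise comparisons that humans rate as ties, count how often the judge prefers its own model's output and test that rate against the 50% expected without self-preference.

```python
import math

def self_preference_gap(picked_self):
    """picked_self: booleans over pairwise comparisons that humans rated as
    ties, True when the judge preferred its own model's output. With no
    self-bias the rate should be ~0.5; report the rate and a
    normal-approximation z-score against that null."""
    n = len(picked_self)
    rate = sum(picked_self) / n
    z = (rate - 0.5) / math.sqrt(0.25 / n)
    return rate, z

# Toy demo: a judge that picks itself in 70 of 100 human-tied comparisons.
print(self_preference_gap([True] * 70 + [False] * 30))  # (0.7, 4.0)
```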
@allen_ai
Ai2
3 months
With fresh support of $75M from @NSF and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡
36
80
752
@Tim_Dettmers
Tim Dettmers
4 months
Thank you @google for the ML and Systems Junior Faculty Award! This award is for work on sparsity, and I am excited to continue this work focusing on mixture of experts. We might bring big MoEs to small GPUs quite soon! Stay tuned! Read more here:
cs.cmu.edu
SCS faculty member Tim Dettmers has received an inaugural Google ML and Systems Junior Faculty Award.
37
25
496
@ArtidoroPagnoni
Artidoro Pagnoni
4 months
Thrilled to share that our Byte Latent Transformer won an Outstanding Paper Award at ACL 2025! 🏆
@ArtidoroPagnoni
Artidoro Pagnoni
1 year
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe
16
31
282
@JonathanHayase
Jonathan Hayase @ COLM
4 months
Do you ever wish all LLMs used the same tokenizer?🧑‍🤝‍🧑 We present an *efficient, lossless* method to convert any LM into a byte-level model at inference time. This fixes weird tokenization artifacts at the prompt boundary and enables ensembles of LMs with mismatched tokenizers! 🧵
3
33
175
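A deliberately simplified sketch of the underlying idea: fold a next-token distribution into a next-byte distribution by summing each candidate token's probability onto its first UTF-8 byte. The actual method is lossless and also handles tokenizations that straddle the prompt boundary; this toy version ignores those cases, and the three-token vocabulary is made up.

```python
from collections import defaultdict

def next_byte_distribution(token_probs, vocab):
    """Fold a next-token distribution into a next-byte distribution by summing
    each candidate token's probability onto its first UTF-8 byte.
    `token_probs` maps token id -> probability, `vocab` maps id -> string."""
    byte_probs = defaultdict(float)
    for token_id, p in token_probs.items():
        token_bytes = vocab[token_id].encode("utf-8")
        if token_bytes:
            byte_probs[token_bytes[0]] += p
    return dict(byte_probs)

# Made-up three-token vocabulary and next-token distribution.
vocab = {0: "the", 1: "th", 2: "a"}
print(next_byte_distribution({0: 0.5, 1: 0.3, 2: 0.2}, vocab))
# {116: 0.8, 97: 0.2}  ->  P(next byte 't') = 0.8, P('a') = 0.2
```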
@PontiEdoardo
Edoardo Ponti
4 months
Thanks for acknowledging Dynamic Token Pooling as a predecessor to H-Net, @_albertgu! We had some decent ideas in that paper (e2e and entropy-based tokenisation), but it surprises me that it took 2 years (an eternity in NLP) to find the right recipe and scale better than BPE
@_albertgu
Albert Gu
5 months
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
1
12
87
@orevaahia
Oreva Ahia
5 months
🎉 We’re excited to introduce BLAB: Brutally Long Audio Bench, the first benchmark for evaluating long-form reasoning in audio LMs across 8 challenging tasks, using 833+ hours of Creative Commons audio (avg. length: 51 minutes).
2
50
178
@sukjun_hwang
Sukjun (June) Hwang
5 months
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
98
754
5K
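A minimal chunk-and-pool skeleton helps make "dynamic chunking" concrete: a boundary scorer decides where one chunk ends, and each chunk is summarized by mean-pooling its vectors. In H-Net the boundaries are learned end-to-end inside the network; the `boundary_score` function below is only a hypothetical stand-in for that learned component.

```python
def dynamic_chunks(vectors, boundary_score, threshold=0.5):
    """Toy chunk-and-pool skeleton: `boundary_score(i, vectors)` (a hypothetical
    stand-in for a learned scorer) decides where a chunk ends, and each chunk
    is summarized by mean-pooling its vectors."""
    chunks, current = [], []
    for i, v in enumerate(vectors):
        current.append(v)
        if boundary_score(i, vectors) > threshold:
            chunks.append([sum(xs) / len(xs) for xs in zip(*current)])
            current = []
    if current:
        chunks.append([sum(xs) / len(xs) for xs in zip(*current)])
    return chunks

# Toy demo: place a boundary after every third position.
vecs = [[float(i), float(i)] for i in range(6)]
print(dynamic_chunks(vecs, lambda i, vs: 1.0 if i % 3 == 2 else 0.0))
# [[1.0, 1.0], [4.0, 4.0]]
```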