Jacob Buckman
@jacobmbuckman
6K Followers · 1K Following · 70 Media · 2K Statuses
Founder @manifest__ai. PhD candidate @MILAMontreal. Formerly @jhuclsp, @GoogleAI, @SCSatCMU.
Manhattan, NY
Joined December 2016
We have trained Brumby-14B, the strongest RNN to date, built on power retention. It's 100% attention-free (because you don't need a hybrid if your RNN layer actually works). A lot of new LLM applications are about to be unlocked by cheap long-context finetunes of Brumby!
We built the strongest attention-free base LLM, Brumby-14B-Base, for only $4,000. It's just a base model, but for the first time in two years I'm seeing light at the end of the tunnel: the field may actually switch to an alternative to the transformer architecture.
https://t.co/BOMBk4AEYv If I were @jefrankle, I'd be getting a bit nervous right about now :-)
Wager established. Jonathan Frankle (@jefrankle) stepped up to my Transformer long bet. https://t.co/NTy6avtZlc I'm counting on you. You only have 1700 days!
The key innovation enabling this result was our power attention layer, which we open-sourced a few weeks ago. Its similarity to attention enables cost-efficient reuse of weights, but the resulting architecture has no quadratic-cost bottleneck. More info here:
manifestai.com
Releasing open-source code for Power Retention and accompanying research paper.
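To build intuition (this is not the released kernels): the trick that makes an attention-like layer recurrent is to swap softmax(q·k) for a polynomial kernel (q·k)^p, which factors through a degree-p feature map, so an entire prefix can be summarized in a fixed-size state instead of a growing KV cache. The sketch below is a minimal, unnormalized illustration of that idea; the feature map, function names, and dimensions are assumptions, not Manifest's API.

```python
# Minimal sketch of a power-attention-style recurrence (illustrative, unnormalized).
# Key identity: phi_p(q) . phi_p(k) == (q . k) ** p, so attention with the kernel
# (q . k)^p can be computed from a fixed-size running state instead of a KV cache.
import torch

def phi_p(x: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Degree-p polynomial feature map: flattened p-fold tensor power of x."""
    out = x
    for _ in range(p - 1):
        out = torch.einsum("...i,...j->...ij", out, x).flatten(start_dim=-2)
    return out

def power_retention_step(state, q, k, v, p: int = 2):
    """Process one token with memory that does not grow with sequence length.

    state: (d_k**p, d_v) summary of the whole prefix
    q, k : (d_k,) query/key for the current token
    v    : (d_v,) value for the current token
    """
    state = state + torch.outer(phi_p(k, p), v)  # constant-size update, no KV cache
    y = phi_p(q, p) @ state                      # equals sum_t (q . k_t)**p * v_t
    return state, y

# Usage: stream a long sequence token by token; memory never grows with context.
d_k, d_v, p = 8, 16, 2
state = torch.zeros(d_k ** p, d_v)
for _ in range(10_000):
    q, k, v = torch.randn(d_k), torch.randn(d_k), torch.randn(d_v)
    state, y = power_retention_step(state, q, k, v, p)
```

The released kernels are of course far more sophisticated (normalization, chunked/parallel forms, compact state layouts), but the fixed-size state is the structural point.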
By retraining our architecture around the weights of existing (attention-based) models, we get an enormous boost to learning speed. We can catch up to the performance of frontier models at this scale in just a few thousand steps of training.
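A hedged sketch of what "retraining around the weights of existing models" could look like in code: keep the pretrained transformer's projection matrices and MLPs, swap only the token-mixing rule, then fine-tune. The module layout and attribute names (q_proj, k_proj, v_proj, o_proj) are assumptions for illustration, not Manifest's actual pipeline.

```python
# Illustrative only: copy a transformer block's projection weights into a retention block
# so training starts from the pretrained model rather than from scratch.
import torch.nn as nn

def retrofit_layer(attn_layer: nn.Module, retention_layer: nn.Module) -> nn.Module:
    """Reuse the Q/K/V/O projections of an attention layer in a retention layer."""
    for name in ("q_proj", "k_proj", "v_proj", "o_proj"):  # assumed attribute names
        getattr(retention_layer, name).load_state_dict(
            getattr(attn_layer, name).state_dict()
        )
    return retention_layer  # a short fine-tune then adapts the rest of the network
```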
The end of the transformer era marches slowly closer: we trained a completely attention-free foundation model at the 14B scale for only $4,000. The performance matches other models of similar scale, including transformers and hybrid models.
Today we are releasing Brumby-14B-Base, the strongest attention-free base model around. https://t.co/mclQPFdOGa
Excited to share my recent interview on Recurrence and Attention for Long-Context Transformers with @samcharrington for the @twimlai podcast. Check it out! https://t.co/ezdkB1N9RM via @twimlai
twimlai.com
Really fun conversation on @latentspacepod!
We had @jacobmbuckman on @latentspacepod to talk about Power Retention, their new CUDA kernel library Vidrial, and how they retrained StarCoder to use it!
I've known what Carles and Jacob were working on for about a year now. It is still extremely exciting to see these two graphs side by side 📉📈
Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. https://t.co/Mdrgz3uBVX
Thank you @RashiShrivast18 for highlighting our work @manifest__ai!
The “dirty secret” in the field of AI is that it's humans, not machines, making the choices about what data AI models ingest to answer questions, Manifest AI cofounders said. The research lab claims its new model architecture can help fix that. https://t.co/QIYCxL9X8U
This is probably the most important architectural change to neural nets since the Transformer. It's time to train long-context models for everything :) Congrats @jacobmbuckman and everyone @manifest__ai!
Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. https://t.co/Mdrgz3uBVX
Discover more details, access the code and benchmarks, and read the full paper here:
manifestai.com
Releasing open-source code for Power Retention and accompanying research paper.
Power Retention represents a step change for the field. We've done nothing but exploit scale for the past five years; it's time to rethink the fundamentals. No more hacks, patches, or RAG. The long context era is about to begin. Are you ready?
For a deeper dive, our paper "Scaling Context Requires Rethinking Attention" is available on arXiv:
Trying out Power Retention is as easy as `pip install retention`. We're also releasing:
- PowerCoder 3B: a super-fast, long-context code autocompletion model showcasing our tech
- Vidrial: our framework for clean, high-performance CUDA kernels
Manifest's Power Retention enables LLMs to reason efficiently across millions of tokens. Applications that were previously out of reach due to the cost of context become feasible: assistants without session resets, days-long reasoning traces, full software-engineering workflows...
Transformers are incredibly powerful, but this power comes at a cost. Transformers memorize everything that they encounter, and at long context, this becomes a bottleneck. Rich understanding *or* long context: transformers can only have one.
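To make that bottleneck concrete, here is some back-of-the-envelope arithmetic for the KV cache that standard attention has to keep around, assuming an illustrative 48-layer model with 8 KV heads of dimension 128 in 16-bit precision (these numbers are assumptions, not Brumby's configuration):

```python
# Transformer KV-cache memory grows linearly with context length, and attention compute
# grows quadratically. A retention layer instead carries a state whose size is independent
# of context length, so per-token cost stays flat.
def kv_cache_bytes(context_len, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per=2):
    # keys + values, for every layer, for every past token, in fp16/bf16
    return context_len * n_layers * n_kv_heads * head_dim * 2 * bytes_per

for n in (8_000, 128_000, 1_000_000):
    print(f"{n:>9,} tokens -> {kv_cache_bytes(n) / 1e9:6.1f} GB of KV cache")
# Under these assumptions: ~1.6 GB at 8K tokens, ~25 GB at 128K, ~197 GB at 1M.
```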
Transformers are broken. Today, Manifest AI is releasing Power Retention, an open-source architecture to replace them. More below 🧵:
Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. https://t.co/Mdrgz3uBVX
I asked @sama, @tylercowen, @finbarrtimbers, @jacobmbuckman, and anonymous: "How might AGI impact entrepreneurs?"
@sama: At some point there'll be a one-person billion-dollar company. I could see that getting started this year and reaching that valuation a few years after that.