Jacob Buckman
@jacobmbuckman
6K Followers · 1K Following · 70 Media · 2K Statuses
Founder @manifest__ai. PhD candidate @MILAMontreal. Formerly @jhuclsp, @GoogleAI, @SCSatCMU.
Manhattan, NY
Joined December 2016
We have trained Brumby-14B, the strongest RNN to date, built on power retention. It's 100% attention-free (because you don't need a hybrid if your RNN layer actually works). A lot of new LLM applications are about to be unlocked by cheap long-context finetunes of Brumby!
We built the strongest attention-free base LLM, Brumby-14B-Base, for only $4,000. It's just a base model, but for the first time in two years I'm seeing light at the end of the tunnel: the field may actually switch to an alternative to the transformer architecture.
https://t.co/BOMBk4AEYv If I were @jefrankle, I'd be getting a bit nervous right about now :-)
Wager established. Jonathan Frankle (@jefrankle) stepped up to my Transformer long bet. https://t.co/NTy6avtZlc I'm counting on you. You only have 1700 days!
The key innovation enabling this result was our power attention layer, which we open-sourced a few weeks ago. Its similarity to attention enables cost-efficient reuse of weights, but the resulting architecture has no quadratic-cost bottleneck. More info here:
manifestai.com
Releasing open-source code for Power Retention and accompanying research paper.
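To build intuition (this is not the released kernels): the trick that makes an attention-like layer recurrent is to swap softmax(q·k) for a polynomial kernel (q·k)^p, which factors through a degree-p feature map, so an entire prefix can be summarized in a fixed-size state instead of a growing KV cache. The sketch below is a minimal, unnormalized illustration of that idea; the feature map, function names, and dimensions are assumptions, not Manifest's API.

```python
# Minimal sketch of a power-attention-style recurrence (illustrative, unnormalized).
# Key identity: phi_p(q) . phi_p(k) == (q . k) ** p, so attention with the kernel
# (q . k)^p can be computed from a fixed-size running state instead of a KV cache.
import torch

def phi_p(x: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Degree-p polynomial feature map: flattened p-fold tensor power of x."""
    out = x
    for _ in range(p - 1):
        out = torch.einsum("...i,...j->...ij", out, x).flatten(start_dim=-2)
    return out

def power_retention_step(state, q, k, v, p: int = 2):
    """Process one token with memory that does not grow with sequence length.

    state: (d_k**p, d_v) summary of the whole prefix
    q, k : (d_k,) query/key for the current token
    v    : (d_v,) value for the current token
    """
    state = state + torch.outer(phi_p(k, p), v)  # constant-size update, no KV cache
    y = phi_p(q, p) @ state                      # equals sum_t (q . k_t)**p * v_t
    return state, y

# Usage: stream a long sequence token by token; memory never grows with context.
d_k, d_v, p = 8, 16, 2
state = torch.zeros(d_k ** p, d_v)
for _ in range(10_000):
    q, k, v = torch.randn(d_k), torch.randn(d_k), torch.randn(d_v)
    state, y = power_retention_step(state, q, k, v, p)
```

The released kernels are of course far more sophisticated (normalization, chunked/parallel forms, compact state layouts), but the fixed-size state is the structural point.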
By retraining our architecture around the weights of existing (attention-based) models, we get an enormous boost to learning speed. We can catch up to the performance of frontier models at this scale in just a few thousand steps of training.
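A hedged sketch of what "retraining around the weights of existing models" could look like in code: keep the pretrained transformer's projection matrices and MLPs, swap only the token-mixing rule, then fine-tune. The module layout and attribute names (q_proj, k_proj, v_proj, o_proj) are assumptions for illustration, not Manifest's actual pipeline.

```python
# Illustrative only: copy a transformer block's projection weights into a retention block
# so training starts from the pretrained model rather than from scratch.
import torch.nn as nn

def retrofit_layer(attn_layer: nn.Module, retention_layer: nn.Module) -> nn.Module:
    """Reuse the Q/K/V/O projections of an attention layer in a retention layer."""
    for name in ("q_proj", "k_proj", "v_proj", "o_proj"):  # assumed attribute names
        getattr(retention_layer, name).load_state_dict(
            getattr(attn_layer, name).state_dict()
        )
    return retention_layer  # a short fine-tune then adapts the rest of the network
```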
The end of the transformer era marches slowly closer: we trained a completely attention-free foundation model at the 14B scale for only $4,000. The performance matches other models of similar scale, including transformers and hybrid models.
Today we are releasing Brumby-14B-Base, the strongest attention-free base model around. https://t.co/mclQPFdOGa
Excited to share my recent interview on Recurrence and Attention for Long-Context Transformers with @samcharrington for the @twimlai podcast. Check it out! https://t.co/ezdkB1N9RM via @twimlai
twimlai.com
Really fun conversation on @latentspacepod!
We had @jacobmbuckman on @latentspacepod to talk about Power Retention, their new CUDA kernel library Vidrial, and how they retrained StarCoder to use it!
I've known what Carles and Jacob were working on for about a year now. It is still extremely exciting to see these two graphs side by side 📉📈
Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. https://t.co/Mdrgz3uBVX
Thank you @RashiShrivast18 for highlighting our work @manifest__ai!
The “dirty secret” in the field of AI is that it's humans, not machines, making the choices about what data AI models ingest to answer questions, Manifest AI cofounders said. The research lab claims its new model architecture can help fix that. https://t.co/QIYCxL9X8U
This is probably the most important architectural change to neural nets since the Transformer. It's time to train long-context models for everything :) Congrats @jacobmbuckman and everyone @manifest__ai!
Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. https://t.co/Mdrgz3uBVX
Discover more details, access the code and benchmarks, and read the full paper here:
manifestai.com
Releasing open-source code for Power Retention and accompanying research paper.
Power Retention represents a step change for the field. We've done nothing but exploit scale for the past five years; it's time to rethink the fundamentals. No more hacks, patches, or RAG. The long context era is about to begin. Are you ready?
For a deeper dive, our paper "Scaling Context Requires Rethinking Attention" is available on arXiv:
Trying out Power Retention is as easy as `pip install retention`. We're also releasing:
- PowerCoder 3B: a super-fast, long-context code autocompletion model showcasing our tech
- Vidrial: our framework for clean, high-performance CUDA kernels
Manifest's Power Retention enables LLMs to reason efficiently across millions of tokens. Applications that were previously out of reach due to the cost of context become feasible: assistants without session resets, days-long reasoning traces, full software-engineering workflows...
Transformers are incredibly powerful, but this power comes at a cost. Transformers memorize everything that they encounter, and at long context, this becomes a bottleneck. Rich understanding *or* long context: transformers can only have one.
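To make that bottleneck concrete, here is some back-of-the-envelope arithmetic for the KV cache that standard attention has to keep around, assuming an illustrative 48-layer model with 8 KV heads of dimension 128 in 16-bit precision (these numbers are assumptions, not Brumby's configuration):

```python
# Transformer KV-cache memory grows linearly with context length, and attention compute
# grows quadratically. A retention layer instead carries a state whose size is independent
# of context length, so per-token cost stays flat.
def kv_cache_bytes(context_len, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per=2):
    # keys + values, for every layer, for every past token, in fp16/bf16
    return context_len * n_layers * n_kv_heads * head_dim * 2 * bytes_per

for n in (8_000, 128_000, 1_000_000):
    print(f"{n:>9,} tokens -> {kv_cache_bytes(n) / 1e9:6.1f} GB of KV cache")
# Under these assumptions: ~1.6 GB at 8K tokens, ~25 GB at 128K, ~197 GB at 1M.
```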
Transformers are broken. Today, Manifest AI is releasing Power Retention, an open-source architecture to replace them. More below 🧵:
Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. https://t.co/Mdrgz3uBVX
I asked @sama, @tylercowen, @finbarrtimbers, @jacobmbuckman, and anonymous: "How might AGI impact entrepreneurs?"
@sama: At some point there'll be a one-person billion-dollar company. I could see that getting started this year and reaching that valuation a few years after that.