
Ali Behrouz
@behrouz_ali
Followers
4K
Following
898
Media
39
Statuses
153
Research Intern @Google, Ph.D. Student @Cornell_CS, interested in machine learning and understanding intelligence.
Joined January 2023
Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative? Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans…
79
603
3K
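For a concrete picture of "a memory that learns how to memorize at test time", here is a minimal sketch under my own assumptions (the module names, loss, and optimizer are illustrative, not the Titans implementation): the memory is a small network whose parameters are updated by gradient steps on the incoming stream during inference, with prediction error playing the role of surprise.

```python
# Illustrative sketch only (not the Titans implementation): a neural memory
# whose parameters are updated by gradient steps on each incoming chunk at
# inference time, so memorization itself happens at test time.
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """Toy long-term memory: maps keys to values with a small MLP."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)

def memorize_at_test_time(memory, keys, values, lr=1e-2, steps=1):
    """One online write: push the memory to reconstruct values from keys.
    The reconstruction error plays the role of a surprise signal."""
    opt = torch.optim.SGD(memory.parameters(), lr=lr)
    loss = None
    for _ in range(steps):
        opt.zero_grad()
        loss = ((memory(keys) - values) ** 2).mean()  # surprise = prediction error
        loss.backward()
        opt.step()
    return loss.item()

if __name__ == "__main__":
    dim = 32
    mem = NeuralMemory(dim)
    # A stream of (key, value) pairs arriving during inference.
    for chunk in range(5):
        k, v = torch.randn(16, dim), torch.randn(16, dim)
        err = memorize_at_test_time(mem, k, v)
        print(f"chunk {chunk}: surprise (MSE) = {err:.4f}")
```

In the architecture the tweet describes, such an online-updated memory sits alongside attention rather than replacing it.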
RT @gmongaras: Threw a paper I've been working on onto ArXiv. Trying to get a little closer to understanding why softmax in attention works….
arxiv.org
Since its introduction, softmax attention has become the backbone of modern transformer architectures due to its expressiveness and scalability across a wide range of tasks. However, the main...
0
4
0
RT @behrouz_ali: What makes attention the critical component for most advances in LLMs and what holds back long-term memory modules (RNNs)?….
0
140
0
Everyone is talking about reviewers who don't engage or provide low-quality reviews. While harmful, I don't see that as the biggest threat to the peer review system. As both an author and reviewer, I'm seeing zero-sum debates where a reviewer puts their full effort into rejecting.
Instead of complaining that peer review is dead, take a positive step to improve it today. The reviewers are not aliens, they are us!
- Revise your review and make it clear. Identify the crucial points that impacted your score negatively and positively.
- If the paper is…
1
0
11
RT @mirrokni: Proud to announce an official Gold Medal at #IMO2025🥇. The IMO committee has certified the result from our general-purpose Ge….
deepmind.google
Our advanced model officially achieved a gold-medal level performance on problems from the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young...
0
35
0
RT @reza_byt: 📄 New Paper Alert! ✨ 🚀Mixture of Recursions (MoR): Smaller models • Higher accuracy • Greater throughput. Across 135M–1.7B….
0
55
0
RT @yingheng_wang: ❓ Are LLMs actually problem solvers or just good at regurgitating facts? 🚨New Benchmark Alert! We built HeuriGym to ben….
0
25
0
RT @leloykun: Fast, Numerically Stable, and Auto-Differentiable Spectral Clipping via Newton-Schulz Iteration. Hi all, I'm bacc. I have a l….
0
41
0
RT @TheTuringPost: Last week, @Google dropped a paper on ATLAS, a new architecture that reimagines how models learn and use memory. Unfort….
0
77
0
@mirrokni @meisamrr Here is the link to the paper: This is joint work with Zeman Li, Praneeth Kacham, @daliri__majid, @yuandeng_cs, Peilin Zhong, @meisamrr, and @mirrokni.
2
1
41
@mirrokni @meisamrr In our experiments, we focus on language modeling, common-sense reasoning, needle-in-a-haystack, in-context recall, and multi-query associative recall tasks. Atlas is very effective at all scales (tested up to 1.3B), even outperforming Titans and other linear RNNs in long-context tasks.
1
0
36
@mirrokni @meisamrr Can we use what we have learned from the design of long-term neural memory (Atlas) to further enhance attention and Transformers? We aim to strictly generalize the original softmax Transformer in two important aspects: (1) Deep Memory: Transformers use matrix-valued memories…
1
1
32
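To make the "(1) Deep Memory" point above concrete, here is a toy contrast between a matrix-valued (linear) memory and a deep memory, written under my own assumptions rather than taken from the paper's code: both expose the same key-to-value read interface, but one is a single linear map and the other a small MLP.

```python
# Illustrative contrast only (not the paper's code):
# (1) a matrix-valued memory, i.e. a single linear map M with v ≈ M k,
# (2) a "deep" memory with the same interface, parameterized by an MLP.
import torch
import torch.nn as nn

class MatrixMemory(nn.Module):
    """Linear associative memory: the value estimate is M @ key."""
    def __init__(self, dim: int):
        super().__init__()
        self.M = nn.Linear(dim, dim, bias=False)

    def forward(self, k):
        return self.M(k)

class DeepMemory(nn.Module):
    """Same read interface, but the key -> value map is a small MLP."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, k):
        return self.net(k)

def write(memory, keys, values, lr=1e-2):
    """Store associations with one gradient step on the recall error."""
    opt = torch.optim.SGD(memory.parameters(), lr=lr)
    opt.zero_grad()
    loss = ((memory(keys) - values) ** 2).mean()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    dim, n = 32, 256
    keys, values = torch.randn(n, dim), torch.randn(n, dim)
    for mem in (MatrixMemory(dim), DeepMemory(dim)):
        errs = [write(mem, keys, values) for _ in range(50)]
        print(f"{type(mem).__name__}: recall MSE after 50 writes = {errs[-1]:.4f}")
```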
@mirrokni @meisamrr Even with a powerful surprise metric and enhanced memory capacity, the memory still needs to be updated and optimized properly. In fact, a bad update rule can cause the memory to get stuck in local optima and fail to properly memorize the context. While almost all models are based…
1
2
39
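The point above about the update rule can be illustrated with a toy experiment (my own sketch, not the paper's optimizer): write the same stream of key-value pairs into two copies of one memory, one updated by plain online gradient descent and one by a momentum-based rule, and compare how well each ends up fitting the context.

```python
# Toy illustration only (not the paper's update rule): the same stream of
# key-value pairs is written into two copies of one memory, one with plain
# online gradient descent and one with momentum, to show that the choice
# of update rule changes how well the memory ends up fitting the context.
import copy
import torch
import torch.nn as nn

def make_memory(dim: int, hidden: int = 64) -> nn.Module:
    return nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

def stream_write(memory, stream, opt):
    """Feed (key, value) chunks one by one; each chunk gets one update step."""
    losses = []
    for k, v in stream:
        opt.zero_grad()
        loss = ((memory(k) - v) ** 2).mean()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

if __name__ == "__main__":
    torch.manual_seed(0)
    dim = 32
    stream = [(torch.randn(8, dim), torch.randn(8, dim)) for _ in range(200)]

    base = make_memory(dim)
    mem_sgd, mem_mom = copy.deepcopy(base), copy.deepcopy(base)

    sgd = stream_write(mem_sgd, stream, torch.optim.SGD(mem_sgd.parameters(), lr=5e-2))
    mom = stream_write(mem_mom, stream, torch.optim.SGD(mem_mom.parameters(), lr=5e-2, momentum=0.9))

    print("plain SGD, mean loss over last 10 chunks:", sum(sgd[-10:]) / 10)
    print("momentum,  mean loss over last 10 chunks:", sum(mom[-10:]) / 10)
```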
@mirrokni @meisamrr Now that we have addressed the first drawback, how do we enhance the memory capacity (i.e., the number of data samples that the memory can store in its parameters)? Attention acts as an unbounded associative memory that tries to learn the mapping between a set of queries and a set of…
1
1
34
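One way to see what "memory capacity" means here, as a toy illustration under my own assumptions and not the paper's construction or measurement: a linear associative memory over d-dimensional keys can exactly recall only about d random associations, while lifting the keys through a richer feature map raises that number.

```python
# Toy illustration of associative-memory capacity (my own setup, not the
# paper's experiment): a linear memory over d-dimensional keys can exactly
# recall only about d random associations; lifting keys through a richer
# feature map (here, appending pairwise products) raises that number.
import torch

def fit_linear_memory(feats: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    """Least-squares memory M with values ≈ feats @ M (via pseudo-inverse)."""
    return torch.linalg.pinv(feats) @ values

def recall_error(feats, values, M):
    return ((feats @ M - values) ** 2).mean().item()

def poly_features(k: torch.Tensor) -> torch.Tensor:
    """Append all pairwise products k_i * k_j (a simple degree-2 lift)."""
    outer = (k.unsqueeze(2) * k.unsqueeze(1)).flatten(1)
    return torch.cat([k, outer], dim=1)

if __name__ == "__main__":
    torch.manual_seed(0)
    d, n = 16, 64                      # try to store n > d random pairs
    keys, values = torch.randn(n, d), torch.randn(n, d)

    M_lin = fit_linear_memory(keys, values)
    M_poly = fit_linear_memory(poly_features(keys), values)

    print("raw keys,    recall MSE:", recall_error(keys, values, M_lin))
    print("lifted keys, recall MSE:", recall_error(poly_features(keys), values, M_poly))
```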
@mirrokni @meisamrr How does the memory prune the context? We give the model additional flexibility through gamma parameters that prune the context whenever it is needed. This is similar to forgetting, but with more direct access to the local tokens. That is, the model can simply ignore past tokens by…
1
1
34
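A toy rendering of the gamma-based pruning idea; the gating form and names below are my assumptions rather than the paper's definition: each past token gets a gamma in [0, 1] that scales its contribution to the context, so a near-zero gamma effectively drops that token.

```python
# Toy illustration (the gating form and names are assumptions, not the
# paper's): a learned per-token gamma in [0, 1] scales each past token's
# contribution to the context, so a near-zero gamma prunes that token.
import torch
import torch.nn as nn

class GammaPruning(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)  # one gamma score per token

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        """tokens: (seq_len, dim). Returns a pruned summary of the context."""
        gamma = torch.sigmoid(self.gate(tokens))          # (seq_len, 1), in [0, 1]
        pruned = gamma * tokens                           # gamma ~ 0 drops the token
        return pruned.sum(dim=0) / gamma.sum().clamp(min=1e-6)

if __name__ == "__main__":
    torch.manual_seed(0)
    seq_len, dim = 10, 8
    context = torch.randn(seq_len, dim)
    module = GammaPruning(dim)
    summary = module(context)
    print("summary shape:", tuple(summary.shape))
    print("gammas:", torch.sigmoid(module.gate(context)).squeeze(-1).detach().tolist())
```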
@mirrokni @meisamrr From the memory perspective: our brain prioritizes events that violate its expectations (i.e., surprising events). While an event itself consists of different elements, the judgment of what to prioritize depends on all of them. In Titans, however, the model computes the surprise metric for…
3
3
46
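The (truncated) contrast above is between judging surprise token by token and judging it over a group of tokens. A small sketch under my own assumptions, not the paper's definition of surprise: treat surprise as the memory's prediction error, and aggregate it either per token or over a sliding window so that the score of an event depends on all of its elements.

```python
# Sketch under my own assumptions (not the paper's definition of surprise):
# surprise as the memory's prediction error on incoming tokens, computed
# either per individual token or aggregated over a sliding window, so that
# the surprise of an "event" depends on all of its elements together.
import torch
import torch.nn as nn

def token_surprise(memory: nn.Module, keys: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    """Per-token surprise: each token's own squared prediction error."""
    with torch.no_grad():
        return ((memory(keys) - values) ** 2).mean(dim=-1)        # (seq_len,)

def windowed_surprise(memory: nn.Module, keys, values, window: int = 4) -> torch.Tensor:
    """Surprise judged over a sliding window of recent tokens."""
    per_token = token_surprise(memory, keys, values)
    # Zero-pad on the left so every position has a full window of history.
    padded = torch.cat([per_token.new_zeros(window - 1), per_token])
    return padded.unfold(0, window, 1).mean(dim=-1)               # (seq_len,)

if __name__ == "__main__":
    torch.manual_seed(0)
    dim, seq_len = 16, 12
    memory = nn.Linear(dim, dim)                 # stand-in for a neural memory
    keys, values = torch.randn(seq_len, dim), torch.randn(seq_len, dim)
    print("per-token surprise:", token_surprise(memory, keys, values))
    print("windowed surprise :", windowed_surprise(memory, keys, values))
```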
@mirrokni @meisamrr Coming back to the first question: What makes attention the critical component for most advances in LLMs, and what holds back long-term memory modules (RNNs)? We observe three disjoint aspects that limit the performance of long-term memory modules in long-context tasks: (1)…
1
0
42