Assaf Ben Kish Profile
Assaf Ben Kish

@abk_tau

Followers: 105 · Following: 276 · Media: 20 · Statuses: 73

Deep Learning | Large Language Models | Reinforcement Learning

Joined August 2023
@abk_tau
Assaf Ben Kish
1 day
OPRM is accepted to #COLM2025! See you in Montreal 🇨🇦. Big thanks to our great collaborators from TAU, MIT, and IBM! #LLM @COLM_conf
@abk_tau
Assaf Ben Kish
2 months
New work! 🚨 Recurrent LLMs like Mamba and RWKV can efficiently process millions of tokens, yet still underperform on real-world long-context tasks. What's holding them back? 🤔 And how can a lightweight fix boost their performance by 35% on LongBench? 👇🏼🧵 Github:
@abk_tau
Assaf Ben Kish
29 days
RT @ItamarZimerman: 📄🚨 New! Tired of waiting minutes for LLMs to "think"? Test-time scaling (O3, DeepSeek-R1) lets LLMs reason before answe…
@abk_tau
Assaf Ben Kish
1 month
RT @HanGuo97: We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? I…
@abk_tau
Assaf Ben Kish
1 month
RT @YVinker: Thanks @MIT_CSAIL for featuring our work! 🖊️🎨 Huge thanks to the CSAIL news team for the fun article + video!! We'll be presen…
@abk_tau
Assaf Ben Kish
1 month
RT @MIT_CSAIL: Sometimes the best way to express an idea is by sketching it out. A system from MIT CSAIL & Stanford captures this iterativ…
@abk_tau
Assaf Ben Kish
2 months
RT @gm8xx8: Overflow Prevention Enhances Long-Context Recurrent LLMs. OPRM chunk-based inference:
- Split the context into chunks
- Process…
@abk_tau
Assaf Ben Kish
2 months
Very nice deep dive explaining OPRM by @xiaolGo.
@abk_tau
Assaf Ben Kish
2 months
This work was a great collaboration with @ItamarZimerman, @jmie_mirza, James Glass, @leokarlin, and @RGiryes. Check out the paper and our GitHub repo for more experiments, details, and code! Arxiv: Github:
@abk_tau
Assaf Ben Kish
2 months
Lastly, our findings raise questions about whether existing recurrent models genuinely exploit long-range dependencies across multiple chunks, since our single-chunk strategy delivers stronger performance - even in tasks that presumably require cross-segment relations.
@abk_tau
Assaf Ben Kish
2 months
In addition, OPRM naturally acts as a context extension algorithm, allowing the model to handle sequences far longer than those it was originally trained on. OPRM even outperforms existing dedicated context extension methods - sometimes by a wide margin.
@abk_tau
Assaf Ben Kish
2 months
On average, OPRM delivers a 35% improvement on LongBench across various SoTA recurrent architectures, and even sets new SoTA results on the challenging LongBench v2 - all while being faster than vanilla inference and requiring a surprisingly small memory footprint.
@abk_tau
Assaf Ben Kish
2 months
To mitigate this, we complement these architectures with a training-free inference algorithm. Our method, OPRM (Overflow Prevention for Recurrent Models), processes the context in chunks, selects the most relevant one, and decodes it - effectively solving the overflow issue.
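For a concrete picture of the chunk-based inference described above, here is a minimal sketch, assuming a generic Hugging Face causal LM. The model name is a placeholder, and the chunk-selection criterion (entropy of the next-token distribution after the query) is an illustrative stand-in for the exact criteria described in the paper and repo.

```python
# Minimal sketch of OPRM-style chunk-based inference (not the official code).
# Assumptions: a Hugging Face causal LM (placeholder model name below) and an
# entropy-based chunk-selection heuristic standing in for the paper's criteria.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "state-spaces/mamba-130m-hf"  # placeholder recurrent LLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


def oprm_generate(context: str, query: str, chunk_tokens: int = 2048,
                  max_new_tokens: int = 64) -> str:
    """Split the context into chunks, score each chunk with the query
    independently, keep the most relevant one, and decode the answer from it."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids[0]
    query_ids = tokenizer(query, return_tensors="pt").input_ids[0]
    chunks = [ctx_ids[i:i + chunk_tokens]
              for i in range(0, len(ctx_ids), chunk_tokens)]

    best_entropy, best_input = float("inf"), None
    for chunk in chunks:
        # Each chunk is prefixed independently, so the recurrent state never
        # has to compress more than chunk_tokens context tokens at once.
        input_ids = torch.cat([chunk, query_ids]).unsqueeze(0)
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum().item()
        if entropy < best_entropy:  # lower uncertainty -> more relevant chunk
            best_entropy, best_input = entropy, input_ids

    out = model.generate(best_input, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0, best_input.shape[1]:],
                            skip_special_tokens=True)
```

Since the chunks are independent, their scoring could in principle be batched rather than looped as above, which is consistent with the thread's note that OPRM runs faster than vanilla inference.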
@abk_tau
Assaf Ben Kish
2 months
Recent works address this problem by increasing the state size (e.g. Mamba2) or managing the stored information more effectively (e.g. DeltaNet). While these approaches are valuable, they still fall short, as they attempt to compress the entire context into a fixed-size state.
@abk_tau
Assaf Ben Kish
2 months
As shown below by the Associative Recall curve, when the recurrent memory capacity is exceeded, the model exhibits an overflow-like behavior that worsens as more information is added. Therefore, a key principle is to *avoid* overloading the state with information beyond its capacity.
[Figure: Associative Recall curve]
@abk_tau
Assaf Ben Kish
2 months
We study long-context performance from a less-explored perspective: recurrent memory capacity, and find that even very large hidden states, such as Falcon-Mamba-7B’s, struggle to retain all contextual information.
@abk_tau
Assaf Ben Kish
2 months
New work! 🚨 Recurrent LLMs like Mamba and RWKV can efficiently process millions of tokens, yet still underperform on real-world long-context tasks. What's holding them back? 🤔 And how can a lightweight fix boost their performance by 35% on LongBench? 👇🏼🧵 Github:
@abk_tau
Assaf Ben Kish
3 months
RT @IdanShenfeld: The next frontier for AI shouldn't just be generally helpful. It should be helpful for you! Our new paper shows how to…
@abk_tau
Assaf Ben Kish
4 months
RT @YVinker: SketchAgent has been accepted to #CVPR2025! This is an early step toward new tools for visual thinking and richer interaction…
@abk_tau
Assaf Ben Kish
5 months
DeciMamba, the first context extension method for Mamba, is accepted to #ICLR2025! 🎉 New revision with more long-context results: Special thanks to @ItamarZimerman @ShadyAbh @nadavcohen @amirgloberson @liorwolf @RGiryes!
@abk_tau
Assaf Ben Kish
1 year
New Work! 🐍 What prevents Mamba from extrapolating to sequences that are significantly longer than those it was trained on? Furthermore, can Mamba solve long-range NLP tasks using short-range training only? 🧵🧵🧵