Assaf Ben Kish Profile
Assaf Ben Kish

@abk_tau

Followers: 105 · Following: 276 · Media: 20 · Statuses: 73

Deep Learning | Large Language Models | Reinforcement Learning

Joined August 2023
@abk_tau
Assaf Ben Kish
1 day
OPRM is accepted to #COLM2025! See you in Montreal 🇨🇦. Big thanks to our great collaborators from TAU, MIT, and IBM! #LLM @COLM_conf
@abk_tau
Assaf Ben Kish
2 months
New work! 🚨 Recurrent LLMs like Mamba and RWKV can efficiently process millions of tokens, yet still underperform on real-world long-context tasks. What's holding them back? 🤔 And how can a lightweight fix boost their performance by 35% on LongBench? 👇🏼🧵 Github:
@abk_tau
Assaf Ben Kish
29 days
RT @ItamarZimerman: 📄🚨 New! Tired of waiting minutes for LLMs to "think"? Test-time scaling (O3, DeepSeek-R1) lets LLMs reason before answe…
@abk_tau
Assaf Ben Kish
1 month
RT @HanGuo97: We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? I…
@abk_tau
Assaf Ben Kish
1 month
RT @YVinker: Thanks @MIT_CSAIL for featuring our work! 🖊️🎨 Huge thanks to the CSAIL news team for the fun article + video!! We'll be presen…
@abk_tau
Assaf Ben Kish
1 month
RT @MIT_CSAIL: Sometimes the best way to express an idea is by sketching it out. A system from MIT CSAIL & Stanford captures this iterativ…
@abk_tau
Assaf Ben Kish
2 months
RT @gm8xx8: Overflow Prevention Enhances Long-Context Recurrent LLMs. OPRM chunk-based inference:
- Split the context into chunks
- Process…
@abk_tau
Assaf Ben Kish
2 months
Very nice deep dive explaining OPRM by @xiaolGo.
@abk_tau
Assaf Ben Kish
2 months
This work was a great collaboration with @ItamarZimerman, @jmie_mirza, James Glass, @leokarlin, and @RGiryes. Check out the paper and our GitHub repo for more experiments, details, and code! Arxiv: Github:
@abk_tau
Assaf Ben Kish
2 months
Lastly, our findings raise questions about whether existing recurrent models genuinely exploit long-range dependencies across multiple chunks, since our single-chunk strategy delivers stronger performance - even in tasks that presumably require cross-segment relations.
@abk_tau
Assaf Ben Kish
2 months
In addition, OPRM naturally acts as a context extension algorithm, allowing the model to handle sequences far longer than those it was originally trained on. OPRM even outperforms existing dedicated context extension methods - sometimes by a wide margin.
@abk_tau
Assaf Ben Kish
2 months
On average, OPRM delivers a 35% improvement on LongBench across various SoTA recurrent architectures, and even sets new SoTA results on the challenging LongBench v2 - all while being faster than vanilla inference and requiring a surprisingly small memory footprint.
@abk_tau
Assaf Ben Kish
2 months
To mitigate this, we complement these architectures with a training-free inference algorithm. Our method, OPRM (Overflow Prevention for Recurrent Models), processes the context in chunks, selects the most relevant one, and decodes it - effectively solving the overflow issue.
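For a concrete picture of the chunk-based inference described above, here is a minimal sketch, assuming a generic Hugging Face causal LM. The model name is a placeholder, and the chunk-selection criterion (entropy of the next-token distribution after the query) is an illustrative stand-in for the exact criteria described in the paper and repo.

```python
# Minimal sketch of OPRM-style chunk-based inference (not the official code).
# Assumptions: a Hugging Face causal LM (placeholder model name below) and an
# entropy-based chunk-selection heuristic standing in for the paper's criteria.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "state-spaces/mamba-130m-hf"  # placeholder recurrent LLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


def oprm_generate(context: str, query: str, chunk_tokens: int = 2048,
                  max_new_tokens: int = 64) -> str:
    """Split the context into chunks, score each chunk with the query
    independently, keep the most relevant one, and decode the answer from it."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids[0]
    query_ids = tokenizer(query, return_tensors="pt").input_ids[0]
    chunks = [ctx_ids[i:i + chunk_tokens]
              for i in range(0, len(ctx_ids), chunk_tokens)]

    best_entropy, best_input = float("inf"), None
    for chunk in chunks:
        # Each chunk is prefixed independently, so the recurrent state never
        # has to compress more than chunk_tokens context tokens at once.
        input_ids = torch.cat([chunk, query_ids]).unsqueeze(0)
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum().item()
        if entropy < best_entropy:  # lower uncertainty -> more relevant chunk
            best_entropy, best_input = entropy, input_ids

    out = model.generate(best_input, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0, best_input.shape[1]:],
                            skip_special_tokens=True)
```

Since the chunks are independent, their scoring could in principle be batched rather than looped as above, which is consistent with the thread's note that OPRM runs faster than vanilla inference.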
@abk_tau
Assaf Ben Kish
2 months
Recent works address this problem by increasing the state size (e.g. Mamba2) or managing the stored information more effectively (e.g. DeltaNet). While these approaches are valuable, they still fall short, as they attempt to compress the entire context into a fixed-size state.
@abk_tau
Assaf Ben Kish
2 months
As shown below by the Associative Recall curve, when the recurrent memory capacity is exceeded, the model exhibits an overflow-like behavior that worsens as more information is added. Therefore, a key principle is to *avoid* overloading the state with information beyond its capacity.
[Figure: Associative Recall curve]
@abk_tau
Assaf Ben Kish
2 months
We study long-context performance from a less-explored perspective: recurrent memory capacity, and find that even very large hidden states, such as Falcon-Mamba-7B’s, struggle to retain all contextual information.
@abk_tau
Assaf Ben Kish
2 months
New work! 🚨 Recurrent LLMs like Mamba and RWKV can efficiently process millions of tokens, yet still underperform on real-world long-context tasks. What's holding them back? 🤔 And how can a lightweight fix boost their performance by 35% on LongBench? 👇🏼🧵 Github:
@abk_tau
Assaf Ben Kish
3 months
RT @IdanShenfeld: The next frontier for AI shouldn't just be generally helpful. It should be helpful for you! Our new paper shows how to…
@abk_tau
Assaf Ben Kish
4 months
RT @YVinker: SketchAgent has been accepted to #CVPR2025! This is an early step toward new tools for visual thinking and richer interaction…
@abk_tau
Assaf Ben Kish
5 months
DeciMamba, the first context extension method for Mamba, is accepted to #ICLR2025! 🎉 New revision with more long-context results: Special thanks to @ItamarZimerman @ShadyAbh @nadavcohen @amirgloberson @liorwolf @RGiryes!
@abk_tau
Assaf Ben Kish
1 year
New Work! 🐍 What prevents Mamba from extrapolating to sequences that are significantly longer than those it was trained on? Furthermore, can Mamba solve long-range NLP tasks using short-range training only? 🧵🧵🧵