Mathis Pink Profile
Mathis Pink

@MathisPink

Followers: 371
Following: 866
Media: 3
Statuses: 87

👀🧠(x) | x ∈ {👀🧠,🤖} PhD student @mpi_sws_ trying to trick rocks into thinking and remembering.

Saarbrücken, Germany
Joined August 2020
@MathisPink
Mathis Pink
1 year
1/n🤖🧠 New paper alert!📢 In "Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks" ( https://t.co/S8BZzkFVM6) we introduce SORT as the first method to evaluate episodic memory in large language models. Read on to find out what we discovered!🧵
arxiv.org
Current LLM benchmarks focus on evaluating models' memory of facts and semantic relations, primarily assessing semantic aspects of long-term memory. However, in humans, long-term memory also...
4
17
61
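A minimal sketch of what a sequence-order-recall item could look like, based only on the description in the tweet above: sample two segments from a text, present them in random order, and ask which one appeared earlier in the source. The exact prompt wording, segment lengths, and scoring used in the released SORT evaluation code may differ.

```python
import random

def make_sort_item(text: str, seg_len: int = 50, rng=random):
    """Sample two non-overlapping segments from `text` and record which came first."""
    words = text.split()
    assert len(words) >= 2 * seg_len + 2, "text too short for two segments"
    while True:
        i, j = sorted(rng.sample(range(len(words) - seg_len), 2))
        if j - i >= seg_len:  # ensure the two segments do not overlap
            break
    seg_early = " ".join(words[i:i + seg_len])
    seg_late = " ".join(words[j:j + seg_len])
    # Randomize presentation order so the correct label is balanced.
    if rng.random() < 0.5:
        return seg_early, seg_late, "A"  # excerpt A appeared earlier in the text
    return seg_late, seg_early, "B"

def sort_prompt(excerpt_a: str, excerpt_b: str) -> str:
    """Build a simple order-recall question for a language model."""
    return (
        "Below are two excerpts from a book.\n"
        f"Excerpt A: {excerpt_a}\n"
        f"Excerpt B: {excerpt_b}\n"
        "Which excerpt appears earlier in the book? Answer with A or B."
    )
```

Accuracy over many such items, with or without the surrounding passage placed in-context, would then serve as the order-memory score for a model.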
@lasha_nlp
Abhilasha Ravichander
1 month
📢 I'm recruiting PhD students at MPI!! Topics include: 1⃣ LLM factuality, reliable info synthesis and reasoning, personalization + applications in real-world inc. education, science 2⃣ Data-centric interpretability 3⃣Creativity in AI, esp scientific applications 🧵1/2
9
109
457
@TimKietzmann
Tim Kietzmann
5 months
Exciting new preprint from the lab: “Adopting a human developmental visual diet yields robust, shape-based AI vision”. A most wonderful case where brain inspiration massively improved AI solutions. Work with @lu_zejin @martisamuser and Radoslaw Cichy https://t.co/XVYqQPjoTA
arxiv.org
Despite years of research and the dramatic scaling of artificial intelligence (AI) systems, a striking misalignment between artificial and human vision persists. Contrary to humans, AI heavily...
5
52
145
@ohmoussa2
Omer Moussa
6 months
🚨Excited to share our latest work published at Interspeech 2025: “Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain”! 🧠🎧 https://t.co/LeCs6YfbZp W/ @mtoneva1 We fine-tuned speech models directly with brain fMRI data, making them more brain-like.🧵
arxiv.org
Pretrained self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing, as they encode rich semantics in middle layers and poor semantics in...
1
5
30
@martinagvilas
Martina Vilas
8 months
We will be presenting this 💫 spotlight 💫 paper at #ICLR2025. Come say hi or DM me if you're interested in discussing AI #interpretability in Singapore! 📆 Poster Session 4 (#530) 🕰️ Fri 25 Apr. 3:00-5:30 PM 📝 https://t.co/xSyTt7bqxy 📊 https://t.co/wftoGjLZrd
6
28
208
@s_michelmann
Sebastian Michelmann
3 years
Excited to share our new preprint https://t.co/c8kgodk5hS with @mtoneva1, @ptoncompmemlab, and @manojneuro, in which we ask if GPT-3 (a large language model) can segment narratives into meaningful events similarly to humans. We use an unconventional approach: ⬇️
arxiv.org
Humans perceive discrete events such as "restaurant visits" and "train rides" in their continuous experience. One important prerequisite for studying human event perception is the ability of...
2
30
94
@krandiash
Karan Goel
1 year
A few interesting challenges in extending context windows. A model with a big prompt =/= "infinite context" in my mind. 10M tokens of context is not exactly on the path to infinite context. Instead, it requires a streaming model that has - an efficient state with fast
@kimmonismus
Chubby♨️
1 year
Sam Altman: 10m context window in months, infinite context within several years
2
10
83
@fchollet
François Chollet
1 year
Anyway, glad to see that the whole "let's just pretrain a bigger LLM" paradigm is dead. Model size is stagnating or even decreasing, while researchers are now looking at the right problems -- either test-time training or neurosymbolic approaches like test-time search, program
9
34
510
@MathisPink
Mathis Pink
1 year
https://t.co/rahFgZKfH7 We think this is because LLMs do not have parametric episodic memory (as opposed to semantic memory)! We recently created SORT, a new benchmark task that tests temporal order memory in LLMs
@goodside
Riley Goodside
1 year
An LLM knows every work of Shakespeare but can’t say which it read first. In this material sense a model hasn’t read at all. To read is to think. Only at inference is there space for serendipitous inspiration, which is why LLMs have so little of it to show for all they’ve seen.
0
0
4
@francoisfleuret
François Fleuret
1 year
Consider the prompt X="Describe a beautiful house." We can consider two processes to generate the answer Y: (A) sample P(Y | X) or, (B) sample an image Z with a conditional image density model P(Z | X) and then sample P(Y | Z). 1/3
5
3
63
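A small sketch of the two generation processes described in this tweet. The three sampling functions below are stand-in stubs, not real model APIs; they only mark where a language model, a conditional image model, and an image-conditioned text model would be called.

```python
def sample_text(prompt: str) -> str:
    """Stub for sampling Y ~ P(Y | X) directly from a language model."""
    return f"<text sampled conditioned on: {prompt!r}>"

def sample_image(prompt: str) -> bytes:
    """Stub for sampling an image Z ~ P(Z | X) from a conditional image model."""
    return f"<image sampled for: {prompt!r}>".encode()

def sample_caption(image: bytes) -> str:
    """Stub for sampling Y ~ P(Y | Z) from an image-conditioned text model."""
    return f"<text describing an image of {len(image)} bytes>"

X = "Describe a beautiful house."

# (A) sample the answer Y directly from P(Y | X)
Y_a = sample_text(X)

# (B) sample an intermediate image Z ~ P(Z | X), then Y ~ P(Y | Z)
Z = sample_image(X)
Y_b = sample_caption(Z)
```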
@ohmoussa2
Omer Moussa
1 year
We are so excited to share the first work that demonstrates consistent downstream improvements for language tasks after fine-tuning with brain data!! Improving semantic understanding in speech language models via brain-tuning https://t.co/HVAzk36Wga W/ @dklakow, @mtoneva1
3
8
62
@MathisPink
Mathis Pink
1 year
@vvobot @moon91007207 @javiturek @s_michelmann @alex_ander @mtoneva1 12/n Check out our full paper, the SORT evaluation code to test your own models, and the Book-SORT dataset on 🤗 Datasets: Paper: https://t.co/S8BZzkFVM6 Code: https://t.co/RkfQFhlO0Z Book-SORT:
huggingface.co
0
0
1
@MathisPink
Mathis Pink
1 year
11/n SORT is the product of joint work with @vvobot, @moon91007207, Jianing Mu, @javiturek, Uri Hasson, Ken Norman, @s_michelmann, @alex_ander and @mtoneva1
1
0
0
@MathisPink
Mathis Pink
1 year
10/n We are excited to see how SORT will be used in the ongoing development of emerging memory architectures that provide alternatives to limited in-context memory🚀
1
0
0
@MathisPink
Mathis Pink
1 year
9/n🎁SORT provides a first method to evaluate episodic memory in models! The task can easily be adapted to test episodic memory capabilities in different modalities (audio, video), and it does not need any annotations!
1
0
0
@MathisPink
Mathis Pink
1 year
8/n🧠As a reference point, we collected SORT data from humans who had recently read one of the books in Book-SORT, showing that humans can recall the order of segments based on long-term memory, especially when the distance between segments is larger.
1
0
0
@MathisPink
Mathis Pink
1 year
7/n🗂️ RAG (Retrieval-Augmented Generation)-based models perform poorly when the correctly retrieved passages from a book are not presented in the correct order. This is because text chunks in RAG-based memory are fundamentally decontextualized.
1
0
2
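A toy illustration of the decontextualization point in the tweet above: a retriever returns chunks ranked by similarity to the query, so their original positions in the book, and hence their temporal order, are not part of what the model sees. Keyword overlap stands in here for an embedding-based retriever; real RAG systems differ in the details but share the ranking-by-similarity behavior.

```python
def chunk(text: str, size: int = 20) -> list[str]:
    """Split a text into fixed-size word chunks, discarding position information."""
    words = text.split()
    return [" ".join(words[k:k + size]) for k in range(0, len(words), size)]

def score(query: str, passage: str) -> int:
    """Crude similarity: number of shared lowercase words (stand-in for embeddings)."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the top-k chunks ranked by similarity, not by position in the source."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]
```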
@MathisPink
Mathis Pink
1 year
6/n🔍 Fine-tuning LLMs on book texts does not improve their ability to perform SORT when not given source excerpts in-context. Current forms of parametric memory in models may not be the right path towards episodic memory capabilities in LLMs. What about RAG as an alternative?
1
0
0
@MathisPink
Mathis Pink
1 year
5/n⭐With accuracies of up to 95%, LLMs perform surprisingly well on SORT when given in-context access to relevant excerpts of the books! BUT: as context length increases, LLMs' performance on SORT decreases significantly 📉
1
0
0
@MathisPink
Mathis Pink
1 year
4/n💡We find that fine-tuning or RAG do not support episodic memory capabilities well (yet). In-context presentation supports some episodic memory capabilities but at high costs and insufficient length-generalization, making it a bad candidate for episodic memory!
1
1
4