Ben Walker
@ML_BenWalker
Followers 95 · Following 59 · Media 34 · Statuses 110
ML PhD @OxUniMaths. Researching Neural DEs and the theory of rough paths. Email: [email protected]
University of Oxford
Joined February 2022
1/ Excited to announce that our paper on Structured Linear CDEs (SLiCEs) is a NeurIPS 2025 spotlight! TL;DR: Diagonal state-transition matrices (Mamba) are efficient but not maximally expressive. Dense ones are expressive but costly. Structured matrices give maximal expressivity while staying efficient.
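For skimmers, the schematic picture (my notation, not the paper's exact formulation) is a linear recurrence with an input-dependent state-transition matrix, where the structure of that matrix is what varies between models:

```latex
% Schematic only: hidden state h_k in R^d, input x_k, input-dependent transition A(x_k).
\[
  h_k = A(x_k)\, h_{k-1} + B\, x_k,
  \qquad
  A(x_k) =
  \begin{cases}
    \text{diagonal}   & \text{(Mamba-style: cheap, no mixing, not maximally expressive)}\\
    \text{dense}      & \text{(maximally expressive, but } O(d^2) \text{ per step)}\\
    \text{structured} & \text{(block-diagonal, sparse, etc.: mixing at near-diagonal cost)}
  \end{cases}
\]
```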
Poster: Permutation Equivariant Neural Controlled Differential Equations for Dynamic Graph Representation Learning
Wed Dec 3, 2025 • 11:00 AM to 2:00 PM PST • Exhibit Hall C,D,E #3919
Torben Berndt, Benjamin Walker, Tiexin Qin, Jan Stühmer, Andrey Kormilitzin
Spotlight Poster: Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models
Thu Dec 4, 2025 • 4:30 PM to 7:30 PM PST • Exhibit Hall C,D,E #3909
Benjamin Walker, Lingyi Yang, Nicola Muca Cirone, Cristopher Salvi, Terry Lyons
I am at NeurIPS 2025! I would love to meet people interested in sequence models, neural differential equations, or dynamic graphs. Feel free to reach out if you want to chat, or come find me at one of my posters, details below!
9/ Huge thanks to my coauthors Lingyi Yang, @MucaCirone, Cris Salvi, and Terry Lyons. I greatly enjoyed working on this paper together.
8/ SLiCEs also set a new state of the art among parallel-in-time models on the regular language tasks from the formal language benchmark. For more details, check out the paper and code.
Paper: https://t.co/Il9ZUJFT64
Code: github.com/Benjamin-Walker/structured-linear-cdes
7/ This is more than just theory. On permutation composition, LSTM performs well, Mamba struggles, and dense LNCDEs generalise strongly. SLiCEs match the dense performance and are the only parallel-in-time models that generalise beyond the validation sequence length.
6/ SLiCEs fix this by replacing the diagonal matrices with structured ones that still allow mixing. We prove that block-diagonal, sparse, Walsh-Hadamard, and diagonal-plus-low-rank variants all achieve maximal expressivity while staying parallel-in-time.
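A sketch (not the paper's code) of the idea behind one such variant: an input-dependent linear recurrence whose state-transition matrix is block-diagonal, so hidden channels mix within each block while the per-step transition cost stays close to diagonal (d·b multiplies instead of d·d for dense). The parameterisation below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d, b = 12, 4, 16, 4            # sequence length, input dim, state dim, block size
n_blocks = d // b

W = rng.normal(scale=0.1, size=(d_in, n_blocks, b, b))   # input -> block entries
U = rng.normal(scale=0.1, size=(d, d_in))                 # input injection

x = rng.normal(size=(T, d_in))
h = np.zeros(d)
for t in range(T):
    blocks = np.einsum("i,ikab->kab", x[t], W)            # (n_blocks, b, b), input-dependent
    h_blocks = h.reshape(n_blocks, b)
    # Apply each block to its slice of the state: mixing happens within blocks only.
    h = np.einsum("kab,kb->ka", np.eye(b) + blocks, h_blocks).reshape(d) + U @ x[t]
print(h.round(3))
```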
5/ Our 2024 NeurIPS paper showed that diagonal state-transition matrices are not maximally expressive, while dense matrices are. The challenge is that dense matrices are expensive.
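To put "expensive" in rough numbers (my illustrative counts, not figures from the paper), here are per-step multiply counts for applying the state-transition matrix to a hidden state of size d; the block size b and rank r are arbitrary example choices.

```python
# Per-step multiplies for applying the state-transition matrix to a size-d hidden state.
d, b, r = 1024, 16, 4
print("dense:                 ", d * d)          # 1,048,576
print("diagonal:              ", d)              # 1,024
print("block-diagonal (b=16): ", d * b)          # 16,384
print("diagonal + rank-4:     ", d + 2 * d * r)  # 9,216
```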
4/ Mamba uses input-dependent state-transition matrices, keeping parallel-in-time computation while adding expressivity. However, the matrices are diagonal, preventing hidden state mixing. It is like trying to understand an orchestra while hearing each instrument in isolation.
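A toy illustration of the "no mixing" point (my example, not Mamba itself): under diagonal state-transition matrices, a perturbation to one hidden channel never reaches the other channels no matter how many steps you run, whereas a dense transition spreads it immediately.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 8, 20
diag_steps = [np.diag(rng.uniform(0.5, 1.0, size=d)) for _ in range(T)]
dense_steps = [rng.normal(scale=0.3, size=(d, d)) for _ in range(T)]

def run(transitions, h0):
    h = h0.copy()
    for A in transitions:
        h = A @ h
    return h

e0 = np.zeros(d); e0[0] = 1.0            # information placed in channel 0 only
print(run(diag_steps, e0).round(3))      # still supported on channel 0 only
print(run(dense_steps, e0).round(3))     # spread across every channel
```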
3/ Classical RNNs solve these tasks easily, but their nonlinear recurrences cannot be computed exactly in parallel. Linear RNNs can be parallelised, but they lack the expressive power needed for state-tracking.
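A toy version of why the linear case parallelises (my sketch; numpy runs it serially, but the combine step below is associative, so real implementations evaluate the prefix as an O(log T)-depth parallel scan):

```python
# h_t = a_t * h_{t-1} + b_t is a composition of affine maps, and composition of affine maps
# is associative, so the running composition can be computed with a parallel prefix scan.
import numpy as np

rng = np.random.default_rng(0)
T = 8
a = rng.uniform(0.5, 1.0, size=T)
b = rng.normal(size=T)

def combine(f, g):
    """Compose h -> a1*h + b1 followed by h -> a2*h + b2 (the associative operator)."""
    (a1, b1), (a2, b2) = f, g
    return (a2 * a1, a2 * b1 + b2)

h, reference = 0.0, []
for t in range(T):                      # the nonlinear-RNN-style sequential loop
    h = a[t] * h + b[t]
    reference.append(h)

acc, scanned = (1.0, 0.0), []           # identity affine map
for t in range(T):                      # a tree of combines would give the parallel version
    acc = combine(acc, (a[t], b[t]))
    scanned.append(acc[1])              # composed map applied to h_0 = 0

print(np.allclose(reference, scanned))  # True
```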
2/ Parallel-in-time architectures such as Transformers have enabled sequence models to scale to billions of parameters, but empirically they struggle on state-tracking tasks like modular arithmetic and permutation composition.
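For anyone unfamiliar with the task, here is a toy version of permutation composition as state-tracking (my setup, not any specific benchmark's format): the model sees a stream of permutations and must output their running composition at every step, which means carrying the full group element through time.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 6
stream = [rng.permutation(n) for _ in range(T)]   # input: one permutation per step

state = np.arange(n)                   # identity permutation
targets = []
for p in stream:
    state = p[state]                   # compose the new permutation with the running state
    targets.append(state.copy())       # target at this step: the composition so far

for p, tgt in zip(stream, targets):
    print(p, "->", tgt)
```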
One nice part of writing up a thesis is getting to step back and see the combined impact of the last four years of work
Ran a simple check. Standard LSTM vs diagonal state-transition LSTM trained on a regular language (cycle navigation) and evaluated on length generalisation. Removing hidden-state mixing dropped validation accuracy from 100% to ~40%.
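The tweet doesn't spell out the ablation, so the sketch below is one plausible way to set it up rather than the exact experiment: an LSTM cell whose hidden-to-hidden maps are either dense matrices (standard) or per-channel scalars (no hidden-state mixing), plus a toy cycle-navigation generator. Training loop omitted.

```python
import torch
import torch.nn as nn

class AblatableLSTMCell(nn.Module):
    """LSTM cell with either dense (mixing) or diagonal (non-mixing) recurrent weights."""

    def __init__(self, d_in, d_h, diagonal=False):
        super().__init__()
        self.diagonal = diagonal
        self.W_x = nn.Linear(d_in, 4 * d_h)                       # input-to-gates map
        if diagonal:
            self.w_h = nn.Parameter(0.1 * torch.randn(4 * d_h))   # per-channel recurrence
        else:
            self.W_h = nn.Linear(d_h, 4 * d_h, bias=False)        # dense recurrence

    def forward(self, x, state):
        h, c = state
        rec = self.w_h * h.repeat(1, 4) if self.diagonal else self.W_h(h)
        i, f, g, o = (self.W_x(x) + rec).chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

def cycle_batch(batch, length, n=5):
    """Cycle navigation: moves in {-1, 0, +1}; label = position on the n-cycle after each move."""
    moves = torch.randint(-1, 2, (batch, length))
    positions = torch.cumsum(moves, dim=1) % n
    return nn.functional.one_hot(moves + 1, num_classes=3).float(), positions
```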
Would have been interesting to see some comparisons with a normal LSTM on state-tracking benchmarks to understand the impact of using diagonal matrices. Paper link:
Interesting paper on parallelising non-linear RNNs using a parallel Newton solve. However, to make it feasible they used diagonal state-transition matrices, preventing any hidden state mixing. Feels likely this negates the expressivity gains of using non-linearities.
Thrilled to share that our follow-up paper on permutation-equivariant Graph Neural CDEs will be presented at NeurIPS 2025 🎉 Adding permutation equivariance gives strong empirical performance with significantly fewer parameters.
🎉 Excited to share our new TPAMI paper: “Learning Dynamic Graph Embeddings with NCDEs”. We introduce Graph NCDEs, a continuous-time model for dynamic graphs. Unlike models that combine a GNN with a time-series model, we directly model the evolving graph dynamics. Read the paper 👇