Federico Danieli
@FedericoDa40495
Followers: 7 · Following: 0 · Media: 0 · Statuses: 10 · Joined October 2025
@prlz77 And if you're a PhD student interested in working on these topics, we've got a fresh internship position just for you:
[8/8] Time to explore what truly nonlinear RNNs can do at scale. Paper: https://t.co/lFQrUEgclx Code: https://t.co/Lg7gbcwOvs Collaborators: @prlz77, Miguel Sarabia, Xavier Suau, Luca Zappella
💻[7/8] For this, we're releasing open-source code to automatically parallelise RNNs. No need to implement your own parallel scan, nor to remember how Newton works: just prescribe the recurrence, flag any structure in the state update, and watch GPUs go brrrrrrr
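This is not the actual ml-pararnn API (see the repo for that); below is a purely hypothetical NumPy sketch of the workflow the tweet describes. All names (parallel_unroll, step, step_jac) are made up for illustration: you prescribe the recurrence and its state Jacobian, and a generic Newton solver handles the unrolling.

```python
import numpy as np

def parallel_unroll(f, jac, xs, h0, iters=30, tol=1e-9):
    # Hypothetical helper: solve h_t = f(h_{t-1}, x_t) for all t at once
    # via Newton's method. The inner linear recurrence below is the part a
    # parallel scan / custom CUDA kernel would replace on GPU; it is
    # unrolled sequentially here for clarity.
    T, d = xs.shape
    H = np.zeros((T, d))
    for _ in range(iters):
        prev = np.vstack([h0, H[:-1]])  # h_{t-1} for every position t
        G = H - np.array([f(p, x) for p, x in zip(prev, xs)])
        if np.max(np.abs(G)) < tol:
            break
        dH, carry = np.zeros_like(H), np.zeros(d)
        for t in range(T):
            # Newton update satisfies: dH_t = (df/dh) @ dH_{t-1} - G_t
            carry = jac(prev[t], xs[t]) @ carry - G[t]
            dH[t] = carry
        H = H + dH
    return H

# User-side: prescribe the recurrence and its Jacobian w.r.t. the state.
W = 0.5 * np.eye(3)
step = lambda h, x: np.tanh(W @ h + x)
step_jac = lambda h, x: (1 - np.tanh(W @ h + x) ** 2)[:, None] * W

xs = np.random.default_rng(0).normal(size=(16, 3))
H = parallel_unroll(step, step_jac, xs, np.zeros(3))

# Sanity check against naive sequential unrolling:
h, ref = np.zeros(3), []
for x in xs:
    h = step(h, x)
    ref.append(h)
print(np.allclose(H, np.array(ref)))  # True
```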
🔥[6/8] Why this matters: Mamba challenged the Transformer's monopoly. ParaRNN expands the search space of available architectures. It's time to get back to the drawing board and use these tools to start designing the next generation of inference-efficient models
[5/8] So we took LSTM and GRU architectures (remember those from the pre-Transformer era?), scaled them to 7B parameters, and achieved perplexity comparable to similarly-sized Transformers. No architectural tricks. Just pure scale, finally unlocked
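For reference, the standard GRU update; the sigmoids and tanh acting on h_{t-1} are exactly the nonlinearities that rule out a linear-scan shortcut:

\[
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1}),\\
\tilde h_t &= \tanh\!\big(W_h x_t + U_h (r_t \odot h_{t-1})\big),\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde h_t.
\end{aligned}
\]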
⚡️[4/8] The result? Up to 665x speedup over naive sequential processing, and training times comparable to Mamba, even with the extra overhead from Newton's iterations!
💡 [3/8] Our approach: Recast the sequence of nonlinear recurrences as a system of equations, then solve them in parallel using Newton's method. As a bonus, make everything blazingly fast with custom CUDA kernels
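A sketch of the idea in symbols, writing the generic recurrence as h_t = f(h_{t-1}, x_t): stack all states into H = (h_1, ..., h_T) and treat the unrolling as one root-finding problem,

\[
G(H) = 0, \qquad G_t(H) = h_t - f(h_{t-1}, x_t).
\]

Each Newton step solves \(J\,\Delta H = -G(H)\), and since the Jacobian \(J\) is block bidiagonal (identity blocks on the diagonal, \(-\partial f/\partial h_{t-1}\) below), the update satisfies the linear recurrence

\[
\Delta h_t = \frac{\partial f}{\partial h}\bigg|_{(h_{t-1},\, x_t)} \Delta h_{t-1} - G_t(H),
\]

which, unlike the original nonlinear one, a parallel scan can evaluate in logarithmic depth.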
[2/8] But wait, doesn't Mamba parallelise this too? Sure, but here's the catch: Mamba requires state-space updates to be linear, fundamentally affecting expressivity. We want the freedom to apply nonlinearities sequence-wise
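To see why linearity is the crux, here is a minimal sketch (scalar state, diagonal Mamba-style update assumed): two linear steps compose into another linear step, which is the associativity a parallel scan needs; wrap the update in a tanh and that composition rule disappears.

```python
import numpy as np

# Linear update: h_t = a_t * h_{t-1} + b_t. Composing two steps gives
# another step of the same form:
#   (a2, b2) after (a1, b1)  ->  (a2*a1, a2*b1 + b2)
# This associativity lets a parallel scan compute all h_t in O(log T) depth.
def combine(s1, s2):
    a1, b1 = s1
    a2, b2 = s2
    return a2 * a1, a2 * b1 + b2

rng = np.random.default_rng(0)
T = 8
a, b = rng.uniform(0.5, 1.0, size=T), rng.normal(size=T)

# Naive sequential unrolling (starting from h_{-1} = 0)
h, ref = 0.0, []
for t in range(T):
    h = a[t] * h + b[t]
    ref.append(h)

# Same states via composition of prefix maps (the scan's building block)
acc, scanned = (a[0], b[0]), [b[0]]
for t in range(1, T):
    acc = combine(acc, (a[t], b[t]))
    scanned.append(acc[1])  # composed map applied to h_{-1} = 0

print(np.allclose(ref, scanned))  # True
# With h_t = tanh(a_t * h_{t-1} + b_t) no such closed-form composition
# exists, which is the gap ParaRNN's Newton approach closes.
```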
[1/8-TL;DR] We can now train nonlinear RNNs at unprecedented scales by parallelising the unrolling of recurrent computations, which was previously thought inherently sequential. If you need fast inference, or are into time-series, we've got good news: RNNs are back on the menu
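The apparent bottleneck, spelled out: each state feeds the next, so naive evaluation of

\[
h_t = f(h_{t-1}, x_t) \quad\Longrightarrow\quad h_T = f\big(f(\cdots f(h_0, x_1)\cdots,\, x_{T-1}),\, x_T\big)
\]

takes T dependent steps, one per token.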
ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for LLMs. For years, we've written RNNs off as doomed and looked at the Transformer as *the* LLM. But we just needed better math. https://t.co/lFQrUEfEvZ 💻 https://t.co/Lg7gbcwgFU
GitHub: apple/ml-pararnn