Federico Danieli Profile
Federico Danieli

@FedericoDa40495

Joined October 2025
@FedericoDa40495
Federico Danieli
21 days
@prlz77 📋 And if you’re a PhD student interested in working on these topics, we got a fresh internship position just for you:
[8/8] Time to explore what truly nonlinear RNNs can do at scale. Paper: https://t.co/lFQrUEgclx Code: https://t.co/Lg7gbcwOvs Collaborators: @prlz77, Miguel Sarabia, Xavier Suau, Luca Zappella
💻 [7/8] For this, we’re releasing open-source code to automatically parallelise RNNs. No need to implement your own parallel scan, nor to remember how Newton works: just prescribe the recurrence, flag any structure in the state update, and watch GPUs go brrrrrrr
🔥 [6/8] Why this matters: Mamba challenged the Transformer’s monopoly. ParaRNN expands the search space of available architectures. It’s time to get back to the drawing board and use these tools to start designing the next generation of inference-efficient models
📈 [5/8] So we took LSTM and GRU architectures (remember those from the pre-Transformer era?), scaled them to 7B parameters, and achieved perplexity comparable to similarly-sized Transformers. No architectural tricks. Just pure scale, finally unlocked
⚡️ [4/8] The result? Up to 665x speedup over naive sequential processing, and training times comparable to Mamba, even with the extra overhead from Newton’s iterations!
💡 [3/8] Our approach: Recast the sequence of nonlinear recurrences as a system of equations, then solve them in parallel using Newton’s method. As a bonus, make everything blazingly fast with custom CUDA kernels
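To make the idea in [3/8] concrete, here is a minimal sketch, not the ParaRNN implementation. It assumes a scalar toy cell h_t = tanh(w*h_{t-1} + u*x_t); the names `f`, `df_dh`, `sequential_rollout` and `newton_rollout`, and the weights `w`, `u`, are hypothetical. All T states are treated as unknowns of the system h_t - f(h_{t-1}, x_t) = 0. Each Newton step evaluates residuals and derivatives independently per time step, then solves a linear recurrence for the update. That linear solve is written as a plain loop here; it is the part a parallel scan would replace.

```python
import math

# Hypothetical toy cell: h_t = tanh(w * h_{t-1} + u * x_t)
def f(h_prev, x, w=0.5, u=1.0):
    return math.tanh(w * h_prev + u * x)

def df_dh(h_prev, x, w=0.5, u=1.0):
    # Derivative of f with respect to the previous state
    return w * (1.0 - math.tanh(w * h_prev + u * x) ** 2)

def sequential_rollout(xs, h0=0.0):
    """Reference: the usual step-by-step unrolling."""
    hs, h = [], h0
    for x in xs:
        h = f(h, x)
        hs.append(h)
    return hs

def newton_rollout(xs, h0=0.0, max_iters=50, tol=1e-12):
    """Solve h_t - f(h_{t-1}, x_t) = 0 for all t at once with Newton's method."""
    if not xs:
        return [], 0
    T = len(xs)
    hs = [0.0] * T  # initial guess for every state
    for it in range(max_iters):
        prev = [h0] + hs[:-1]
        # Residuals and Jacobian entries: independent across t, so parallelisable
        res = [hs[t] - f(prev[t], xs[t]) for t in range(T)]
        if max(abs(r) for r in res) < tol:
            return hs, it
        a = [df_dh(prev[t], xs[t]) for t in range(T)]
        # Newton update solves the LINEAR recurrence
        #   delta_t = a_t * delta_{t-1} - res_t,  with delta_{-1} = 0.
        # Written sequentially here; a parallel scan could take its place.
        delta, d = [], 0.0
        for t in range(T):
            d = a[t] * d - res[t]
            delta.append(d)
        hs = [h + step for h, step in zip(hs, delta)]
    return hs, max_iters
```

Calling `newton_rollout(xs)` returns the states together with the number of Newton iterations used; the states should agree with `sequential_rollout(xs)` up to floating-point tolerance.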
🐍 [2/8] But wait, doesn’t Mamba parallelise this too? Sure, but here’s the catch: Mamba requires state space updates to be linear, which fundamentally limits expressivity. We want the freedom to apply nonlinearities sequence-wise
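Why linearity matters for parallelism, as a small sketch: a linear update h_t = a_t*h_{t-1} + b_t is an affine map, and affine maps compose associatively, so a prefix scan can combine them in about log2(T) sweeps instead of T sequential steps. The code below illustrates that general principle in plain Python with scalar states. It is not Mamba's kernel, and `combine` and `scan_linear_recurrence` are hypothetical names.

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b; `left` is applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def scan_linear_recurrence(a, b, h0=0.0):
    """Return all h_t for h_t = a_t * h_{t-1} + b_t via an inclusive scan."""
    elems = list(zip(a, b))
    step = 1
    while step < len(elems):
        # Within one sweep every position only reads the previous sweep's
        # values, so all positions could be updated in parallel.
        elems = [
            combine(elems[t - step], elems[t]) if t >= step else elems[t]
            for t in range(len(elems))
        ]
        step *= 2
    # elems[t] now holds the composed map from h0 to h_t
    return [A * h0 + B for A, B in elems]
```

A nonlinear update such as tanh(w*h + u*x) has no such closed-form composition, which is why this trick alone does not cover nonlinear RNNs.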
👉 [1/8-TL;DR] We can now train nonlinear RNNs at unprecedented scales, by parallelising what was previously thought inherently sequential: the unrolling of recurrent computations. If you need fast inference, or are into time-series, we got good news: RNNs are back on the menu
ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for LLMs
For years, we’ve written RNNs off as doomed and looked at the Transformer as THE LLM, but we just needed better math 📄 https://t.co/lFQrUEfEvZ 💻 https://t.co/Lg7gbcwgFU
github.com
Contribute to apple/ml-pararnn development by creating an account on GitHub.