Federico Danieli @FedericoDa40495 X Profile

Followers: 7 | Following: 0 | Media: 0 | Statuses: 10

Joined October 2025

Federico Danieli

@FedericoDa40495

21 days

@prlz77 📋 And if you’re a PhD student interested in working on these topics, we have a fresh internship position just for you:

[8/8] 𝗧𝗶𝗺𝗲 𝘁𝗼 𝗲𝘅𝗽𝗹𝗼𝗿𝗲 𝘄𝗵𝗮𝘁 𝘁𝗿𝘂𝗹𝘆 𝗻𝗼𝗻𝗹𝗶𝗻𝗲𝗮𝗿 𝗥𝗡𝗡𝘀 𝗰𝗮𝗻 𝗱𝗼 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲
Paper: https://t.co/lFQrUEgclx
Code: https://t.co/Lg7gbcwOvs
Collaborators: @prlz77, Miguel Sarabia, Xavier Suau, Luca Zappella

💻[7/8] To that end, we’re releasing open-source code that automatically parallelises RNNs. No need to implement your own parallel scan, nor to remember how Newton’s method works: just prescribe the recurrence, flag any structure in the state update, and watch GPUs go 𝘣𝘳𝘳𝘳𝘳𝘳𝘳𝘳
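
A minimal sketch of what "prescribe the recurrence, flag the structure" could look like, written in plain Python instead of the released CUDA code. The names `newton_unroll`, `step`, `jac_diag` and `combine` are hypothetical and are not the API of apple/ml-pararnn. The user supplies two things: the state update, and the diagonal of its Jacobian (the flagged structure, assumed here to be diagonal). Each Newton iteration then reduces to an elementwise affine scan rather than a dense matrix recurrence.

```python
import math
from itertools import accumulate


def combine(left, right):
    """Compose two elementwise affine maps h -> a*h + b, applying `left` first.

    Diagonal maps stay diagonal under composition, and the operation is
    associative, which is what makes a parallel scan possible.
    """
    a1, b1 = left
    a2, b2 = right
    a = [x2 * x1 for x1, x2 in zip(a1, a2)]
    b = [x2 * y1 + y2 for x2, y1, y2 in zip(a2, b1, b2)]
    return a, b


def newton_unroll(step, jac_diag, xs, h0, iters):
    """Hypothetical helper: solve h_t = step(h_{t-1}, x_t) for every t at once.

    `jac_diag(h_prev, x)` returns the diagonal of d step / d h_prev; flagging
    that the Jacobian is diagonal keeps each Newton solve elementwise.
    """
    T, n = len(xs), len(h0)
    h = [[0.0] * n for _ in range(T)]  # initial guess for all states
    for _ in range(iters):
        prev = [h0] + h[:-1]  # h_{t-1} under the current guess
        f = [step(p, x) for p, x in zip(prev, xs)]
        a = [jac_diag(p, x) for p, x in zip(prev, xs)]
        r = [[fi - hi for fi, hi in zip(ft, ht)] for ft, ht in zip(f, h)]
        # Newton update solves delta_t = a_t * delta_{t-1} + r_t with delta_0 = 0.
        # accumulate() folds left to right here; because `combine` is
        # associative, a GPU version could compute the same prefixes in parallel.
        prefixes = accumulate(zip(a, r), combine)
        # With delta_0 = 0, delta_t is the offset b of the composed map.
        h = [[hi + di for hi, di in zip(ht, b)] for ht, (_, b) in zip(h, prefixes)]
    return h


# Example: a 3-channel RNN with elementwise tanh and per-channel weights w,
# so the Jacobian of the state update is diagonal.
w = [0.8, -0.5, 1.2]


def step(h_prev, x):
    return [math.tanh(wi * hi + xi) for wi, hi, xi in zip(w, h_prev, x)]


def jac_diag(h_prev, x):
    return [wi * (1.0 - math.tanh(wi * hi + xi) ** 2) for wi, hi, xi in zip(w, h_prev, x)]


xs = [[0.3, -1.0, 0.5], [1.5, 0.2, -0.4], [-0.7, 0.9, 0.1], [0.0, -0.3, 2.0]]
h0 = [0.1, 0.0, -0.2]

# Sequential reference unroll.
ref, h = [], h0
for x in xs:
    h = step(h, x)
    ref.append(h)

par = newton_unroll(step, jac_diag, xs, h0, iters=len(xs))
```

Running `iters=len(xs)` iterations guarantees that `par` matches `ref` up to rounding, because each iteration fixes at least one more leading state. Any practical speedup depends on Newton converging in far fewer iterations than the sequence length.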

🔥[6/8] Why this matters: Mamba challenged the Transformer’s monopoly. ParaRNN expands the search space of available architectures. It’s time to get back to the drawing board and use these tools to start designing the next generation of inference-efficient models.

📈[5/8] So we took LSTM and GRU architectures (remember those from the pre-Transformer era?), scaled them to 7B parameters, and achieved perplexity comparable to similarly sized Transformers. No architectural tricks. Just pure scale, finally unlocked.

⚡️[4/8] The result? Up to 665x speedup over naive sequential processing, and training times comparable to Mamba—even with the extra overhead from Newton’s iterations!

💡 [3/8] Our approach: recast the sequence of nonlinear recurrences as a single system of equations over all time steps, then solve it in parallel using Newton’s method. As a bonus, make everything blazingly fast with custom CUDA kernels.
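
To make the idea concrete, here is a minimal sketch for a scalar RNN, h_t = tanh(w*h_{t-1} + u*x_t + c). It assumes a zero initial guess and uses plain Python in place of CUDA kernels. The function names and the parameters w, u, c are illustrative and are not taken from the paper or its code. The T residuals F_t = h_t - tanh(w*h_{t-1} + u*x_t + c) are treated as one system. Because each F_t depends only on h_t and h_{t-1}, the Jacobian is lower bidiagonal, and each Newton step becomes a linear recurrence that an associative scan can solve.

```python
import math


def affine_scan(a, b, h0):
    """Return [h_1, ..., h_T] for h_t = a[t] * h_{t-1} + b[t] (Hillis-Steele scan).

    Composing the affine maps (a1, b1) then (a2, b2) gives
    (a2*a1, a2*b1 + b2), which is associative, so all prefixes can be
    built in about log2(T) rounds.
    """
    pairs = list(zip(a, b))
    offset = 1
    while offset < len(pairs):
        # Each position reads only the previous round's values,
        # so on a GPU this inner loop would run in parallel.
        nxt = pairs[:]
        for t in range(offset, len(pairs)):
            a_prev, b_prev = pairs[t - offset]
            a_cur, b_cur = pairs[t]
            nxt[t] = (a_cur * a_prev, a_cur * b_prev + b_cur)
        pairs = nxt
        offset *= 2
    # pairs[t] now maps h0 straight to h_t.
    return [a_t * h0 + b_t for a_t, b_t in pairs]


def newton_rnn(xs, h0, w, u, c, iters):
    """Solve h_t = tanh(w*h_{t-1} + u*x_t + c) for all t with Newton's method.

    Newton step: delta_t = J_t * delta_{t-1} - F_t, where
    J_t = d tanh(...) / d h_{t-1} and delta_0 = 0 because h0 is given.
    """
    h = [0.0] * len(xs)  # initial guess for every state
    for _ in range(iters):
        prev = [h0] + h[:-1]
        f = [math.tanh(w * p + u * x + c) for p, x in zip(prev, xs)]
        jac = [w * (1.0 - ft * ft) for ft in f]  # d f_t / d h_{t-1}
        neg_residual = [ft - ht for ft, ht in zip(f, h)]  # -F_t
        delta = affine_scan(jac, neg_residual, 0.0)
        h = [ht + dt for ht, dt in zip(h, delta)]
    return h


def sequential_rnn(xs, h0, w, u, c):
    """Reference: the usual step-by-step unroll."""
    out, h = [], h0
    for x in xs:
        h = math.tanh(w * h + u * x + c)
        out.append(h)
    return out


xs = [0.5, -1.0, 0.3, 2.0, -0.7, 0.0, 1.2, -0.4]
ref = sequential_rnn(xs, 0.0, w=0.9, u=1.1, c=0.05)
par = newton_rnn(xs, 0.0, w=0.9, u=1.1, c=0.05, iters=len(xs))
```

With `iters=len(xs)`, `par` agrees with `ref` up to rounding, since after k iterations the first k states are exact. The method only pays off when Newton converges in far fewer iterations than the sequence length.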

🐍 [2/8] But wait, doesn’t Mamba parallelise this too? Sure, but here’s the catch: Mamba requires its state updates to be linear, which fundamentally limits expressivity. We want the freedom to apply nonlinearities along the sequence, inside the recurrence itself.
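
The distinction can be illustrated with a scalar linear update, h_t = a_t*h_{t-1} + b_t. This is a simplified stand-in for the diagonal, input-dependent updates of Mamba-style models and is not Mamba's actual kernel. Each step is an affine map, and composing two affine maps yields another affine map, associatively. As a result, all prefixes can be computed by divide and conquer, with the two halves handled independently. The names `compose` and `prefix_scan` are illustrative.

```python
def compose(first, second):
    """Affine maps h -> a*h + b, applied as `first` then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def prefix_scan(maps):
    """Inclusive prefix scan by divide and conquer.

    The two halves do not depend on each other, so they could run in parallel;
    the right half's prefixes are then fixed up with the left half's total.
    """
    if len(maps) <= 1:
        return list(maps)
    mid = len(maps) // 2
    left = prefix_scan(maps[:mid])
    right = prefix_scan(maps[mid:])
    total_left = left[-1]
    return left + [compose(total_left, m) for m in right]


a = [0.9, 0.5, -1.2, 0.3, 1.1]
b = [0.1, -0.4, 0.7, 2.0, -0.5]
h0 = 0.25

# Sequential reference.
seq, h = [], h0
for a_t, b_t in zip(a, b):
    h = a_t * h + b_t
    seq.append(h)

# Each prefix maps h0 directly to h_t.
par = [a_t * h0 + b_t for a_t, b_t in prefix_scan(list(zip(a, b)))]
```

The same trick does not carry over directly to a nonlinear update such as h_t = tanh(a_t*h_{t-1} + b_t). Composing two such maps gives tanh(a_2*tanh(a_1*h + b_1) + b_2), which is not of the form tanh(a*h + b), so there is no fixed-size summary to combine.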

👉 [1/8-TL;DR] We can now train nonlinear RNNs at unprecedented scales by parallelising what was previously thought to be inherently sequential: the unrolling of recurrent computations. If you need fast inference, or are into time series, we have good news: RNNs are back on the menu.

𝗣𝗮𝗿𝗮𝗥𝗡𝗡: 𝗨𝗻𝗹𝗼𝗰𝗸𝗶𝗻𝗴 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗼𝗳 𝗡𝗼𝗻𝗹𝗶𝗻𝗲𝗮𝗿 𝗥𝗡𝗡𝘀 𝗳𝗼𝗿 𝗟𝗟𝗠𝘀
For years, we’ve written RNNs off as doomed and looked at the Transformer as 𝘁𝗵𝗲 LLM, but we just needed better math.
📄 https://t.co/lFQrUEfEvZ
💻 https://t.co/Lg7gbcwgFU

GitHub: apple/ml-pararnn