Michael Poli

@MichaelPoli6

Followers 1,817 · Following 292 · Media 50 · Statuses 313

AI, numerics and systems.

Joined August 2018
Pinned Tweet
@MichaelPoli6
Michael Poli
21 days
Excited to announce RTF as the first release of a new research group at @LiquidAI_ , led by @Massastrello and me. We will keep focusing on the foundations of architecture design, scaling, and systems for efficient training and inference, building on our work on deep signal processing …
@romu_nishi
Rom Parnichkun
21 days
[1/6] Releasing the Rational Transfer Function (RTF) parametrization for linear time invariant (LTI) and weakly input-varying (wLIV) sequence models. New SOTA for SSMs on LRA, and improved perplexity on language modeling with Hyena.
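The mechanics behind RTF are compact enough to sketch: parametrize the filter as a ratio of two polynomials in z⁻¹, evaluate numerator and denominator on the unit circle with FFTs, and divide pointwise to recover a truncated impulse response. A minimal PyTorch sketch, assuming this simplified setup (the paper's actual parametrization, including stability handling and the wLIV case, differs):

```python
import torch

def rtf_impulse_response(b: torch.Tensor, a: torch.Tensor, L: int) -> torch.Tensor:
    """Length-L truncated impulse response of H(z) = b(z^-1) / a(z^-1).

    Both coefficient vectors are zero-padded to length L and FFT'd, which
    evaluates the two polynomials at the L-th roots of unity; pointwise
    division then gives the transfer function on the unit circle.
    """
    B = torch.fft.fft(b.to(torch.cfloat), n=L)
    A = torch.fft.fft(a.to(torch.cfloat), n=L)
    return torch.fft.ifft(B / A).real

def rtf_filter(u: torch.Tensor, b: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """Apply the LTI system to an input of shape (..., L) via FFT convolution."""
    L = u.shape[-1]
    h = rtf_impulse_response(b, a, 2 * L)  # oversample to reduce wrap-around error
    U = torch.fft.rfft(u, n=2 * L)
    H = torch.fft.rfft(h)
    return torch.fft.irfft(U * H, n=2 * L)[..., :L]
```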
@MichaelPoli6
Michael Poli
1 year
Attention is great. Are there other operators that scale? Excited to share our work on Hyena, an alternative to attn that can learn on sequences *10x longer*, up to *100x faster* than optimized attn, by using implicit long convolutions & gating 📜 1/
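The subquadratic scaling comes from computing long convolutions in the frequency domain: an FFT-based depthwise convolution costs O(L log L) rather than the O(L²) of dense attention. A minimal sketch of the core idea, not the optimized implementation from the paper:

```python
import torch

def causal_fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Depthwise convolution of u (batch, L, d) with per-channel filters
    k (L, d) in O(L log L) via FFT; zero-padding to 2L avoids circular
    wrap-around, so the result matches a causal linear convolution."""
    L = u.shape[1]
    U = torch.fft.rfft(u, n=2 * L, dim=1)
    K = torch.fft.rfft(k, n=2 * L, dim=0)
    return torch.fft.irfft(U * K, n=2 * L, dim=1)[:, :L]

# usage: filters as long as the (8k-token) sequence itself
u = torch.randn(2, 8192, 64)
k = torch.randn(8192, 64) / 8192
y = causal_fft_conv(u, k)   # (2, 8192, 64)
```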
@MichaelPoli6
Michael Poli
2 months
📢New research on mechanistic architecture design and scaling laws. - We perform the largest scaling laws analysis (500+ models, up to 7B) of beyond-Transformer architectures to date - For the first time, we show that architecture performance on a set of isolated token manipulation tasks …
@MichaelPoli6
Michael Poli
4 years
[1/4] Excited to share the first experimental release of *torchdyn* , a PyTorch library for all things neural differential equations! torchdyn is developed by the core DiffEqML team. @Massastrello @Diffeq_ml
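A minimal usage sketch, assuming the torchdyn ≥1.0 API (the call signature has shifted across releases, so check the docs for your installed version):

```python
import torch
import torch.nn as nn
from torchdyn.core import NeuralODE

# dx/dt = f(x), with f given by a small neural network
f = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))

# adjoint sensitivity keeps memory constant in the number of solver steps
model = NeuralODE(f, sensitivity='adjoint', solver='dopri5')

x0 = torch.randn(128, 2)              # batch of initial conditions
t_span = torch.linspace(0.0, 1.0, 20)
t_eval, traj = model(x0, t_span)      # traj: (len(t_span), batch, dim)
```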
@MichaelPoli6
Michael Poli
6 months
We've been hard at work pushing the frontiers of efficient architecture design and optimization. StripedHyena-7B is the result: the first alternative architecture truly competitive with the best Transformers of its size or larger. And it's very fast.
@togethercompute
Together AI
6 months
Announcing StripedHyena 7B — an open source model using an architecture that goes beyond Transformers, achieving faster performance and longer context. It builds on the lessons learned in the past year designing efficient sequence modeling architectures.
@MichaelPoli6
Michael Poli
2 years
Let us embark on a fractal journey about dynamical systems and neural implicit representations... 1/
@MichaelPoli6
Michael Poli
1 year
Hungry for more content on efficient long context models after @srush_nlp 's awesome keynote? We put together some of our perspectives in a short note:
@srush_nlp
Sasha Rush
1 year
Do we need Attention? (v0 ): Slides for a survey talk summarizing recent Linear RNN models with a focus on NLP. Tries to cover a lot of different S4-related models (as well as RWKV/MEGA) in a digestible way.
@MichaelPoli6
Michael Poli
2 years
Join us Dec 14th (EST time) for the NeurIPS workshop "The Symbiosis of Deep Learning and Differential Equations": This is also your chance to submit questions to our great lineup of panelists, via:
@MichaelPoli6
Michael Poli
4 years
[1/n] The community has been hard at work to speed up Neural ODEs, e.g. regularization strategies @DavidDuvenaud @chuckberryfinn to keep the ODE easy to solve. We've also been thinking about the same problem, and we propose a different (compatible!) direction. @Massastrello
@MichaelPoli6
Michael Poli
3 months
In case you missed it: a new 7B StripedHyena model is out (the longest-context one yet), Evo-1 7B 🧬. And it now runs in a single notebook (powered by @togethercompute ), from DNA generation to protein folding.
@MichaelPoli6
Michael Poli
6 months
I'm going to be at NeurIPS to present work on efficient model architecture and inference (with @exnx @Massastrello and others) HyenaDNA: Laughing Hyena: Excited to catch up with old friends and make some new ones - DM if you'd …
@MichaelPoli6
Michael Poli
3 years
The website for our 'The Symbiosis of Deep Learning and Differential Equations' #NeurIPS2021 workshop is up: We have a special track for already published papers. Share your work from adjacent fields with the NeurIPS community! Deadline: Sept. 17 AoE
@MichaelPoli6
Michael Poli
2 years
[1/2] Another year, another NeurIPS! "Neural Hybrid Automata: Learning Dynamics With Multiple Modes and Stochastic Transitions" accepted at #NeurIPS2021 Come by to chat about NHA, a method to learn stochastic hybrid systems! Poster: Dec 07 08:30 AM -- 10:00 AM (PST).
@MichaelPoli6
Michael Poli
3 years
A primer on effortless Neural ODE models through @PyTorchLightnin and torchdyn @Diffeq_ml . We cover the NDE boilerplate; Lightning does all the rest! This is only the beginning: we are extending the ecosystem with further DiffEqML <-> PyTorch Lightning / @GridAI_ integration.
@MichaelPoli6
Michael Poli
3 months
There are many remarkable things about Evo, here are some of my thoughts:
@togethercompute
Together AI
3 months
Introducing Evo: a long-context biological model based on StripedHyena that generalizes across DNA, RNA, and proteins. It is capable of prediction tasks and generative design, from molecular to whole genome scale (over 650k tokens in length).
@MichaelPoli6
Michael Poli
4 years
As our first NeurIPS experience, I have to say the results surpassed even the wildest of expectations. This is the culmination of a team effort with my dear friend @Massastrello , leading to @Diffeq_ml as an open-source effort for neural differential equations.
@MichaelPoli6
Michael Poli
4 years
Neural ODE training can be difficult to get right. We find inspecting the adjoint flows can help in these situations. You can now easily access these quantities in torchdyn's models and log / visualize with your preferred @PyTorchLightnin utils.
@MichaelPoli6
Michael Poli
1 year
Hyena is a convolutional layer for LLMs that can shrink the gap with attention, while scaling *subquadratically* in seq len (e.g. train a lot faster at 64k, and train on 100k+ tokens!) 2/ blogs: , code:
@MichaelPoli6
Michael Poli
1 year
Check out our code here. We’d love to hear from you about more applications. Let’s push the limits of context lengths in lang, vision, bio and more! 12/n code:
@MichaelPoli6
Michael Poli
6 months
More excellent work on modernizing linear attn. / linear RNNs. Architecture design in 2024 is going to get even more sophisticated: we now have a variety of powerful "modern" primitives to choose from, each with different strengths.
@Yikang_Shen
Yikang Shen
6 months
Impressed by the performance of Mamba and believe in RNN? We provide a simple alternative solution! Excited to share Gated Linear Attention (GLA-Transformer). (1/n)
@MichaelPoli6
Michael Poli
2 years
Join us for the second edition of the #NeurIPS2022 workshop "The Symbiosis of Deep Learning and Differential Equations"🌀 We're looking for your AI <> DE ideas: neural diff. eqs., neural operators, diffusion models and novel applications! website:
@MichaelPoli6
Michael Poli
1 year
Armed w/ synthetic tasks, we homed in on what makes attn so special, narrowing it down to 3 key properties: 1) it's data-controlled, 2) it has sublinear parameter scaling (in seq len), 3) it has global context. Hyena achieves all 3 w/ a combo of long convs & element-wise multiplicative gating. 5/
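A toy block combining the two ingredients named above, a long convolution for global context and element-wise multiplicative gating for data control. This is a sketch of the mechanism only, not the actual Hyena recurrence (which uses multiple projected branches rather than a sigmoid gate):

```python
import torch
import torch.nn as nn

class GatedLongConv(nn.Module):
    """Long convolution for global context + element-wise gating for
    data control (illustrative sketch)."""

    def __init__(self, d: int, max_len: int):
        super().__init__()
        self.gate = nn.Linear(d, d)
        self.value = nn.Linear(d, d)
        self.k = nn.Parameter(torch.randn(max_len, d) / max_len)  # explicit filter

    def forward(self, u: torch.Tensor) -> torch.Tensor:   # u: (B, L, d)
        L = u.shape[1]
        V = torch.fft.rfft(self.value(u), n=2 * L, dim=1)
        K = torch.fft.rfft(self.k[:L], n=2 * L, dim=0)
        y = torch.fft.irfft(V * K, n=2 * L, dim=1)[:, :L]  # global context, O(L log L)
        return torch.sigmoid(self.gate(u)) * y             # data-controlled gate
```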
@MichaelPoli6
Michael Poli
1 year
The code implementation is remarkably straightforward. Some exciting news for signal processing fans: we find filter parametrization (& custom input projections!) to be one of the most impactful design choices, & come up w/ some recommendations, see paper :) 6/
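The "implicit" side of the filter parametrization can be sketched as a small network mapping positions to filter values, modulated by a decay window, which decouples filter length from parameter count. The featurization below (a raw position scalar, a SiLU MLP, an exponential window) is an assumption for illustration; the paper's exact recipe uses richer positional features:

```python
import torch
import torch.nn as nn

class ImplicitFilter(nn.Module):
    """Filter values generated by an MLP over positions, so parameter
    count is decoupled from filter length (assumed featurization)."""

    def __init__(self, d_model: int, d_hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, d_hidden), nn.SiLU(), nn.Linear(d_hidden, d_model)
        )
        self.log_decay = nn.Parameter(torch.zeros(d_model))

    def forward(self, L: int) -> torch.Tensor:
        t = torch.linspace(0.0, 1.0, L).unsqueeze(-1)   # (L, 1) positions
        window = torch.exp(-self.log_decay.exp() * t)   # learned decay envelope
        return self.mlp(t) * window                     # (L, d_model) filter
```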
@MichaelPoli6
Michael Poli
1 year
We started w/ great work on dense-attention-free models for language, e.g. H3, which, paired w/ a few attn layers, can match Transformers at 2.7B params. But how far can we get w/o any attn? On autoregressive language tasks (same tokenizer), we observed a gap in quality 3/
@MichaelPoli6
Michael Poli
1 year
Just like attention, Hyena can be used in ViTs on ImageNet, suggesting mechanistic design benchmarks may help improve performance beyond language. So excited for the potential of Hyena as a general deep learning operator, especially in domains where long-range interactions are critical 10/
@MichaelPoli6
Michael Poli
1 year
This work wouldn’t have been possible without inspiring progress on long sequence models, convolution parametrizations, mechanistic interpretability @realDanFu @srush_nlp @BlinkDL_AI @tri_dao @davidwromero @_albertgu @krandiash @NeelNanda5 , too many to list! You're great!
@MichaelPoli6
Michael Poli
3 years
Took a while. Get in touch if you're interested in contributing to open-source for neural diff eqs and implicit models! We have a lot of other interesting projects and collaborations underway.
@Diffeq_ml
DiffeqML
3 years
[1/6] Announcing **torchdyn version 1.0**! @MichaelPoli6 @Massastrello . We roughly doubled the number of tutorials (optimal control, parallel-in-time solvers, hybrid systems), added new models and developed a numerics suite for diff eqs and root finding
@MichaelPoli6
Michael Poli
6 months
Most importantly, I am very grateful to have been able to work with so many talented researchers and engineers on this, including collaborators from @HazyResearch ( @exnx , @ai_with_brains ), @NousResearch (thank you @theemozilla and @Teknium1 !), @Hessian_AI . Other individual …
@MichaelPoli6
Michael Poli
2 months
We introduce a framework for fast prototyping and testing of new architecture designs, called mechanistic architecture design (MAD). MAD includes a collection of token manipulation tasks as unit tests of model capabilities: compression, recall, noisy recall, memorization, …
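For a concrete sense of these unit tests, here is a toy associative-recall generator. The exact task formats live in the released MAD pipeline, so treat this layout (key-value pairs followed by a query key) as an assumed illustration:

```python
import torch

def associative_recall(batch: int, vocab: int, n_pairs: int):
    """Input: k1 v1 k2 v2 ... kn vn q, where q repeats one of the keys.
    Target: the value that followed that key. Keys are drawn without
    replacement so the answer is unambiguous."""
    keys = torch.stack([torch.randperm(vocab // 2)[:n_pairs] for _ in range(batch)])
    vals = torch.randint(vocab // 2, vocab, (batch, n_pairs))
    seq = torch.stack([keys, vals], dim=-1).reshape(batch, 2 * n_pairs)
    q = torch.randint(0, n_pairs, (batch, 1))
    query = keys.gather(1, q)                       # (batch, 1)
    target = vals.gather(1, q).squeeze(-1)          # (batch,)
    return torch.cat([seq, query], dim=-1), target  # x: (batch, 2*n_pairs + 1)
```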
@MichaelPoli6
Michael Poli
3 years
@PierreAblin Hey, at least they used a rainbow colormap :D
@MichaelPoli6
Michael Poli
2 months
How do I stripe my model? We find optimal hybridization ratios (Hyena : MHA): ~25% of layers should be attention (at sequence length 8k), and <25% if trying to balance perplexity and state size of the model. And the ordering / topology? More on that in the paper.
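A sketch of what such a striping schedule could look like in code. The helper name and the even-spacing heuristic are illustrative only; the paper studies orderings and topologies systematically:

```python
def striped_layout(n_layers: int, attn_ratio: float = 0.25) -> list[str]:
    """Evenly interleave attention layers among Hyena layers at a target
    ratio (hypothetical helper, for illustration)."""
    n_attn = max(1, round(n_layers * attn_ratio))
    stride = n_layers / n_attn
    attn_at = {round((i + 0.5) * stride) for i in range(n_attn)}
    return ["attn" if i in attn_at else "hyena" for i in range(n_layers)]

# striped_layout(16) places 4 attention layers at depths 2, 6, 10, 14
```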
@MichaelPoli6
Michael Poli
1 year
Building on the groundwork by H3 using synthetics for model design, we tried to simulate performance gaps on simple grokking tasks that take a few mins to run. Surprisingly, one can recreate gaps by tweaking the difficulty of synthetic tasks (seq length & vocab size)! 4/
@MichaelPoli6
Michael Poli
2 months
We use MAD to identify promising architectures, including striped and MoE variants. Then, we perform an extensive compute-optimal (and beyond compute-optimal) scaling laws analysis of emerging architectures. Fun fact: the optimal allocation of tokens to model size varies …
@MichaelPoli6
Michael Poli
22 days
DSL for tile-based computation + Hopper arch. feature utilization for great out-of-the-box performance versus PyTorch and Triton. TK is quite fun to read and write, tons of potential applications in efficient model architectures!
@bfspector
Benjamin F Spector
22 days
(1/7) Happy mother’s day! We think what the mothers of America really want is a Flash Attention implementation that’s just 100 lines of code and 30% faster, and we’re happy to provide. We're excited to introduce ThunderKittens (TK), a simple DSL embedded within CUDA that makes …
@MichaelPoli6
Michael Poli
3 years
Come chat with us about future directions for Neural ODEs! (and DiffEqML)
@Diffeq_ml
DiffeqML
3 years
[1/2] NeurIPS is almost here! "Dissecting Neural ODEs" accepted at @NeurIPSConf #NeurIPS2020 as oral. Looking forward to your questions: Oral: Tue, Dec 8th, 2020 @ 06:30 – 06:45 PST Poster: Tue, Dec 8th, 2020 @ 09:00 – 11:00 PST (Session 2) Code:
@MichaelPoli6
Michael Poli
1 year
On larger models, Hyena closes the gap with attention (GPT) in quality on the Pile and WikiText-103. 9/
@MichaelPoli6
Michael Poli
1 year
With a way to measure progress on our small mechanistic design benchmarks, we refine the design of Hyena, and observe that particular parametrizations of the long convolutions scale more favorably in sequence length and vocabulary size 7/
@MichaelPoli6
Michael Poli
11 months
Long context (1 million tokens) + "character"-level modeling in genomics. Very fun project, a lot more to learn on training these new architectures.
@MichaelPoli6
Michael Poli
1 year
On The Pile, we see the performance gaps with Transformers start to close, given a fixed FLOP budget. (Hyenas are crafty creatures: they don't leave perplexity points on the table!) 8/
@MichaelPoli6
Michael Poli
3 years
The deadline for our #NeurIPS2021 workshop has been moved. More time to refine your submissions at the intersection of learning and differential equations! New submission deadline: September 24th
@MichaelPoli6
Michael Poli
2 years
Join us today Wed 30 at J #431 for fractals and collage representations () and J #123 for learning in frequency domain, neural operators and long-range dependencies (). Both at 11.00am - 1.00pm, catch me bouncing between posters!
@MichaelPoli6
Michael Poli
2 months
Scaling laws on DNA pretraining? New sparsely gated layers such as Hyena experts? Check the paper for more! We open-source the MAD pipeline for anyone to test their architectures!
@MichaelPoli6
Michael Poli
3 years
Happy to share our latest research #NeurIPS2021 Multiple Shooting Layers (MSL): new parallel-in-time, implicit model that achieves speedups via parallelization and solution reuse. Neural Hybrid Automata (NHA): learning stochastic, multi-mode hybrid systems.
@Massastrello
Stefano Massaroli
3 years
Two papers accepted at #NeurIPS2021 : Differentiable Multiple Shooting Layers Neural Hybrid Automata @MichaelPoli6 @btreetaiji @animesh_garg
@MichaelPoli6
Michael Poli
2 months
As always, this was only possible because of awesome collaborators and friends, @exnx @ai_with_brains @Massastrello @smartprakas @ce_zhang @StefanoErmon @HazyResearch @BrianHie ... and many more Some folks that might find this interesting: @BlinkDL_AI use this for RWKV-Next!
@MichaelPoli6
Michael Poli
4 years
@Massastrello @Diffeq_ml @samgreydanus [3/4] While significant progress is being made by Julia devs and SciML @ChrisRackauckas , we believe a continuous NN library for PyTorch to be of value to our research ecosystem. We leverage PyTorch Lightning's @_willfalcon sweet API to handle training loops.
@MichaelPoli6
Michael Poli
2 years
❗New deadline for the #NeurIPS2022 workshop❗ "Symbiosis of Deep Learning and Differential Equations": October 1st. Website: . Send us your work on neural differential equations, learnable numerical methods, continuous-time diffusion and more!
@MichaelPoli6
Michael Poli
10 months
At ICML! Convention center is a great place for walks & chats, DMs open :)
@exnx
Eric Nguyen
11 months
Heading to Honolulu for ICML now! Come talk to us about Hyena (or HyenaDNA) at our poster session :) Poster: Wed 26th at 2pm HST. Oral talk: Thurs 3:48pm HST. I’m here all week, feel free to reach out. Looking forward to all the great research chatter! @MichaelPoli6
@MichaelPoli6
Michael Poli
11 months
At ICML soon! Happy to chat about all things LLM training, efficient (alternative) architectures, long context, signal processing and dynamical systems!
@MichaelPoli6
Michael Poli
2 years
[1/2] Robustness work often starts with motivations "is the duck classified as a duck because of duck cues or because it is often paired with water backgrounds, and the model picks up on that instead?"
@SanghyukChun
SanghyukChun
2 years
More great news: our recent paper on analysis of the shortcut learning problem is accepted at ICLR 2022! We answer "why is color always preferred by DNNs?" It is a very interesting paper, and worth reading :^) Congrats to all authors @ScimecaLuca @coallaoh @MichaelPoli6
@MichaelPoli6
Michael Poli
2 months
A key feature of emerging architectures is their fixed state size for autoregressive inference. We study the total state size of recurrent and striped models in a compute-optimal regime, finding interesting trade-offs between homogeneous and striped models.
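Back-of-the-envelope accounting makes the trade-off concrete: a Transformer KV cache grows linearly with sequence length, while a fixed-state (recurrent or convolutional) layer does not. Illustrative only; exact sizes depend on the layer and head layout:

```python
def inference_state_bytes(n_layers: int, d_model: int, seq_len: int,
                          d_state: int = 16, bytes_per: int = 2):
    """Illustrative state accounting: the KV cache grows with seq_len,
    the fixed recurrent state does not."""
    kv_cache = 2 * n_layers * seq_len * d_model * bytes_per   # keys + values
    fixed = n_layers * d_model * d_state * bytes_per          # constant in seq_len
    return kv_cache, fixed

# 32 layers, d_model 4096, 8k tokens, fp16:
# ~4.3 GB of KV cache per sequence vs ~4 MB of fixed recurrent state
```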
@MichaelPoli6
Michael Poli
2 years
This is a big deal for the NDE community. JAX, PyTorch and Julia are now all supported to various degrees; looking forward to seeing even more applications!
@PatrickKidger
Patrick Kidger
2 years
⭐️Announcing Diffrax!⭐️ Numerical differential equation solvers in #JAX . Very efficient, and with oodles of fun features! GitHub: Docs: Install: `pip install diffrax` 🧵 1/n
@MichaelPoli6
Michael Poli
1 year
@BlancheMinerva Thanks @BlancheMinerva ! Compute is the main bottleneck right now. We are working on some lower-level optimizations that will hopefully make scaling easier for everyone.
@MichaelPoli6
Michael Poli
4 years
@Massastrello @Diffeq_ml @RickyTQChen @wgrathwohl @DavidDuvenaud @MilesCranmer @samgreydanus @_willfalcon @chuckberryfinn [3/5] Higher-order models @alexnorcliffe98 @CristianBodnar have also been further developed with the goal of preserving compatibility with all other variants available. Here is an example of a tenth-order Neural ODE trained on a classification task!
@MichaelPoli6
Michael Poli
2 years
[2/2] "Differentiable Multiple Shooting Layers" Implicit, parallel-in-time models. We investigate how to perform fast inference via tracking of solutions across training iterations! Poster: Thu Dec 09 08:30 AM -- 10:00 AM
@MichaelPoli6
Michael Poli
3 years
@Diffeq_ml @Massastrello [2/6] Our vision for torchdyn is to become the torchvision/audio for diff eqs and implicit models. There is no better time to get involved! Below: Multiple Shooting Layers as implicit, parallel-in-time Neural ODEs. Speed ups via solution reuse across iterations!
@MichaelPoli6
Michael Poli
2 years
@winniethexu @Massastrello Far more information can be found on our website: Thank you to all our amazing coauthors @winniethexu @Massastrello @chenlin_meng @_kunoai @StefanoErmon 6/
@MichaelPoli6
Michael Poli
2 months
@BlancheMinerva The repo is for the MAD pipeline, not scaling laws. We have a bunch of checkpoints around, will release if there's enough interest.
@MichaelPoli6
Michael Poli
6 months
StripedHyena 7B is a hybrid architecture based on our latest research on scaling laws for alternative architectures and fast inference with gated convolutions. It's a longer-context model, with strong performance across a variety of standard language benchmarks and 50% smaller caches …
@MichaelPoli6
Michael Poli
2 months
How is MAD predictive of scaling laws performance? We study the correlation between aggregate task performance and compute-optimal perplexity, and find strong correlation at all scales, particularly in models of a similar base class. We use this fact to iteratively improve the …
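The correlation check itself is simple to reproduce once you have one aggregate MAD score and one compute-optimal perplexity per candidate architecture. The values below are placeholders, not the paper's data:

```python
import numpy as np
from scipy.stats import spearmanr

mad_acc = np.array([0.91, 0.84, 0.78, 0.72, 0.60])  # aggregate MAD accuracy (placeholder)
ppl = np.array([10.2, 10.9, 11.5, 12.3, 13.8])      # compute-optimal perplexity (placeholder)

rho, p = spearmanr(mad_acc, ppl)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")    # strongly negative if MAD is predictive
```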
@MichaelPoli6
Michael Poli
1 year
Great to see! A win for efficient alternatives to Transformers for long sequences
@magicailabs
Magic.dev
1 year
Meet LTM-1: LLM with *5,000,000 prompt tokens* That's ~500k lines of code or ~5k files, enough to fully cover most repositories. LTM-1 is a prototype of a neural network architecture we designed for giant context windows.
@MichaelPoli6
Michael Poli
7 months
Stoked about this work! >60% utilization for gated convolutions makes new architectures even more compelling as Transformer replacements, with faster e2e training at shorter AND longer sequences.
@realDanFu
Dan Fu
7 months
Announcing FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores! We speed up exact FFT convolutions by up to 7.93x over PyTorch, reduce memory footprint, and get 4.4x speedup end-to-end. Read on for more details: Thanks @arankomatsuzaki and @_akhaliq for …
@MichaelPoli6
Michael Poli
2 years
@alfcnz We're honored! To be fair, it looked better in person🤷
@MichaelPoli6
Michael Poli
4 years
We're particularly excited about the opportunity to share some of our research as an oral presentation! @Massastrello will share more about "Dissecting Neural ODEs" the coming days. I suspect we'll never get another paper at NeurIPS after exhausting our pool of luck this year :D
@MichaelPoli6
Michael Poli
3 months
@togethercompute Try it on the Together API: Here is an example notebook: Prompting and sampling are different from chat models. To start generating, we recommend turning off temperature and top-p (temperature 1, top p 1, top k 4).
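A hedged sketch of what a call with those sampling settings could look like. The endpoint shape follows Together's OpenAI-style completions API, and the model id is a placeholder; defer to the linked notebook for the supported flow:

```python
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "togethercomputer/evo-1-base",  # placeholder model id
        "prompt": "ATGGCGGCTA",                  # raw DNA, not a chat prompt
        "max_tokens": 512,
        "temperature": 1.0,                      # sampling settings recommended above
        "top_p": 1.0,
        "top_k": 4,
    },
)
print(resp.json())
```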
@MichaelPoli6
Michael Poli
3 years
@Massastrello @Diffeq_ml [4/6] Although we are planning to remain in PyTorch for torchdyn, with further integration with torchcde @PatrickKidger and torchsde @lxuechen , we have long-term plans to extend the DiffEqML ecosystem to JAX and Julia.
@MichaelPoli6
Michael Poli
4 years
This also means we'll have more bandwidth for development (including some of the methods in the papers above). We're still committed to providing a complete neural diff. equations API for PyTorch / @PyTorchLightnin , and we have some new additions coming!
@MichaelPoli6
Michael Poli
4 years
@Massastrello @Diffeq_ml @RickyTQChen @wgrathwohl @DavidDuvenaud @MilesCranmer @samgreydanus @_willfalcon @chuckberryfinn @alexnorcliffe98 @CristianBodnar [4/5] Here is our feature roadmap. We welcome any support from the deep learning community: code, tutorials and documentation. Next patch will include latent and hybrid NDEs, autoregressive graph DEs (GDEs) and develop the infrastructure needed to finally support Neural SDEs.
@MichaelPoli6
Michael Poli
6 months
@NousResearch @togethercompute You guys are awesome, looking forward to shipping more open models together
@MichaelPoli6
Michael Poli
7 months
Enjoy! Excited to see what the open-source community builds on top of this.
@togethercompute
Together AI
7 months
We are excited to release RedPajama-Data-v2: 30 trillion filtered & de-duplicated tokens from 84 CommonCrawl dumps, 25x larger than our first dataset. It exposes a diverse range of quality annotations so you can slice & weight the data for LLM training.
@MichaelPoli6
Michael Poli
3 years
Great job everyone! Looking forward to many great contributions and to fun chats about the intersection of differential equations and deep learning.
@PatrickKidger
Patrick Kidger
3 years
Our 'The Symbiosis of Deep Learning and Differential Equations' workshop has been accepted for #NeurIPS2021 ! Send us your work on data-driven dynamical systems, neural differential equations, solving PDEs with deep learning etc. Tentative submission deadline Sept. 17.
@MichaelPoli6
Michael Poli
3 years
@shoyer @GoogleAI Cool work! Working on residuals seems to be an effective way to go for these solver-neural net hybrids -- we found similar gains on a different set of tasks with hypersolvers () @Massastrello
@MichaelPoli6
Michael Poli
7 months
The march toward more efficient architectures can't be stopped! A neat parametrization based on Monarch matrices:
@realDanFu
Dan Fu
7 months
Excited about models that are sub-quadratic in sequence length and model dimension? Our Monarch Mixer paper is now on arXiv -- and super excited to present it as an oral at #NeurIPS2023 ! Let's dive into what's new with the paper and the new goodies from this release: Monarch …
@MichaelPoli6
Michael Poli
3 years
@DanielePanozzo @NeurIPSConf Yes! That's also why hybrid is a promising way forward. Nice work. Any reason for not including a benchmark dedicated to high stiffness (e.g. Robertson)? Or alternatively tuning parameters of NS and other systems to gradually increase stiffness.
@MichaelPoli6
Michael Poli
3 years
@unsorsodicorda We've been having fun with the experimental version and it has been very useful for some of our new stuff. Definitely brings PyTorch one step closer to JAX on that front (though there are still limitations of course)...
@MichaelPoli6
Michael Poli
3 years
@Massastrello @Diffeq_ml @PatrickKidger @lxuechen @ChrisRackauckas [6/6] For those interested in contributing, here's a list: . In the short term, we should have more for Deep Equilibrium Models and variants @shaojieb , Score-Matching SDE @YSongStanford and reg terms @jacobjinkelly . We have prototypes for all of the above
@MichaelPoli6
Michael Poli
4 years
@Massastrello @DavidDuvenaud @chuckberryfinn [7/n] This work is part of a *vast* literature on neural network differential equation solvers, though our focus is on Neural ODEs and their interplay with the solver. The code will be released soon as part of a research section of the *torchdyn* library:
@MichaelPoli6
Michael Poli
4 years
And some cool vector field symmetries organically emerge!
@Massastrello
Stefano Massaroli
4 years
We finally got around to open-sourcing more Neural ODE variants in the "torchdyn" library , including our latest "stacked neural ODEs" aka continuous-depth models with piece-wise constant parameters. @MichaelPoli6
@MichaelPoli6
Michael Poli
3 months
Scaling laws are different on DNA data at nucleotide resolution (and more broadly, on sequences at byte resolution). They seem to hold strong (both on and off the compute-optimal frontier), so I am excited to see what larger models could do.
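Checking that a scaling law "holds" usually reduces to a linear fit in log-log space. A minimal sketch (the paper's compute-optimal analysis is more involved):

```python
import numpy as np

def fit_power_law(compute: np.ndarray, loss: np.ndarray):
    """Fit loss ~ a * compute**(-b) as a line in log-log space."""
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return np.exp(intercept), -slope   # (prefactor a, exponent b)

# a, b = fit_power_law(flops, eval_loss)
# under the fit, halving the loss takes a 2**(1/b) factor more compute
```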
@MichaelPoli6
Michael Poli
5 years
@jeremyphoward Bagnets, how do they work?
@MichaelPoli6
Michael Poli
6 months
@wait_sasha @togethercompute All released under Apache 2.0, have fun!
@MichaelPoli6
Michael Poli
7 months
Thank you @carrigmat for the awesome work! Try HyenaDNA out on the Hub!
@carrigmat
Matthew Carrigan
7 months
Big genomics news today at @huggingface : We're delighted to welcome HyenaDNA to the Hub! Models: Paper: Thanks to @HazyResearch @exnx @MichaelPoli6 @marjanfaizi for the model, and for your work on the port! More info in 🧵
@MichaelPoli6
Michael Poli
6 months
@citre_piotto @_albertgu @tri_dao @tri_dao is at @togethercompute , working on exciting research such as this :)
@MichaelPoli6
Michael Poli
2 years
@danrothenberg Hopefully things will change rapidly now that DeepMind (climate team), Microsoft (AI4Science), NVIDIA & more are shifting some of their focus towards deep learning climate and weather.
@MichaelPoli6
Michael Poli
2 months
@BlancheMinerva I understand what you're trying to say, but it does matter: these are architectures that are not fully supported on HF, so a lot of additional work needs to be done to ensure people can run inference easily enough to work on them. Hence, interest (we have released checkpoints …)
@MichaelPoli6
Michael Poli
3 years
@mmbronstein @b_p_chamberlain @migorinova @stefan_webb @emaros96 Cool to see you also got pretty good results on Cora - Citeseer - Pubmed with GDEs! FYI, we're releasing an extended version of the original with SDE + GNN for dynamic graphs and a latent variable model, which might be of interest to some of you.
@MichaelPoli6
Michael Poli
3 years
@mmbronstein @b_p_chamberlain @migorinova @stefan_webb @emaros96 I would personally be interested in seeing this line of work develop further in dynamic graphs, as it is a challenging problem and not many are working on it atm.
@MichaelPoli6
Michael Poli
4 years
@PatrickKidger Well deserved. I bet we'll see a lot of Neural CDE / RDE papers going forward
@MichaelPoli6
Michael Poli
3 months
Evo represents the culmination of a long line of research on deep signal processing: new layer primitives, architecture topologies, scaling law analysis, initialization schemes, custom inference algorithms.
@MichaelPoli6
Michael Poli
5 years
Excited to finally share our work on Graph Neural Ordinary Differential Equations (GDEs)! blog post:
@MichaelPoli6
Michael Poli
2 years
Here we study why certain cues are intrinsically preferred by ERM, irrespective of their dataset frequencies (we normalize their predictive power by ensuring the task can be solved with any single cue alone). Interesting takeaways, e.g. color cues always dominate in visual tasks.