
Sean Zhang
@seeeeaaaannnnnn
Followers
531
Following
134
Media
4
Statuses
34
Training Neural Networks @Manifest__AI, Ex-Meta & Ex-Voleon, Longtermism
Canada
Joined January 2016
The massive performance upgrades of power attention are clearly visible at the 1.6B parameter scale on 32k-length documents. The improvement is due to better in-context learning.
2
8
131
Balance is the key. We show, in our latest work, that both quadratic attention and linear-attention-based architectures are not fit for long-context jobs because they spend way too much of their FLOPs budget on either state or weight.
The age of transformers is ending...the dawn of linear-cost architectures is upon us. Power Attention replaces Flash Attention in any transformer, and removes the quadratic penalty of context scaling while achieving strong performance. The result: domination of both transformers
0
2
6
🥳
Had a great time talking Power Attention with the amazing folks at @GoogleDeepMind Montreal. Thanks @pcastr, Adrien, and Zhitao for hosting us!
0
0
0
Towards understanding DNN from first principles
The day I learned about gradient descent I thought "this cannot possibly work, it will get stuck in local minima." Only now, a decade later, do I feel like I understand why GD works so well with NNs. Our new article explains how to prove convergence to low loss without convexity. 1/n
0
1
5
Excited to share our latest work @manifest__ai on a new linear transformer architecture that has the potential to outperform standard softmax transformers! Check it out here
manifestai.com
A linear transformer that learns like a regular transformer with a state that fits on a GPU.
Excited to share some of our latest work @manifest__ai on symmetric power transformers, which are a linear transformer variant that seems hugely promising. It gets performance on par with (or better than!) a softmax transformer, but can be trained O(t) instead of O(t^2). 1/8
0
0
1
Thank you for the tutorial!
📢 Slides for the #AAAI2024 tutorial on the role of LLMs in Planning are now available. 👉 https://t.co/md6YmFNaTJ See y'all in the afternoon in Room 114.
0
0
1
Great to see that AI technological advances are providing value to foundational research as well. Terence Tao on "Machine Assistant Proof": https://t.co/1ugZgbuXTf
0
0
0
Convinced we are witnessing the birth of a new kind of computer. From: Memorizing Transformers https://t.co/ou5oXr9lp0
23
144
2K
In 2008 the banks got rich, went bust, and got bailed out. It was unfair, so we created a regulatory system to prevent it from happening. SVB is a minor bank. We could have let the process play out and show how the system has been improved. But it seems nothing has changed
289
1K
6K
These women handing out water and food to migrants traveling through Mexico on their way to the United States Imagine living out the life experience of basic survival and one day deciding to set off for the potential of something better, anything better, aboard La Bestia, also
193
439
3K
This is amazing. Due to the backlash from Chinese fans seeing unmasked crowds in Qatar, Chinese TV is now replacing live crowd shots during games and instead cutting to close-ups of players and coaches.
186
3K
10K
What happens when you ask an AI to generate "Human Evolution." This is terrifying.
4K
37K
224K
Rental Market Tracker: Rents Are Growing Half as Fast as They Were Six Months Ago
redfin.com
Rents rose 9% year over year in September—the first single-digit increase in a year and a marked slowdown from 18% growth in March.
1
1
13
GenZ job expectations 2020: day trader 2021: flippin' burgers at Wendy's
2
4
26
Around 364 million online profiles and their chats & file transfers get processed daily. Then these accounts get linked to a real ID/person. The data is then distributed over police stations per city/province to separate operators databases with the same surveillance network name
15
258
416