Sean Zhang Profile
Sean Zhang

@seeeeaaaannnnnn

Followers: 531
Following: 134
Media: 4
Statuses: 34

Training Neural Network @Manifest__AI, Ex-Meta & Ex-Voleon, Longtermism

Canada
Joined January 2016
@jacobmbuckman
Jacob Buckman
2 months
The massive performance upgrades of power attention are clearly visible at the 1.6B parameter scale on 32k-length documents. The improvement is due to better in-context learning.
2
8
131
@seeeeaaaannnnnn
Sean Zhang
2 months
Balance is the key. We show, in our latest work, that both quadratic attention and linear-attention-based architectures are not fit for long context jobs because they spend way too much of their flops budget on either state or weight.
@jacobmbuckman
Jacob Buckman
2 months
The age of transformers is ending...the dawn of linear-cost architectures is upon us. Power Attention replaces Flash Attention in any transformer, and removes the quadratic penalty of context scaling while achieving strong performance. The result: domination of both transformers
0
2
6
@seeeeaaaannnnnn
Sean Zhang
10 months
🥳
@manifest__ai
Manifest AI
10 months
Had a great time talking Power Attention with the amazing folks at @GoogleDeepMind Montreal. Thanks @pcastr, Adrien, and Zhitao for hosting us!
0
0
0
@seeeeaaaannnnnn
Sean Zhang
1 year
Towards understanding DNN from first principles
@carlesgelada
Carles Gelada
1 year
The day I learned about gradient descent I thought "this cannot possibly work, it will get stuck in local minima." Only now, a decade later, do I feel like I understand why GD works so well with NNs. Our new article explains how to prove convergence to low loss without convexity. 1/n
0
1
5
@seeeeaaaannnnnn
Sean Zhang
1 year
Excited to share our latest work @manifest__ai on a new linear transformer architecture that has the potential to outperform standard softmax transformers! Check it out here
manifestai.com
A linear transformer that learns like a regular transformer with a state that fits on a GPU.
@jacobmbuckman
Jacob Buckman
1 year
Excited to share some of our latest work @manifest__ai on symmetric power transformers, which are a linear transformer variant that seems hugely promising. It gets performance on par with (or better than!) a softmax transformer, but can be trained O(t) instead of O(t^2). 1/8
0
0
1
@seeeeaaaannnnnn
Sean Zhang
2 years
Thank you for the tutorial!
@rao2z
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
2 years
📢 Slides for the #AAAI2024 tutorial on the role of LLMs in Planning are now available. 👉 https://t.co/md6YmFNaTJ See y'all in the afternoon in Room 114.
0
0
1
@seeeeaaaannnnnn
Sean Zhang
2 years
Great to see that advances in AI technology are providing value to foundational research as well. Terence Tao on "Machine Assisted Proof": https://t.co/1ugZgbuXTf
0
0
0
@amasad
Amjad Masad
2 years
Convinced we are witnessing the birth of a new kind of computer. From: Memorizing Transformers https://t.co/ou5oXr9lp0
23
144
2K
@FedGuy12
Joseph Wang
3 years
In 2008 the banks got rich, went bust, and got bailed out. It was unfair, so we created a regulatory system to prevent it from happening. SVB is a minor bank. We could have let the process play out and show how the system has been improved. But it seems nothing has changed
289
1K
6K
@runews
Russian Market
3 years
These women are handing out water and food to migrants traveling through Mexico on their way to the United States. Imagine living out the life experience of basic survival and one day deciding to set off for the potential of something better, anything better, aboard La Bestia, also
193
439
3K
@seeeeaaaannnnnn
Sean Zhang
3 years
Interesting paper
0
0
0
@DreyerChina
Mark Dreyer
3 years
This is amazing. Due to the backlash from Chinese fans seeing unmasked crowds in Qatar, Chinese TV is now replacing live crowd shots during games and instead cutting to close-ups of players and coaches.
186
3K
10K
@CountereCulture
Countere Magazine
3 years
What happens when you ask an AI to generate "Human Evolution." This is terrifying.
4K
37K
224K
@TaylorAMarr
Taylor Marr
3 years
Rental Market Tracker: Rents Are Growing Half as Fast as They Were Six Months Ago
redfin.com
Rents rose 9% year over year in September—the first single-digit increase in a year and a marked slowdown from 18% growth in March.
1
1
13
@TheMarketDog
The Market Dog
5 years
GenZ job expectations 2020: day trader 2021: flippin' burgers at Wendy's
2
4
26
@0xDUDE
Victor Gevers
7 years
Around 364 million online profiles and their chats & file transfers get processed daily. Then these accounts get linked to a real ID/person. The data is then distributed over police stations per city/province to separate operators databases with the same surveillance network name
15
258
416