tensorqt

@tensorqt

Followers
3K
Following
26K
Media
576
Statuses
6K

chaos dancing star

Icecrown Citadel
Joined February 2022
@tensorqt
tensorqt
11 days
attention sinks may be a bias in causal transformers. as some of you know, i've been writing a long blogpost on attention and its properties as a message-passing operation on graphs. while doing so, i figured i might have found an explanation for which attention sinks may be an
35
98
944
@tensorqt
tensorqt
1 day
"we may never know, Uther, I intend to live forever."
@clashreport
Clash Report
1 day
Hot-mic moment at the Beijing parade. Xi: “People rarely lived past 70 before. Now at 70 you’re still a child.” Putin: “With biotech, organs can be replaced endlessly… people could even reach immortality.” Xi: “Some predict people might live to 150 this century.”
1
0
8
@tensorqt
tensorqt
1 day
very proud to see one of my earliest moots now be one of the leading faces of the ML scene.
@himanshustwts
himanshu
1 day
The Lore of Kalomaze! ⚡️ bringing a great pod with @kalomaze (20yo ml researcher, prime intellect) - we'd talked about training, finetuning, RL (environments and recipes), scaling, working at PI and a Lot of Lores! (link in replies)
2
1
62
@tensorqt
tensorqt
1 day
RT @himanshustwts: The Lore of Kalomaze! ⚡️. bringing a great pod with @kalomaze (20yo ml researcher, prime intellect) - we'd talked about….
0
38
0
@tensorqt
tensorqt
2 days
i am growing somewhat skeptical of looping layers as an architectural strategy (as some of you may know, i've been quite a fan). Something is still missing imo, probably a combination of:
- a proper experimental demonstration of looping being worth the FLOPs and
- a hierarchical.
@Niccolg92
Niccolo' Gentile
2 days
Another active stream of Language Modeling literature investigates whether, and how, one can adapt a pretrained model to perform better on a given task, without any additional continued-pretraining, nor fine-tuning. At the current stage, two ideas have emerged: layer-pruning to
6
0
30
@tensorqt
tensorqt
2 days
RT @Niccolg92: Another active stream of Language Modeling literature investigates whether, and how, one can adapt a pretrained model to per….
0
5
0
@tensorqt
tensorqt
3 days
RT @main_horse: μtransfer for Mamba2 & Muon
0
23
0
@tensorqt
tensorqt
3 days
not sure everyone here had caught on to my bio.
@attentionmech
am
3 days
you must have chaos
you must have chaos within you
you chaos within you
chaos give birth to a dancing star
must birth a dancing star
you to give birth to a star
one beautiful saying, many embedded messages. 'you must have chaos within you to give birth to a dancing star' - ntzch.
1
0
7
@tensorqt
tensorqt
4 days
RT @Dorialexander: While we’re still wondering if there could be more than a handful of labs in Europe, even the local Chinese DoorDash is….
0
40
0
@tensorqt
tensorqt
4 days
RT @torchcompiled: New post! The fact that we experience life through what feels like a singular entity, I believe, is chance adaptation ra….
0
5
0
@tensorqt
tensorqt
5 days
egirls on the tl wishing they were me rn
2
0
30
@tensorqt
tensorqt
5 days
this is very very true. i think easiest example is when you are being hosted by a friend: a male friend will throw a mattress on the ground and that's where you're gonna sleep. any woman will treat you like a PRINCE and make hotels pale in comparison.
@bayeslord
bayes
5 days
how do women have such a good ui.
3
0
16
@tensorqt
tensorqt
6 days
> fading fever
> thunderstorm outside
> mcdonalds
> psytrance
> tight deadline
locking in
0
0
26
@tensorqt
tensorqt
7 days
Kalo did nothing wrong.
6
0
49
@tensorqt
tensorqt
8 days
some really interesting work done in the last few years by @DonatoCrisosto1. also containing some concepts linked to some really interesting directions we're cooking rn.
@DonatoCrisosto1
Donato Crisostomi
8 days
Starting to think about my thesis on model merging and worried no one will read it… so I wrote a blog post instead💡. Now you can skip both 🚀
2
0
24
@tensorqt
tensorqt
8 days
RT @zmkzmkz: PREPRINT: Predicting the Order of Upcoming Tokens Improves Language Modeling. Instead of just predicting the next token, what….
0
87
0
@tensorqt
tensorqt
8 days
RT @leothecurious:
> self dox
> check my real talk on youtube
> clicks link
> speaker is actually the lich king
> knew_it.png https://t.co/…
0
4
0
@tensorqt
tensorqt
8 days
given that i've basically dropped my anonymity (which wasn't really meant to be strong in the first place) with the latest blogpost, i feel more comfortable sharing this short talk of mine at AI Tinkerers Milan, hosted by @alxfazio and @AdeccoGroupITA, discussing some early.
11
3
123
@tensorqt
tensorqt
8 days
if you have overthinking tendencies, try a combat sport.
1
0
9
@tensorqt
tensorqt
8 days
this is SF andrew tate, will not elaborate further.
@im_roy_lee
Roy
8 days
Course live now. $4999 for the next 24 hours. cluely . university
12
1
289
@tensorqt
tensorqt
9 days
RT @dejavucoder: my blogpost "can LLMs dream of electric sheep" is up now! it's a fun experiment where i ask LLMs for a creative visual pro….
0
11
0