
tensorqt
@tensorqt
Followers
3K
Following
26K
Media
576
Statuses
6K
chaos dancing star
Icecrown Citadel
Joined February 2022
attention sinks may be a bias in causal transformers. as some of you know, i've been writing a long blogpost on attention and its properties as a message-passing operation on graphs. while doing so, i figured i might have found an explanation for which attention sinks may be an
35
98
944
"we may never know, Uther, I intend to live forever.".
Hot-mic moment at the Beijing parade. Xi: “People rarely lived past 70 before. Now at 70 you’re still a child.” Putin: “With biotech, organs can be replaced endlessly… people could even reach immortality.” Xi: “Some predict people might live to 150 this century.”
1
0
8
very proud to see one of my earliest moots now be one of the leading faces of the ML scene.
The Lore of Kalomaze! ⚡️ bringing a great pod with @kalomaze (20yo ml researcher, prime intellect) - we'd talked about training, finetuning, RL (environments and recipes), scaling, working at PI and a Lot of Lores! (link in replies)
2
1
62
i am growing somewhat skeptical of looping layers as an architectural strategy (as some of you may know, i've been quite a fan). something is still missing imo, probably a combination of: - a proper experimental demonstration of looping being worth the FLOPs and - a hierarchical.
Another active stream of Language Modeling literature investigates whether, and how, one can adapt a pretrained model to perform better on a given task, without any additional continued-pretraining, nor fine-tuning. At the current stage, two ideas have emerged: layer-pruning to
6
0
30
not sure everyone here had caught on to my bio.
you must have chaos. you must have chaos within you. you chaos within you. chaos give birth to a dancing star. must birth a dancing star. you to give birth to a star. one beautiful saying, many embedded messages. 'you must have chaos within you to give birth to a dancing star' - ntzch.
1
0
7
RT @Dorialexander: While we’re still wondering if there could be more than a handful of labs in Europe, even the local Chinese DoorDash is….
0
40
0
RT @torchcompiled: New post! The fact that we experience life through what feels like a singular entity, I believe, is chance adaptation ra….
0
5
0
some really interesting work done in the last few years by @DonatoCrisosto1. also containing some concepts linked to some really interesting directions we're cooking rn.
Starting to think about my thesis on model merging and worried no one will read it… so I wrote a blog post instead💡. Now you can skip both 🚀
2
0
24
RT @leothecurious: > self dox > check my real talk on youtube > clicks link > speaker is actually the lich king > knew_it.png https://t.co/….
0
4
0
given that i've basically dropped my anonymity (which wasn't really meant to be strong in the first place) with the latest blogpost, i feel more comfortable sharing this short talk of mine at AI Tinkerers Milan, hosted by @alxfazio and @AdeccoGroupITA, discussing some early.
11
3
123
RT @dejavucoder: my blogpost "can LLMs dream of electric sheep" is up now! it's a fun experiment where i ask LLMs for a creative visual pro….
0
11
0