tensorqt @tensorqt X Profile

tensorqt

@tensorqt

Followers

3K

Following

26K

Media

576

Statuses

6K

chaos dancing star

Icecrown Citadel

Joined February 2022

tensorqt

@tensorqt

11 days

attention sinks may be a bias in causal transformers. as some of you know, i've been writing a long blogpost on attention and its properties as a message-passing operation on graphs. while doing so, i figured i might have found an explanation for which attention sinks may be an

35

98

944

tensorqt

@tensorqt

1 day

"we may never know, Uther, I intend to live forever."

Clash Report

@clashreport

1 day

Hot-mic moment at the Beijing parade. Xi: “People rarely lived past 70 before. Now at 70 you’re still a child.” Putin: “With biotech, organs can be replaced endlessly… people could even reach immortality.” Xi: “Some predict people might live to 150 this century.”

1

0

8

tensorqt

@tensorqt

1 day

very proud to see one of my earliest moots now be one of the leading faces of the ML scene.

himanshu

@himanshustwts

1 day

The Lore of Kalomaze! ⚡️ bringing a great pod with @kalomaze (20yo ml researcher, prime intellect) - we'd talked about training, finetuning, RL (environments and recipes), scaling, working at PI and a Lot of Lores! (link in replies)

2

1

62

tensorqt

@tensorqt

1 day

RT @himanshustwts: The Lore of Kalomaze! ⚡️ bringing a great pod with @kalomaze (20yo ml researcher, prime intellect) - we'd talked about….

0

38

0

tensorqt

@tensorqt

2 days

i am somewhat growing skeptical of looping layers as an architectural strategy (as some of you may know, i've been quite a fan). Something is still missing imo, probably a combination of:
- a proper experimental demonstration of looping being worth the FLOPs and
- a hierarchical.

Niccolo' Gentile

@Niccolg92

2 days

Another active stream of Language Modeling literature investigates whether, and how, one can adapt a pretrained model to perform better on a given task, without any additional continued-pretraining, nor fine-tuning. At the current stage, two ideas have emerged: layer-pruning to

6

0

30

tensorqt

@tensorqt

2 days

RT @Niccolg92: Another active stream of Language Modeling literature investigates whether, and how, one can adapt a pretrained model to per….

0

5

0

tensorqt

@tensorqt

3 days

RT @main_horse: μtransfer for Mamba2 & Muon

0

23

0

tensorqt

@tensorqt

3 days

not sure everyone here had caught on to my bio.

am

@attentionmech

3 days

you must have chaos.
you must have chaos within you.
you chaos within you.
chaos give birth to a dancing star.
must birth a dancing star.
you to give birth to a star.
one beautiful saying, many embedded messages. 'you must have chaos within you to give birth to a dancing star' - ntzch.

1

0

7

tensorqt

@tensorqt

4 days

RT @Dorialexander: While we’re still wondering if there could be more than a handful of labs in Europe, even the local Chinese DoorDash is….

0

40

0

tensorqt

@tensorqt

4 days

RT @torchcompiled: New post! The fact that we experience life through what feels like a singular entity, I believe, is chance adaptation ra….

0

5

0

tensorqt

@tensorqt

5 days

egirls on the tl wishing they were me rn

2

0

30

tensorqt

@tensorqt

5 days

this is very very true. i think easiest example is when you are being hosted by a friend: a male friend will throw a mattress on the ground and that's where you're gonna sleep. any woman will treat you like a PRINCE and make hotels pale in comparison.

bayes

@bayeslord

5 days

how do women have such a good ui.

3

0

16

tensorqt

@tensorqt

6 days

> fading fever
> thunderstorm outside
> mcdonalds
> psytrance
> tight deadline
locking in

0

26

tensorqt

@tensorqt

7 days

Kalo did nothing wrong.

6

0

49

tensorqt

@tensorqt

8 days

some really interesting work done in the last few years by @DonatoCrisosto1. also containing some concepts linked to some really interesting directions we're cooking rn.

Donato Crisostomi

@DonatoCrisosto1

8 days

Starting to think about my thesis on model merging and worried no one will read it… so I wrote a blog post instead💡 Now you can skip both 🚀

2

0

24

tensorqt

@tensorqt

8 days

RT @zmkzmkz: PREPRINT: Predicting the Order of Upcoming Tokens Improves Language Modeling. Instead of just predicting the next token, what….

0

87

0

tensorqt

@tensorqt

8 days

RT @leothecurious:
> self dox
> check my real talk on youtube
> clicks link
> speaker is actually the lich king
> knew_it.png https://t.co/….

0

4

0

tensorqt

@tensorqt

8 days

given that i've basically dropped my anonymity (which wasn't really meant to be strong in the first place) with the latest blogpost, i feel more comfortable sharing this short talk of mine at AI Tinkerers Milan, hosted by @alxfazio and @AdeccoGroupITA, discussing some early.

11

3

123

tensorqt

@tensorqt

8 days

if you have overthinking tendencies, try a combat sport.

1

0

9

tensorqt

@tensorqt

8 days

this is SF andrew tate, will not elaborate further.

Roy

@im_roy_lee

8 days

Course live now. $4999 for the next 24 hours. cluely . university

12

1

289

tensorqt

@tensorqt

9 days

RT @dejavucoder: my blogpost "can LLMs dream of electric sheep" is up now! it's a fun experiment where i ask LLMs for a creative visual pro….

0

11

0