fish buybuy (@b2_4814d920)
66 Followers · 17K Following · 103 Media · 1K Statuses
HPC, RE, slow and meandering exploration 🚨 multi-lang account @METU_ODTU @UofT
Toronto
Joined April 2022
i can eat precisely 1 hunan dish. getting it for the 20th time this year
so i did get an interview after this and somehow made it to the final round before the thank-you email. ig too much performance anxiety? pun intended. hehe it was a good ride but still. could have aced the coding parts if i didn't reliably freeze up on camera. open to tips...
let go of the anthropic challenge at 1383. tbh 1200-something is humanly possible. lower would require demoscene-style hacks
it gets a bit interesting with multi-node but then stops being interesting again when nvl72 exists
i can't shake off the feeling that writing gpu kernels for transformers is ultimately just busywork. someone designed the gpu for this exact use case. feels weird to write code when there's one vendor-dictated correct answer
at this point pretraining researchers can start shooting perf engineers in the head
Today I’m sharing a new research paper on a Mixture-of-Experts idea called “DynaMoE”. DynaMoE is a Mixture-of-Experts framework where: - the number of active experts per token is dynamic - the total number of experts can be scheduled differently
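Not from the paper itself, just a minimal sketch of what “dynamic number of active experts per token” could mean: a router that keeps adding experts until their cumulative probability clears a threshold. The `dynamic_topk` helper and the `threshold` value are my assumptions for illustration, not DynaMoE's actual mechanism.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def dynamic_topk(router_logits, threshold=0.7):
    """Pick experts in order of router probability until the
    cumulative probability reaches `threshold`."""
    probs = softmax(router_logits)
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += probs[i]
        if cum >= threshold:
            break
    return chosen

# A confident token activates few experts; an ambiguous one activates more.
print(dynamic_topk([4.0, 0.0, 0.0, 0.0]))  # peaked router -> 1 expert
print(dynamic_topk([1.0, 1.0, 1.0, 1.0]))  # flat router -> 3 experts at 0.7
```

The point of the sketch is that k falls out of the router distribution per token, instead of being a fixed hyperparameter.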
i think the taalas device can scale to groq or cerebras level. it may lose the speed when the model gets sharded to multiple chips though. maybe a groq with less TDP
4.6 seems to get this. so is it actually smarter? you can try by asking "what does ssy/bra/sync push or pop in pre-volta". 4.5 says bra "changes PC for some threads" (??)
opus 4.5 genuinely cannot trace execution with a reconvergence stack if you omit some details of how it works. new personal benchmark task??
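for anyone who wants to try the benchmark task: a toy model of the pre-Volta reconvergence stack being asked about, where SSY pushes the reconvergence point with the current active mask, a divergent BRA runs one side while stashing the other, and SYNC pops. This is a simplified sketch of the publicly known mechanism, not a hardware-accurate simulator; the `simt_if_else` helper and its step format are my own invention.

```python
def simt_if_else(active_mask, taken_mask, reconv_pc):
    """Return the execution order of an if/else as (pc_label, mask) steps."""
    stack = []
    # SSY: push where the full mask reconverges.
    stack.append((reconv_pc, active_mask))
    steps = []
    not_taken = active_mask & ~taken_mask
    if not_taken and taken_mask:              # divergent BRA: mask splits
        stack.append(("taken", taken_mask))   # stash the taken side,
        steps.append(("fallthrough", not_taken))  # run fall-through first
        pc, mask = stack.pop()                # SYNC: switch to stashed side
        steps.append((pc, mask))
    else:                                     # uniform branch, no divergence
        steps.append(("taken" if taken_mask else "fallthrough", active_mask))
    pc, mask = stack.pop()                    # SYNC: pop SSY entry, reconverge
    steps.append((pc, mask))
    return steps

# 4-thread warp: threads 0-1 take the branch, threads 2-3 fall through.
for pc, mask in simt_if_else(0b1111, 0b0011, "after"):
    print(pc, bin(mask))
```

notice bra itself pushes nothing; the stack traffic is ssy (push) and sync (pop), which is exactly the detail the models trip over.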
people don't give whimsical names to open source projects anymore. could be bc the projects are now less special? most names are taken at this point? or bc the grind ethic consumed everything? no idea
almost done turning my tensor core notes into a blog post. it's about tensor cores from a designer's POV and why the programming model is so cursed. soon...
my pet theory is that opus actually models the user as some kind of weird tool call / sub agent and that's why it's so naggy about "go to sleep" and "go work" etc.
ok so it makes sense to have lpddr-based prefill/batch inference machines. or am i dumb. or is it already done
@corsix it turns out 11xx -> 10xx is a matter of moving flow ops to alu and loads to flow. it's no special sauce but takes a while to figure out. I can totally see how it can go down to 100x if carefully tuned and packed... so the leaderboard needs fixing before everyone ties with 1001
about the "white palace" translation. i'm pleasantly surprised we aren't the only people calling it that. so turkish joins chinese and persian in not accepting "house" as the name for a head of state's residence... it has to be a palace