fish buybuy

@b2_4814d920

Followers
66
Following
17K
Media
103
Statuses
1K

HPC, RE, slow and meandering exploration 🚨 multi-lang account @METU_ODTU @UofT

Toronto
Joined April 2022
@b2_4814d920
fish buybuy
2 days
i can eat precisely 1 hunan dish. getting it for the 20th time this year
0
0
2
@b2_4814d920
fish buybuy
6 days
so i did get an interview after this and somehow made it to the final round before the thank-you email. ig too much performance anxiety? pun intended. hehe it was a good ride but still. could have aced the coding parts if i didn't reliably freeze up on camera. open to tips...
@b2_4814d920
fish buybuy
2 months
let go of the anthropic challenge at 1383. tbh 1200-something is humanly possible. lower would require demoscene style hacks
1
0
5
@b2_4814d920
fish buybuy
16 days
it gets a bit interesting with multi-node but then stops being interesting again when nvl72 exists
0
0
1
@b2_4814d920
fish buybuy
16 days
i can't shake off the feeling that writing gpu kernels for transformers is ultimately just busywork. someone designed the gpu for this exact use case. feels weird to write code when there's one vendor-dictated correct answer
1
0
3
@m_sirovatka
Matej Sirovatka
20 days
at this point pretraining researchers can start shooting perf engineers in the head
@ActuallyIsaak
Gökdeniz Gülmez
21 days
Today I’m sharing a new research paper that explores a new idea in mixture of experts architecture called “DynaMoE”. DynaMoE is a Mixture-of-Experts framework where: - the number of active experts per token is dynamic. - the number of all experts can be scheduled differently
12
35
625
@b2_4814d920
fish buybuy
29 days
i think the taalas device can scale to groq or cerebras level. it may lose the speed when the model gets sharded to multiple chips though. maybe a groq with less TDP
0
0
0
@b2_4814d920
fish buybuy
1 month
4.6 seems to get this. so it is actually smarter? you can try by asking "what does ssy/bra/sync push or pop in pre-volta". 4.5 says bra "changes PC for some threads" (??)
@b2_4814d920
fish buybuy
3 months
opus 4.5 genuinely cannot trace execution with a reconvergence stack if you omit some details of how it works. new personal benchmark task??
0
0
1
@b2_4814d920
fish buybuy
1 month
linux would be called "OpenSystemV" if created today
0
0
1
@b2_4814d920
fish buybuy
1 month
people don't give whimsical names to open source projects anymore. could be bc the projects are now less special? most names are taken at this point? or bc grind ethics consumed everything? no idea
1
0
2
@b2_4814d920
fish buybuy
1 month
almost done turning my tensor core notes into a blog post. it's about tensor cores from a designer's POV and why the programming model is so cursed. soon...
0
0
2
@b2_4814d920
fish buybuy
1 month
my pet theory is that opus actually models the user as some kind of weird tool call / sub agent and that's why it's so naggy about "go to sleep" and "go work" etc.
0
0
0
@b2_4814d920
fish buybuy
1 month
ok so it makes sense to have lpddr-based prefill/batch inference machines. or am i dumb. or is it already done
0
0
2
@b2_4814d920
fish buybuy
2 months
this is true for other generations as well. they just want to make sure you aren't multiplying too many matrices
@zcbenz
Cheng
2 months
Consumer Blackwell hardwares (RTX5090, DGX, etc.) do not support all instructions of Blackwell/Hopper so inference engines would usually select Ampere optimized kernels for them.
0
0
2
@b2_4814d920
fish buybuy
2 months
@corsix it turns out 11xx -> 10xx is a matter of moving flow ops to alu and loads to flow. it's no special sauce but takes a while to figure out. I can totally see how it can go down to 100x if carefully tuned and packed... so the leaderboard needs fixing before everyone ties with 1001
1
0
3
@b2_4814d920
fish buybuy
2 months
here we are after scouring @corsix's posts for hints😅
1
0
1
@b2_4814d920
fish buybuy
2 months
about the "white palace" translation. i'm pleasantly surprised to see that we aren't the only people calling it that. so turkish joins chinese and persian in not accepting "house" for a head of state's residence... it has to be a palace
0
0
1