fish buybuy @b2_4814d920 X Profile

fish buybuy

@b2_4814d920

Followers

66

Following

17K

Media

103

Statuses

1K

HPC, RE, slow and meandering exploration 🚨 multi-lang account @METU_ODTU @UofT

Toronto

Joined April 2022

Don't wanna be here? Send us removal request.

fish buybuy

@b2_4814d920

2 days

i can eat precisely 1 hunan dish. getting it for the 20th time this year

0

2

fish buybuy

@b2_4814d920

6 days

so i did get an interview after this and somehow made it to the final round before the thank-you email. ig too much performance anxiety? pun intended. hehe it was a good ride but still. could have aced the coding parts if i didn't reliably freeze up on camera. open to tips...

fish buybuy

@b2_4814d920

2 months

let go of the anthropic challenge at 1383. tbh 1200-something is humanly possible. lower would require demoscene style hacks

1

0

5

fish buybuy

@b2_4814d920

16 days

it gets a bit interesting with multi-node but then stops being interesting again when nvl72 exists

0

1

fish buybuy

@b2_4814d920

16 days

i can't shake off the feeling that writing gpu kernels for transformers is ultimately just busywork. someone designed the gpu for this exact use case. feels weird to write code when there's one vendor-dictated correct answer

1

0

3

Matej Sirovatka

@m_sirovatka

20 days

at this point pretraining researchers can start shooting perf engineers in the head

Gökdeniz Gülmez

@ActuallyIsaak

21 days

Today I’m sharing a new research paper that explores a new idea in mixture of experts architecture called “DynaMoE”. DynaMoE is a Mixture-of-Experts framework where: - the number of active experts per token is dynamic. - the number of all experts can be scheduled differently

12

35

625

fish buybuy

@b2_4814d920

29 days

i think the taalas device can scale to groq or cerebras level. it may lose the speed when the model gets sharded to multiple chips though. maybe a groq with less TDP

0

fish buybuy

@b2_4814d920

1 month

4.6 seems to get this. so it is actually smarter? you can try by asking "what does ssy/bra/sync push or pop in pre-volta". 4.5 says bra "changes PC for some threads" (??)

fish buybuy

@b2_4814d920

3 months

opus 4.5 genuinely cannot trace execution with a reconvergence stack if you omit some details of how it works. new personal benchmark task??

0

1

fish buybuy

@b2_4814d920

1 month

linux would be called "OpenSystemV" if created today

0

1

fish buybuy

@b2_4814d920

1 month

people don't give whimsical names to open source projects anymore could be bc the projects are now less special? most names are taken at this point? or bc grind ethics consumed everything? no idea

1

0

2

fish buybuy

@b2_4814d920

1 month

almost done turning my tensor core notes into a blog post. it's about tensor cores from a designer's POV and why the programming model is so cursed. soon...

0

2

fish buybuy

@b2_4814d920

1 month

my pet theory is that opus actually models the user as some kind of weird tool call / sub agent and that's why it's so naggy about "go to sleep" and "go work" etc.

0

fish buybuy

@b2_4814d920

1 month

ok so it makes sense to have lpddr-based prefill/batch inference machines. or am i dumb. or is it already done

0

2

fish buybuy

@b2_4814d920

2 months

this is true for other generations as well. they just want to make sure you aren't multiplying too many matrices

Cheng

@zcbenz

2 months

Consumer Blackwell hardwares (RTX5090, DGX, etc.) do not support all instructions of Blackwell/Hopper so inference engines would usually select Ampere optimized kernels for them.

0

2

fish buybuy

@b2_4814d920

2 months

@corsix it turns out 11xx -> 10xx is a matter of moving flow ops to alu and loads to flow. it's no special sauce but takes a while to figure out I can totally see how it can go down to 100x if carefully tuned and packed... so the leaderboard needs fixing before everyone ties with 1001

1

0

3

fish buybuy

@b2_4814d920

2 months

here we are after scouring @corsix's posts for hints😅

1

0

1

fish buybuy

@b2_4814d920

2 months

about the "white palace" translation. i'm pleasantly surprised to see that we aren't the only people calling it that way so turkish joins chinese and persian in not accepting "house" as a head of the state's residence... it has to be a palace

0

1