
Davis Yoshida (@davis_yoshida) · 391 followers · 3K following · 93 media · 1K statuses
The problem is we need some sort of Pareto surface where the axes of interest are:
- Hours needed to learn the tool
- Hours spent on the specific implementation
- Latency (or whatever the metric of interest is)

I'm pretty sure the relevant datapoints are impossible to collect.
@gallabytes I think there are 2 separate questions:
1. Is it easier to write inefficient PyTorch code than inefficient JAX code?
2. Is it easier to write efficient JAX code than it is to write efficient PyTorch code?
Imo, the answer to the first one is yes. For example, the huggingface…
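The inefficient-vs-efficient distinction above can be illustrated with a small JAX sketch (a generic toy, not the huggingface example the cut-off tweet refers to): under `jit`, a Python loop over the batch is unrolled at trace time, so compile time grows with batch size, while the vectorized version is a single fused op.

```python
import jax
import jax.numpy as jnp

xs = jnp.arange(12.0).reshape(4, 3)

# "Inefficient" style: a Python loop over rows. Under jit this loop is
# unrolled during tracing, so the program (and compile time) scales with
# the number of rows.
@jax.jit
def row_norms_loop(xs):
    return jnp.stack([jnp.sqrt(jnp.sum(x * x)) for x in xs])

# Idiomatic style: one vectorized reduction over the whole batch.
@jax.jit
def row_norms_vec(xs):
    return jnp.sqrt(jnp.sum(xs * xs, axis=1))

print(jnp.allclose(row_norms_loop(xs), row_norms_vec(xs)))
```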
I was going to say it doesn't support vmap yet, but I just tested and it (accidentally) does. Ty for the magic @sharadvikram.
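For reference, a minimal sketch of what `jax.vmap` does (a generic toy, not the library being tested in the tweet): it maps a per-example function over a batch axis without a Python loop.

```python
import jax
import jax.numpy as jnp

# A per-example function: w is (4, 3), x is a single (3,) input.
def predict(w, x):
    return jnp.tanh(w @ x)

w = jnp.ones((4, 3))
xs = jnp.ones((8, 3))  # batch of 8 inputs

# Map over the leading axis of xs only; w is shared across examples.
batched = jax.vmap(predict, in_axes=(None, 0))(w, xs)
print(batched.shape)  # (8, 4)
```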
@hr0nix @epiqueras1 This got me to finally upload this: It's super WIP, but it does a block-sparse forward using Pallas. I still need to add the backwards kernels though.
It's a good thing JAX wasn't around in the 90s; Schmidhuber would have gotten out of control.
@0xmaddie_ Here's MLPs in MLPs in MLPs in about 40 LoC of JAX + qax. You could swap the MLP for any other Flax module though.
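A qax-free toy of the "MLPs in MLPs" idea, where an inner MLP generates the outer MLP's first weight matrix from a code vector (all sizes and names here are illustrative assumptions; the tweet's actual version uses Flax modules plus qax):

```python
import jax
import jax.numpy as jnp

# A tiny MLP: tanh hidden layers, linear output. params is a list of
# weight matrices (biases omitted for brevity).
def mlp(params, x):
    for w in params[:-1]:
        x = jnp.tanh(w @ x)
    return params[-1] @ x

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)

in_dim, hid, out_dim = 3, 4, 2
code = jax.random.normal(k1, (5,))

# Inner MLP: maps the code vector to a flattened (hid, in_dim) matrix.
inner_params = [jax.random.normal(k2, (8, 5)),
                jax.random.normal(k3, (hid * in_dim, 8))]

w1 = mlp(inner_params, code).reshape(hid, in_dim)  # generated weights
w2 = jnp.ones((out_dim, hid))                      # ordinary outer weights

x = jnp.ones((in_dim,))
y = mlp([w1, w2], x)
print(y.shape)  # (2,)
```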
RT @alexfmckinney: I am looking / on the market for a new ML Engineer / Researcher remote (EU times) role! 👨‍🔬 You can find my resume here…