Davis Yoshida @davis_yoshida X Profile

Davis Yoshida

@davis_yoshida

Followers: 391
Following: 3K
Media: 93
Statuses: 1K

Seattle, WA

Joined December 2016

Davis Yoshida

@davis_yoshida

28 days

Moved to SF for work! If a mutual (you perhaps) wants to get coffee they should DM me.

0

7

Davis Yoshida

@davis_yoshida

1 year

Suffix scaling will hit a wall:

en.m.wikipedia.org

Aran Komatsuzaki

@arankomatsuzaki

1 year

GPT series still hasn't shown any signs of saturation 😲

0

1

6

Davis Yoshida

@davis_yoshida

1 year

Just writing some markdown in nvim and copilot attacks me out of nowhere

0

14

Davis Yoshida

@davis_yoshida

1 year

The problem is we need some sort of Pareto surface where the axes of interest are:
- Hours needed to learn the tool
- Hours spent on the specific implementation
- Latency, or whatever the metric of interest is

I'm pretty sure the relevant datapoints are impossible to collect.

Horace He

@cHHillee

1 year

@gallabytes I think there’s 2 separate questions.
1. Is it easier to write inefficient Pytorch code than inefficient Jax code?
2. Is it easier to write efficient Jax code than it is to write efficient Pytorch code?
Imo, the answer to the first one is yes. For example, the huggingface.

0

2
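The Pareto surface idea in the tweet above can be made concrete. This is a minimal sketch, assuming every axis (hours to learn, hours to implement, latency) is something to minimize; the `Point` type, the tool names, and all numbers are hypothetical placeholders, not collected data (which, as the tweet says, is the hard part).

```python
from typing import NamedTuple


class Point(NamedTuple):
    # Hypothetical axes from the tweet; lower is better on every axis.
    name: str
    learn_hours: float
    impl_hours: float
    latency_ms: float


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every axis and strictly better on at least one."""
    ax = (a.learn_hours, a.impl_hours, a.latency_ms)
    bx = (b.learn_hours, b.impl_hours, b.latency_ms)
    return all(x <= y for x, y in zip(ax, bx)) and any(x < y for x, y in zip(ax, bx))


def pareto_front(points: list[Point]) -> list[Point]:
    """Keep only points that no other point dominates (O(n^2), fine for small n)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


# Placeholder values for illustration only.
pts = [
    Point("tool_a", 10, 5, 3.0),
    Point("tool_b", 40, 2, 1.0),
    Point("tool_c", 50, 6, 3.5),  # worse than both tool_a and tool_b on every axis
]
front = pareto_front(pts)
```

Here `tool_a` (cheap to learn) and `tool_b` (fast to implement and low latency) both survive because neither beats the other on all three axes; `tool_c` is dropped.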

Davis Yoshida

@davis_yoshida

1 year

1

0

11

Davis Yoshida

@davis_yoshida

1 year

LeCun et al., 1990

0

8

Davis Yoshida

@davis_yoshida

1 year

It's simultaneously relieving and frustrating when the bug you spent hours thinking was yours turns out to be a library bug.

0

3

Davis Yoshida

@davis_yoshida

1 year

I'll always cherish the brief time I was ahead of Neal Wu on the advent leaderboard (I assume he was sleepy or something).

0

4

Davis Yoshida

@davis_yoshida

1 year

I was going to say it doesn't support vmap yet, but I just tested and it (accidentally) does. Ty for the magic @sharadvikram.

Davis Yoshida

@davis_yoshida

1 year

@hr0nix @epiqueras1 This got me to finally upload this: It's super WIP, but it does a block-sparse forward using Pallas. I still need to add the backwards kernels though.

1

0

6
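For context on the term, a block-sparse forward multiplies by a matrix stored as a grid of square tiles where only some tiles are nonzero, and skips the all-zero tiles entirely. The sketch below is plain Python, not the Pallas kernel from the tweet; the storage layout (a dict keyed by block coordinates) and the function name `block_sparse_matvec` are assumptions for illustration.

```python
def block_sparse_matvec(blocks, block_size, n_block_rows, x):
    """Compute y = A @ x where A is stored block-sparsely.

    blocks: dict mapping (block_row, block_col) -> dense tile, a list of lists
            of shape (block_size, block_size). Missing keys are all-zero tiles.
    x:      dense input vector of length n_block_cols * block_size.
    """
    y = [0.0] * (n_block_rows * block_size)
    # Only visit tiles that are actually stored; zero tiles cost nothing.
    for (bi, bj), tile in blocks.items():
        row0, col0 = bi * block_size, bj * block_size
        for r in range(block_size):
            acc = 0.0
            for c in range(block_size):
                acc += tile[r][c] * x[col0 + c]
            y[row0 + r] += acc
    return y


# 4x4 matrix made of 2x2 tiles; only the two diagonal tiles are nonzero.
blocks = {(0, 0): [[1, 0], [0, 1]], (1, 1): [[2, 0], [0, 2]]}
y = block_sparse_matvec(blocks, block_size=2, n_block_rows=2, x=[1, 2, 3, 4])
```

A real kernel does the same bookkeeping at tile granularity: the list of nonzero block coordinates decides which tiles get loaded and multiplied, so work scales with the number of stored tiles rather than the full matrix size.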

Davis Yoshida

@davis_yoshida

1 year

ternary parameters are 1-bit?
this 8.5B model is "7B"?
come on guys.

2

0

10
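The arithmetic behind the first complaint: a parameter that takes one of three values {-1, 0, +1} carries log2(3) ≈ 1.58 bits of information, not 1. The sketch below computes this and shows one possible packing scheme, five ternary digits per byte (since 3^5 = 243 ≤ 256, that is 1.6 bits per parameter). The `pack5`/`unpack5` helpers are hypothetical illustrations, not any particular model's storage format.

```python
import math

# Information content of a single ternary value in {-1, 0, +1}.
BITS_PER_TRIT = math.log2(3)  # ~1.585


def pack5(trits):
    """Pack exactly five values from {-1, 0, 1} into one integer in [0, 242]."""
    assert len(trits) == 5 and all(t in (-1, 0, 1) for t in trits)
    value = 0
    for t in reversed(trits):
        value = value * 3 + (t + 1)  # map -1, 0, 1 -> base-3 digit 0, 1, 2
    return value


def unpack5(value):
    """Inverse of pack5: recover five ternary values from one byte-sized integer."""
    trits = []
    for _ in range(5):
        trits.append(value % 3 - 1)
        value //= 3
    return trits


print(f"{BITS_PER_TRIT:.3f} bits per ternary parameter")  # 1.585
print(8 / 5, "bits per parameter with 5-per-byte packing")  # 1.6
```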

Davis Yoshida

@davis_yoshida

2 years

Inside of you are five wolves

0

6

Davis Yoshida

@davis_yoshida

2 years

It's a good thing JAX wasn't around in the 90s; Schmidhuber would have gotten out of control.

Davis Yoshida

@davis_yoshida

2 years

@0xmaddie_ Here's MLPs in MLPs in MLPs in about 40 LoC of JAX + qax. You could swap the MLP for any other Flax module though.

0

3

Davis Yoshida

@davis_yoshida

2 years

Recalling when I peaked.

Davis Yoshida

@davis_yoshida

2 years

I once comprehended conv_general_dilated in its entirety for 30 whole milliseconds.

0

3
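`jax.lax.conv_general_dilated` packs several knobs into one op. Below is a rough pure-Python reference for the 1-D, single-channel case, written as an illustration rather than taken from JAX: `lhs_dilation` inserts zeros between input elements (the transposed-convolution trick), `rhs_dilation` spaces out the kernel taps (atrous convolution), padding is applied after input dilation, and, like the JAX op, it computes a cross-correlation (no kernel flip). The function name `conv1d_general_dilated` is hypothetical, and batching, channels, feature groups, and negative padding are deliberately left out.

```python
def conv1d_general_dilated(x, k, stride=1, padding=(0, 0), lhs_dilation=1, rhs_dilation=1):
    """1-D single-channel sketch of the semantics of a general dilated convolution."""
    # lhs (input) dilation: insert (lhs_dilation - 1) zeros between input elements.
    xd = []
    for i, v in enumerate(x):
        if i > 0:
            xd.extend([0] * (lhs_dilation - 1))
        xd.append(v)
    # Explicit (low, high) padding, applied to the already-dilated input.
    xd = [0] * padding[0] + xd + [0] * padding[1]
    # rhs (kernel) dilation: taps are rhs_dilation apart, so the window spans this far.
    span = (len(k) - 1) * rhs_dilation + 1
    out = []
    for start in range(0, len(xd) - span + 1, stride):
        # Cross-correlation: kernel is not flipped.
        out.append(sum(k[j] * xd[start + j * rhs_dilation] for j in range(len(k))))
    return out


x = [0, 1, 2, 3, 4]
plain = conv1d_general_dilated(x, [1, 0, -1])                    # x[i] - x[i+2]
atrous = conv1d_general_dilated(x, [1, 0, -1], rhs_dilation=2)   # x[0]*1 + x[2]*0 + x[4]*(-1)
upsampled = conv1d_general_dilated([1, 2], [1], lhs_dilation=2)  # zeros inserted between inputs
```

In JAX itself, the atrous case would be roughly `jax.lax.conv_general_dilated(x, k, window_strides=(1,), padding="VALID", rhs_dilation=(2,))`, with `x` shaped `(batch, channels, width)` and `k` shaped `(out_channels, in_channels, width)`, which is the default layout when `dimension_numbers` is left as `None`.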

Davis Yoshida

@davis_yoshida

2 years

it turns out that `sign` isn't supported either, but fortunately I can just do x/|x| since I already dropped support for 0.

1

0

3
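The identity behind the workaround: for any nonzero real x, x/|x| is +1 when x > 0 and -1 when x < 0, which is exactly sign(x). The only disagreement with the usual definition is at x = 0, where sign(0) = 0 but x/|x| is undefined, hence dropping support for 0 first. The environment where `sign` was unsupported isn't named in the tweet, so this plain-Python check (with a hypothetical helper `sign_via_division`) only illustrates the math.

```python
import math


def sign_via_division(x: float) -> float:
    """sign(x) for nonzero x, computed as x / |x|; raises on x == 0."""
    if x == 0:
        # Mirrors the tweet's premise: zero is not supported.
        raise ValueError("x / |x| is undefined at 0")
    return x / abs(x)


samples = [3.5, -2.0, 1e-310, -1e308]  # includes a subnormal and a huge value
results = [sign_via_division(v) for v in samples]
```

Under IEEE 754 the division is exact for every finite nonzero float (x/x rounds to exactly 1.0), but infinities give NaN (inf/inf), so this is only a drop-in for finite inputs.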

Davis Yoshida

@davis_yoshida

2 years

POV you are multiplying two numbers in triton

1

6

Davis Yoshida

@davis_yoshida

2 years

when you forget to measure a secondary outcome in your phase three trial.

Lysander, 8 CHA Bard

@8chabard

2 years

the weird thing about having gotten used to my weekly semaglutide injection is now i'm like "huh, you know what? maybe i *could* shoot up heroin".

0

2

Davis Yoshida

@davis_yoshida

2 years

RT @alexfmckinney: I am looking / on the market for a new ML Engineer / Researcher remote (EU times) role! 👨‍🔬 You can find my resume here…

0

28

0

Davis Yoshida

@davis_yoshida

2 years

Top 50 on advent this year! 📈

0

9

Davis Yoshida

@davis_yoshida

2 years

When you're the pivotal factor

0

4

Davis Yoshida

@davis_yoshida

2 years

personal best (part 2 kinda exploded)

0

2