Davis Yoshida Profile
Davis Yoshida

@davis_yoshida

Followers: 391
Following: 3K
Media: 93
Statuses: 1K

Seattle, WA
Joined December 2016
@davis_yoshida
Davis Yoshida
28 days
Moved to SF for work! If a mutual (you perhaps) wants to get coffee they should DM me.
0
0
7
@davis_yoshida
Davis Yoshida
1 year
Suffix scaling will hit a wall:
en.m.wikipedia.org
@arankomatsuzaki
Aran Komatsuzaki
1 year
GPT series still hasn't shown any signs of saturation 😲
Tweet media one
0
1
6
@davis_yoshida
Davis Yoshida
1 year
Just writing some markdown in nvim and copilot attacks me out of nowhere
Tweet media one
0
0
14
@davis_yoshida
Davis Yoshida
1 year
The problem is we need some sort of Pareto surface where the axes of interest are:
- Hours needed to learn the tool.
- Hours spent on the specific implementation.
- Latency, or whatever the metric of interest is.
I'm pretty sure the relevant datapoints are impossible to collect.
@cHHillee
Horace He
1 year
@gallabytes I think there are 2 separate questions.
1. Is it easier to write inefficient PyTorch code than inefficient JAX code?
2. Is it easier to write efficient JAX code than it is to write efficient PyTorch code?
Imo, the answer to the first one is yes. For example, the huggingface.
0
0
2
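Computing the Pareto surface from the tweet above is the easy part once datapoints exist; collecting them is the hard part. A minimal sketch, assuming every axis is "lower is better". The tool names and numbers are invented placeholders for illustration, not measurements.

```python
from typing import NamedTuple


class ToolOption(NamedTuple):
    """One hypothetical (tool, implementation) datapoint; lower is better on every axis."""
    name: str
    hours_to_learn: float
    hours_to_implement: float
    latency_ms: float

    def axes(self) -> tuple[float, float, float]:
        return (self.hours_to_learn, self.hours_to_implement, self.latency_ms)


def dominates(a: ToolOption, b: ToolOption) -> bool:
    """True if `a` is no worse than `b` on every axis and strictly better on at least one."""
    pairs = list(zip(a.axes(), b.axes()))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(options: list[ToolOption]) -> list[ToolOption]:
    """Keep every option that no other option dominates (nothing dominates itself)."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


# Invented numbers, for illustration only.
options = [
    ToolOption("tool_a_naive", 5, 2, 40.0),
    ToolOption("tool_a_tuned", 5, 10, 12.0),
    ToolOption("tool_b_tuned", 40, 8, 10.0),
    ToolOption("tool_b_naive", 40, 4, 45.0),
]

# prints tool_a_naive, tool_a_tuned and tool_b_tuned, one per line
for o in pareto_front(options):
    print(o.name)
```

Here "tool_b_naive" drops out because "tool_a_naive" is at least as good on all three axes; every remaining option wins on some axis.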
@davis_yoshida
Davis Yoshida
1 year
Tweet media one
1
0
11
@davis_yoshida
Davis Yoshida
1 year
LeCun et al., 1990
Tweet media one
0
0
8
@davis_yoshida
Davis Yoshida
1 year
It's simultaneously relieving and frustrating when the bug you spent hours thinking was yours turns out to be a library bug.
0
0
3
@davis_yoshida
Davis Yoshida
1 year
I'll always cherish the brief time I was ahead of Neal Wu on the Advent of Code leaderboard (I assume he was sleepy or something).
0
0
4
@davis_yoshida
Davis Yoshida
1 year
I was going to say it doesn't support vmap yet, but I just tested and it (accidentally) does. Ty for the magic @sharadvikram.
@davis_yoshida
Davis Yoshida
1 year
@hr0nix @epiqueras1 This got me to finally upload this: It's super WIP, but it does a block-sparse forward using Pallas. I still need to add the backwards kernels though.
1
0
6
@davis_yoshida
Davis Yoshida
1 year
ternary parameters are 1-bit?
this 8.5B model is "7B"?
come on guys.
2
0
10
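The first complaint comes down to arithmetic: a parameter restricted to {-1, 0, +1} has three states, so it carries log2(3) ≈ 1.585 bits, not 1. A minimal sketch of that arithmetic, plus one common packing scheme we assume for illustration (five ternary values per byte, since 3^5 = 243 fits in 256, giving 1.6 bits per parameter). The helper names are hypothetical and not taken from any particular model or library.

```python
import math


def bits_per_ternary_param() -> float:
    """Information content of one value drawn uniformly from {-1, 0, +1}."""
    return math.log2(3)


def pack_trits(trits: list[int]) -> int:
    """Pack up to 5 ternary values in {-1, 0, 1} into one byte (3**5 = 243 <= 256)."""
    if len(trits) > 5:
        raise ValueError("at most 5 trits fit in one byte")
    value = 0
    for t in reversed(trits):  # trits[0] ends up as the least significant base-3 digit
        value = value * 3 + (t + 1)  # map -1, 0, 1 -> base-3 digits 0, 1, 2
    return value


def unpack_trits(byte: int, count: int = 5) -> list[int]:
    """Inverse of pack_trits; `count` must match the number of trits that were packed."""
    trits = []
    for _ in range(count):
        byte, digit = divmod(byte, 3)
        trits.append(digit - 1)
    return trits


print(f"{bits_per_ternary_param():.3f} bits per ternary parameter")  # 1.585
print(f"{8 / 5} bits per parameter with 5-per-byte packing")  # 1.6
```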
@davis_yoshida
Davis Yoshida
2 years
Inside of you are five wolves
0
0
6
@davis_yoshida
Davis Yoshida
2 years
It's a good thing JAX wasn't around in the '90s; Schmidhuber would have gotten out of control.
@davis_yoshida
Davis Yoshida
2 years
@0xmaddie_ Here's MLPs in MLPs in MLPs in about 40 LoC of JAX + qax. You could swap the MLP for any other Flax module though.
Tweet media one
0
0
3
@davis_yoshida
Davis Yoshida
2 years
Recalling when I peaked.
@davis_yoshida
Davis Yoshida
2 years
I once comprehended `conv_general_dilated` in its entirety for 30 whole milliseconds.
0
0
3
@davis_yoshida
Davis Yoshida
2 years
it turns out that `sign` isn't supported either, but fortunately I can just do x/|x| since I already dropped support for 0.
1
0
3
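The workaround in the tweet above relies on the identity sign(x) = x / |x|, which holds for every nonzero x and is undefined at 0; that is why dropping support for 0 makes it safe. A minimal plain-Python sketch (not the unspecified framework the tweet refers to); the function names are hypothetical. Note that infinities would also break it, since inf / inf is NaN, and that array libraries such as NumPy or JAX typically return NaN for 0 / 0 rather than raising.

```python
import math


def sign_via_division(x: float) -> float:
    """sign(x) computed as x / |x|; only valid for nonzero, finite x."""
    if x == 0:
        raise ValueError("x / |x| is undefined at 0")
    return x / abs(x)  # equal magnitudes divide to exactly +1.0 or -1.0


def reference_sign(x: float) -> float:
    """Conventional sign with sign(0) = 0, for comparison."""
    return math.copysign(1.0, x) if x != 0 else 0.0


# The two agree everywhere the division trick is defined.
for x in [-3.5, -1e-30, 2.0, 7e12]:
    assert sign_via_division(x) == reference_sign(x)
```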
@davis_yoshida
Davis Yoshida
2 years
POV you are multiplying two numbers in triton
Tweet media one
1
1
6
@davis_yoshida
Davis Yoshida
2 years
when you forget to measure a secondary outcome in your phase three trial.
@8chabard
Lysander, 8 CHA Bard
2 years
the weird thing about having gotten used to my weekly semaglutide injection is now i'm like "huh, you know what? maybe i *could* shoot up heroin".
0
0
2
@davis_yoshida
Davis Yoshida
2 years
RT @alexfmckinney: I am looking / on the market for a new ML Engineer / Researcher remote (EU times) role! 👨‍🔬 You can find my resume here…
0
28
0
@davis_yoshida
Davis Yoshida
2 years
Top 50 on Advent of Code this year! 📈
Tweet media one
0
0
9
@davis_yoshida
Davis Yoshida
2 years
When you're the pivotal factor
Tweet media one
Tweet media two
0
0
4
@davis_yoshida
Davis Yoshida
2 years
personal best (part 2 kinda exploded)
Tweet media one
0
0
2