JosH100
@josh_wills
Followers: 18K
Following: 148K
Media: 798
Statuses: 20K
Engineering at @datologyai; ex-@slackhq. I like DataLoaders and @duckdb.
San Francisco, CA
Joined April 2008
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
51
2K
2K
"There are three kinds of lies: lies, damned lies, and database benchmarks." -Mark Twain, probably. MotherDuck just got 19% faster, thanks to improvements in DuckDB 1.4. While these improvements make for a great benchmark, real user queries are where the feathers hit the road.
motherduck.com
Benchmarks, efficiency, and how MotherDuck just got nearly 20% faster.
0
6
20
I have a single friend in his late 30s right now. Mega-millionaire. Doing whatever he wants. He's happy! He always asks me "where are you going next" and I always respond "nowhere, I just want to be home with my kid." And he looks at me like I'm CRAZY. He tries to empathize, but
111
188
5K
. @arimorcos is fascinating to listen to and offers some of the most pragmatic takes I see among startup leaders that are deep in the weeds. He is a realist whose unique mix of candor and clarity leads to a reassuring optimism. If you ever get a chance, have a chat with him.
Hit on a bunch of fun topics in our second "AI Roundup" episode with @_RobToews (AI investor at Radical) and @arimorcos (Co-Founder/CEO of Datology): - Reactions to Karpathy's claim on agents being overhyped - Are we in a bubble? - Do you need to be in SF to build an AI
1
3
27
sports betting is a societal cancer and will slowly ruin everything we hold dear https://t.co/qt2HTg9Tx1
wsj.com
An NBA coach and player are arrested in a federal gambling investigation. A betting-saturated public asks: Well, what did anyone expect?
2
2
28
We made @ApacheParquet metadata parsing 3x-9x faster in the latest release of the Rust implementation https://t.co/x2VK9rJoJ7
1
21
188
If you were impacted by the recent Meta layoffs (or even if you weren't) and you're interested in doing ambitious, rigorous science and/or engineering that powers a real product that actual customers pay us ca$h money for, please DM me or head over to https://t.co/rckCzLScWz.
datologyai.com
Browse jobs and apply to be part of a world-class team solving the frontier research problem that sets the best AI models apart from the rest.
0
10
21
it's not vibecoding anymore. you should spend a good chunk of time writing a software spec (1h+) and maybe some tests, have codex implement it, and then thoroughly review it. it's just a change in how you probably ought to be doing software. TDD is back with a vengeance
1
3
23
I'm teaming up with @samuel_colvin to launch a new meetup series, Py AI. Our first one is November 11 in San Francisco with talks from @pydantic, @modal, @reductoai and @fastmcp. Hosted by our friends at @pebble_bed. Link to join below!
1
6
20
Doing cool science is cool, but actually productionizing that science in a scalable, deployable way is like watching magicians. Read about all of the challenging technical problems at @datologyai, then come solve them with us!
Deduplicate your petabyte-scale multimodal dataset in an air-gapped cluster using this one weird trick* (Researchers h̶a̶t̶e̶ love him.) *may require 6 or 7 weird tricks
0
3
18
me talking to an intern: have you read the docs? the source code? go spend 5 hours on this problem before talking to me me talking to claude code: hi beautiful, how are you today? is everything okay? i wrote you four pages of documentation, here's a link to additional context,
6
19
371
the folks at @AmplifyPartners have written an excellent blog highlighting the (many, extensive, wide-ranging) engineering accomplishments of @josh_wills and @datologyai team. this is important because the engineering team at datology is so so talented and also because it gives
Deduplicate your petabyte-scale multimodal dataset in an air-gapped cluster using this one weird trick* (Researchers h̶a̶t̶e̶ love him.) *may require 6 or 7 weird tricks
2
3
14
This is probably the best time in history to be an engineer at a bad, non-tech company. Nobody across your business has adjusted their expectations. There are entire eng orgs wrapped in an unspoken internal contract to whisper into Cursor for 30 minutes in the morning and call it
21
14
572
Turns out large-scale image/text curation is _really hard_ even for simple things like image deduplication! Learn more about the engineering challenges we're solving at @datologyai from @AmplifyPartners' latest blog!
The folks at @AmplifyPartners went deep with my team at @datologyai on the engineering challenges involved in large-scale data curation for training models-- from deduplicating a non-trivial fraction of the internet to orchestrating dozens of experiments and terabytes of data
0
4
12
The folks at @AmplifyPartners went deep with my team at @datologyai on the engineering challenges involved in large-scale data curation for training models-- from deduplicating a non-trivial fraction of the internet to orchestrating dozens of experiments and terabytes of data
amplifypartners.com
Behind the scenes of Datology's incredible data infrastructure, from clever implementation of deduplication to their custom versions of Spark and Flyte.
0
4
13
Deduplicate your petabyte-scale multimodal dataset in an air-gapped cluster using this one weird trick* (Researchers h̶a̶t̶e̶ love him.) *may require 6 or 7 weird tricks
2
12
39
The speed of Qwen releases is just wild. PS: Great talk @natolambert on the state of open models!
1
13
78
If you were recently laid off at Meta Gen AI, my dms are open. Help us build the next frontier of Apache-2.0 models.
4
22
175
There are too many data/ai tools to reasonably evaluate them in depth. My advice is to pick tools semi at random and then make that your whole identity.
0
2
6
.@itunpredictable let me write a blog with references to Rodchenko, Rothko, Cohen-Orr, and Tenenbaum, which will likely only appeal to 10 people, but I'm stoked to find the subset of people on the internet who care about geometric abstraction and geometric deep learning
4
1
12