JosH100

@josh_wills

Followers
18K
Following
141K
Media
791
Statuses
20K

Engineering at @datologyai; @duckdb enthusiast, ex-@slackhq

San Francisco, CA
Joined April 2008
@josh_wills
JosH100
13 years
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
51
2K
2K
@josh_wills
JosH100
3 days
RT @leavittron: @datologyai and I are at #ICML2025, and we have two best-in-class things at the conference: 1. Swag: fidget spinners. 2. Job….
0
7
0
@josh_wills
JosH100
3 days
RT @RicardoMonti9: come work on/lead post-training at @datologyai 🚀🚀
0
2
0
@josh_wills
JosH100
5 days
RT @code_star: No more llama models. Time to train your own. If you want cracked data hit me up.
0
5
0
@josh_wills
JosH100
17 days
RT @code_star: Me to Claude: Please push back on my designs. Do not just agree with me. Critic my approach only when appropriate or you di….
0
1
0
@josh_wills
JosH100
18 days
RT @code_star: I can see why math people like frisbee, it’s literally a group in a field passing around a ring.
0
8
0
@josh_wills
JosH100
19 days
RT @code_star: Alternatively, pay @datologyai to look at your data for you. This is what all our cracked scientists look like (from looking….
0
2
0
@josh_wills
JosH100
21 days
RT @code_star: Amazing work (once again). Better midtraining makes models better for RL. Once again the power of good data strikes again.….
0
4
0
@josh_wills
JosH100
25 days
RT @datologyai: 🌞 We're excited to share our "Summer of Data Seminar" series at @datologyai! We're hosting weekly sessions with brilliant….
0
8
0
@josh_wills
JosH100
27 days
RT @simonw: I like this take by @KentBeck on how AI-assisted programming changes the balance of which skills are most important https://t.c….
0
155
0
@josh_wills
JosH100
28 days
RT @nikparth1: Cannot stress this more. This is also why model comparisons are tricky when it comes to data curation because we all implici….
0
2
0
@josh_wills
JosH100
1 month
RT @arimorcos: This trend will only continue. Training your own model doesn't need to cost 10s of millions, especially in specialized domai….
0
4
0
@josh_wills
JosH100
1 month
RT @arimorcos: Congratulations to our friends and partners @arcee_ai on the release of AFM-4.5B! With data powered by @datologyai, this mo….
0
11
0
@josh_wills
JosH100
1 month
RT @LucasAtkins7: We teamed up with @datologyai to build what we believe is the strongest pretraining corpus in the world—and I truly think….
0
5
0
@josh_wills
JosH100
1 month
RT @LucasAtkins7: Our customers needed a better base model <10B parameters. We spent the last 5 months building one. I'm delighted to sh….
0
42
0
@josh_wills
JosH100
1 month
RT @alvind319: congrats on the launch!!! data curation powered by @datologyai :))
0
2
0
@josh_wills
JosH100
1 month
RT @arimorcos: Welcome @VineethDorna! We're hiring across the board to build the best data engine for AI. Come join us!
0
4
0
@josh_wills
JosH100
1 month
RT @davidcrawshaw: Based on feedback from my latest blog post, I am not alone in feeling the pain around code review. Before LLMs, it was a….
0
1
0
@josh_wills
JosH100
1 month
RT @AmplifyPartners: Good data, actually, is all you need.
0
4
0
@josh_wills
JosH100
1 month
RT @leavittron: The team absolutely crushed it here. They blew away nearly every CLIP baseline, and matched or exceeded SigLIP2—which uses….
0
11
0
@josh_wills
JosH100
1 month
RT @lukemerrick_: Although it is no secret that the "secret sauce" underpinning every high-quality model is high-quality training data, I h….
0
4
0