@leavittron
Matthew Leavitt
4 months
The next 10x in deep learning efficiency gains are going to come from intelligent intervention on training data. But tools for automated data curation at scale didn’t exist—until now. I’m so excited to announce that I’ve co-founded @DatologyAI, with @arimorcos and @hurrycane.

Replies

@leavittron
Matthew Leavitt
4 months
There’s massive demand from companies to train their own models. And I’ve seen firsthand the training efficiency and model quality improvements that data curation can unlock. But expertise and tooling for data curation are lacking.
@leavittron
Matthew Leavitt
4 months
Data curation is a frontier research problem. There are only a handful of scientists in the world with deep expertise. And let’s be real—most scientists can’t build a deployable product that scales effortlessly.
@leavittron
Matthew Leavitt
4 months
That’s why we founded @DatologyAI: the algorithms that power our tools are automatic, modality-agnostic, and don’t require labels, and the product scales seamlessly to the largest datasets. These are essential features for realizing the next generation of large deep learning models.
@leavittron
Matthew Leavitt
4 months
Solving data curation for large-scale model training requires groundbreaking science and engineering. It’s a hard problem with tremendous impact. That’s also what makes it fun. I would have been stupid NOT to co-found @datologyai. And there are a lot of smart people who agree:
@leavittron
Matthew Leavitt
4 months
We have an incredible set of institutional investors who believe deeply in us and our mission: @sarahcat21 and @dauber from @AmplifyPartners, @_RobToews from @radicalvcfund, @saranormous, @outsetcap, and @QuietCapital.
@leavittron
Matthew Leavitt
4 months
@leavittron
Matthew Leavitt
4 months
We also have a hell of a team: @arimorcos, @bodan, @JackUrbs, @j_mcgraph, Kerstin Frailey, Fan Pan, @Ning_Catsnail, @pratyushmaini, and @priy2201.
@leavittron
Matthew Leavitt
4 months
And we’d like to grow that team: if you’re a deep learning scientist who’s passionate about data or an engineer with data expertise, please get in touch! You can learn more about us at
@cwolferesearch
Cameron R. Wolfe, Ph.D.
4 months
@leavittron @datologyai @arimorcos @hurrycane Super cool! I think this is a great idea. We've seen that just deduplicating pretraining data can massively improve data/learning efficiency of LLMs. I can't even begin to think what's possible if you explore more sophisticated approaches for data intervention.
@leavittron
Matthew Leavitt
4 months
@cwolferesearch @arimorcos @datologyai @hurrycane You spend a LOT of time reading and thinking about deep learning research, so your opinion means a lot to us 🙂
@analyticsaurabh
Saurabh Bhatnagar
4 months
@leavittron
Matthew Leavitt
4 months
@yanndubs
Yann Dubois
4 months
@leavittron
Matthew Leavitt
4 months
@koval_alvi
Aleksandr Kovalev
4 months
@leavittron
Matthew Leavitt
4 months
@koval_alvi @datologyai @arimorcos @hurrycane There may be some of that, but it's so much more.
@harrybrundage
Harry Brundage
4 months
@andersonbcdefg
Ben (e/sqlite)
4 months
@leavittron @datologyai @arimorcos @hurrycane whoa congrats! excited to see what you accomplish 💪
@leavittron
Matthew Leavitt
4 months
@desicochrane @datologyai @arimorcos @hurrycane It's allllll software and integrates into your trainer with only a few lines of code changed 😉