
🦉DVC
@DVCorg
Followers: 5K
Following: 3K
Media: 633
Statuses: 2K
Open source tool for data, models, & experiment versioning for ML projects. Join our stellar community https://t.co/RTCIKrZlmf for help, support and insights.
San Francisco, CA
Joined May 2018
"Metadata marts could play a key role in making video data more accessible and structured for model training and analysis" - Simon Thelin (@synthesiaIO, creator of the DataPains blog) reviewed DataChain 👇.
@PyData @JulianWgs ✅ Finally, metrics and parameters are captured and attached to Git and to each iteration, so results can be compared and visualized.
@PyData @JulianWgs The usual DVC building blocks are used to achieve this:
✅ DVC data versioning makes sure input data is saved, attached to an iteration, and can be restored or accessed anytime in the future.
✅ Lightweight CLI pipelines declaratively describe and run data processing.
@PyData @JulianWgs suggests using DVC to streamline simulation iterations and track results. DVC keeps it lightweight (no need to run servers and such - CLI, Git, basic Python) while making the whole process way more manageable and scalable.
Watch an excellent PyData Berlin talk by @JulianWgs (@VW) on automating and managing fluid simulations with Python and DVC. 🧵
@photoroom_app @EliotAndres DataChain is an open source library 🔗 (and a SaaS platform for collaboration and scale) that implements this idea at scale. Our goal was to make it easy for ML teams to use.
@photoroom_app @EliotAndres 🐘 Smaller scale - DVC + CSV files, or Postgres + some custom ETL to feed it.
Some examples and ideas we've seen:
🎥 @photoroom_app - see the link to the talk by @EliotAndres below.
🧊 Iceberg for metadata (and sometimes binaries) + Spark is one of the default choices (but it usually requires data engineering skills or a dedicated team).