DataCebo
@datacebo
Followers: 99
Following: 25
Media: 42
Statuses: 79
An MIT spin-off that's making synthetic data a reality.
Joined October 2020
Excellent article in @Forbes today calling #syntheticdata "an all-too-rare example of…genuinely useful" generative AI, for the particular application of software testing. Read @jpwarren's profile of @datacebo and @kveeramac: https://t.co/CpWQWShgFE
#bigdata #syntheticdata
0
4
7
SDV Enterprise v0.24.0 is out. This release adds features that help you generate higher quality synthetic data and improve ease-of-use. Model hierarchical relationships in a table. Use the SelfReferentialHierarchy CAG pattern when you have a column in a table that
0
0
1
Today, we're excited to introduce a powerful new bundle to the @sdv_dev: AI connectors. AI connectors address 2 key challenges that SDV users face when training generative AI models on datasets from enterprise data stores. (Link to the announcement: https://t.co/rHXe13g1Ru)
0
0
1
SDV Enterprise v0.23.0 is out. This release enhances your ability to program your synthesizer to find certain patterns and recreate them, whether it's through multi-table CAG patterns, single-table constraints, or pre-processing techniques that transform your data.
0
0
1
Today, we are excited to introduce a very powerful new framework to The Synthetic Data Vault: constraint augmented generation (#CAG for short). CAG addresses the shortcomings of generative models in capturing the context buried in enterprise data
1
1
4
Working with customers all over the world has taught us about one important but often overlooked benefit of using #syntheticdata: increased data diversity. Data diversity refers to the overall variety of data that is accessible for a project. While it's a simple concept,
0
2
3
#CTGAN has been downloaded over 2.5 million times. Released #thisweek in 2019: version 0.1.0 of #CTGAN as part of The Synthetic Data Vault, a Deep Learning-based #syntheticdata generator for single-table data that can learn from real data and generate synthetic data
0
1
2
By popular demand, we have added the ability to connect to databases to bring data to The Synthetic Data Vault (@sdv_dev). Users can now directly connect #SDV Enterprise to their databases, both to import real data and to export #syntheticdata. We have added #bigquery and
0
0
0
Born #otd in 1950: the Turing Test. Alan Turing's paper from 74 years ago describes a modified version of the "imitation game" in which a human judge has to determine which of two typing partners is a computer. June 2024: In related news, one recent study found that human
0
0
2
One of our users exclaimed "These speedups are insane!" Our multi-table synthesizer in SDV Enterprise, called HSA Synthesizer, completes in less than 1 minute what takes HMA Synthesizer an hour, across 20 datasets. We have been focusing on multi-table synthesizers.
0
1
5
In 1956, storing 5MB required a hard disk that weighed a ton. In 2024 a #generativeai model can capture the salient properties of terabytes of data in an entire database within a single file and recreate it on demand - what we call #syntheticdata. #otd in 1956 IBM launched
storagenewsletter.com
IBM Corp. is just celebrating its 100th anniversary as the company was founded on June 16, 1911. Consequently we come back here on one of the greatest innovations in the history of Big Blue and the...
0
0
2
Happy birthday to the late Dennis Ritchie, inventor of C and co-creator of Unix. C and C++ have played a key role in the big data revolution, having been the origin languages for some of the core components of popular ML libraries, including #PyTorch and #TensorFlow. Multics,
0
0
0
#OTD in 2016 we submitted the final camera-ready version of the Massachusetts Institute of Technology paper "The synthetic data vault". The paper said: "This synthetic data must meet two requirements: (1) First, it must somewhat resemble the original data statistically, to
0
3
5
Launched 25 years ago this summer: VMware 1.0, the first commercial product that allowed users to run multiple operating systems as virtual machines on a single x86 machine. Later known as VMware Workstation, it was an influential application that provided a framework for cloud
0
1
1