
I have a dataset and want to work with an external ML team. However, because of an NDA, I cannot share the original dataset, or even an anonymized version, with them.

Currently I'm using `make_classification` to create synthetic data. However, this is a little time-consuming, because I first have to understand the statistics of the original dataset and then create a synthetic dataset similar to it. As an example dataset, you may consider Iris or any other public dataset:

from sklearn import datasets
iris = datasets.load_iris()
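
To make the "understand the statistics, then generate" step concrete, here is a minimal sketch of one simple approach, not necessarily the best one: fit a mean and covariance per class and sample from a multivariate normal. The helper name `synthesize_per_class` and its parameters are hypothetical, and the Gaussian assumption is only an approximation, so sampled values may fall outside the realistic range of the original features.

```python
import numpy as np
from sklearn import datasets


def synthesize_per_class(X, y, n_per_class=50, seed=0):
    """Hypothetical helper: sample rows from a Gaussian fitted to each class."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        mean = X_c.mean(axis=0)          # per-feature mean for this class
        cov = np.cov(X_c, rowvar=False)  # feature covariance for this class
        X_parts.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        y_parts.append(np.full(n_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


iris = datasets.load_iris()
X_syn, y_syn = synthesize_per_class(iris.data, iris.target)
print(X_syn.shape, y_syn.shape)
```

This keeps per-class means and feature correlations roughly intact, but it still requires computing statistics on the original data first, which is the part I'd like to avoid or automate.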

Do you know of a better way to imitate the original dataset?

Phoenix
  • difficult to help you if you cannot share the dataset to give an example, looks like we're stuck in a paradox :p – mozway Jun 11 '22 at 17:18
  • say Iris dataset (or any other public dataset) `iris = datasets.load_iris()` – Phoenix Jun 11 '22 at 17:20

0 Answers