I have a dataset and want to work with an external ML team. However, because of an NDA, I cannot share the original dataset, or even an anonymized version of it, with them.
Currently I'm using make_classification to create synthetic data. However, this is a little time-consuming, because I first have to understand the statistics of the original dataset and then create a synthetic dataset similar to it. As an example dataset, you may consider Iris or other public datasets:
from sklearn import datasets
iris = datasets.load_iris()
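For reference, the workflow I described looks roughly like the sketch below, shown on Iris. The specific parameter choices (one cluster per class, all features informative, rescaling to the original per-feature mean and standard deviation) are illustrative assumptions rather than a fixed recipe.

```python
import numpy as np
from sklearn import datasets
from sklearn.datasets import make_classification

# Stand-in for the private dataset.
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Step 1: collect basic statistics of the original data.
n_samples, n_features = X.shape
n_classes = len(np.unique(y))
class_weights = (np.bincount(y) / n_samples).tolist()
feature_mean = X.mean(axis=0)
feature_std = X.std(axis=0)

# Step 2: generate a synthetic dataset with the same shape and class balance.
# Assumption: every feature is informative and each class forms one cluster.
X_syn, y_syn = make_classification(
    n_samples=n_samples,
    n_features=n_features,
    n_informative=n_features,
    n_redundant=0,
    n_repeated=0,
    n_classes=n_classes,
    n_clusters_per_class=1,
    weights=class_weights,
    random_state=0,
)

# Step 3: rescale each synthetic feature to the original mean and std.
X_syn = (X_syn - X_syn.mean(axis=0)) / X_syn.std(axis=0)
X_syn = X_syn * feature_std + feature_mean
```

This only matches the shape, the approximate class balance, and the per-feature mean and spread. Correlations between features and the actual class structure of the original data are not reproduced.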
Do you know of a better way to imitate the original dataset?