Pandas find the percentage of overlap and split to train test

Asked Jul 01 '19 at 09:14

I am running an ML experiment in Python and I am stuck with data that overlap. I have a DataFrame with multiple columns, and each row is, to a large extent, similar to the rows that follow it.
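Before choosing a split, it can help to quantify how similar each row is to the next one. A minimal sketch, assuming "similar" means that column values are exactly equal (continuous features would need a tolerance instead, and NaN never compares equal to NaN); the helper name consecutive_similarity is hypothetical:

```python
import pandas as pd

def consecutive_similarity(df: pd.DataFrame) -> pd.Series:
    """For each row after the first, the fraction of columns whose value
    exactly equals the value in the previous row."""
    same = df.eq(df.shift())           # compare row i with row i-1, column by column
    return same.mean(axis=1).iloc[1:]  # drop row 0, which has no predecessor

# Toy data (made up for illustration).
df = pd.DataFrame({"a": [1, 1, 1, 5, 5],
                   "b": [2, 2, 3, 3, 3],
                   "c": [0, 0, 0, 0, 9]})
sim = consecutive_similarity(df)
print(sim.round(2).tolist())  # [1.0, 0.67, 0.67, 0.67]
print(f"mean similarity: {sim.mean():.0%}")
```

A high average here suggests that a randomly shuffled split will leak near-duplicates from train into test.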

Are there pandas functions that can split my DataFrame into two sets so that the overall overlap between the two sets is as small as possible?
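As far as I know, pandas has no single built-in function for an overlap-minimizing split, but one can be assembled from basic operations: group runs of similar consecutive rows into blocks, then assign whole blocks to either train or test, so near-duplicate neighbours never end up on both sides. A minimal sketch; block_split, its threshold parameter, and the equality-based similarity are assumptions for illustration, not an established API:

```python
import numpy as np
import pandas as pd

def block_split(df: pd.DataFrame, test_frac: float = 0.2,
                threshold: float = 0.5, seed: int = 0):
    """Split df into train/test without cutting through runs of similar rows.

    Assumption: a row joins the previous row's block when at least
    `threshold` of its columns exactly equal that previous row.
    """
    # Fraction of columns equal to the previous row (the first row scores 0).
    sim = df.eq(df.shift()).mean(axis=1)
    # Start a new block whenever a row is too dissimilar from its predecessor.
    block_id = (sim < threshold).cumsum()
    sizes = block_id.value_counts()

    # Visit blocks in random order and move whole blocks into test
    # until roughly test_frac of the rows have been assigned.
    blocks = np.random.default_rng(seed).permutation(block_id.unique())
    test_blocks, n_test = [], 0
    for b in blocks:
        if n_test >= test_frac * len(df):
            break
        test_blocks.append(b)
        n_test += sizes.loc[b]

    is_test = block_id.isin(test_blocks)
    return df[~is_test], df[is_test]

# Toy data with three runs of identical rows (made up for illustration).
df = pd.DataFrame({"a": [1, 1, 1, 7, 7, 7, 3, 3],
                   "b": [5, 5, 5, 9, 9, 9, 4, 4]})
train, test = block_split(df, test_frac=0.25)
print(test)
```

Because whole blocks are moved, the test fraction is only approximate. If scikit-learn is available, the same block ids could instead be passed as groups to GroupShuffleSplit or GroupKFold, which keep each group entirely within one fold.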

Unfortunately, I cannot share the dataset, but if you can point me to the relevant functions, that will be enough for me to continue searching and reading.

Thank you in advance for your reply.

Regards,
Alex

Alex P

You can use train_test_split with shuffle=True, but it gives no guarantee about how similar the train and test sets are. – tawab_shakeel Jul 01 '19 at 09:20
Try to provide more details and a sample of the desired output. – tawab_shakeel Jul 01 '19 at 09:20
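Regarding the train_test_split suggestion: when consecutive rows are near-duplicates, shuffling tends to scatter each run across both sets, so some test rows have an exact copy in train. A minimal pandas-only sketch for measuring that percentage of overlap and comparing an order-preserving split with a shuffled one; the helpers contiguous_split and exact_overlap_pct are hypothetical, and "overlap" is assumed to mean a test row whose values all appear identically in some train row:

```python
import pandas as pd

def contiguous_split(df: pd.DataFrame, test_size: float = 0.2):
    """Keep row order: the first rows go to train, the last rows to test."""
    n_test = max(1, int(round(len(df) * test_size)))
    return df.iloc[:-n_test], df.iloc[-n_test:]

def exact_overlap_pct(train: pd.DataFrame, test: pd.DataFrame) -> float:
    """Percentage of test rows that also appear, with identical values, in train."""
    # Left merge on all shared columns; deduplicating train keeps one match per test row.
    merged = test.merge(train.drop_duplicates(), how="left", indicator=True)
    return 100.0 * (merged["_merge"] == "both").mean()

# Toy data with runs of identical consecutive rows (made up for illustration).
df = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4], "y": [0] * 10})

train_c, test_c = contiguous_split(df, test_size=0.4)
train_s, test_s = contiguous_split(df.sample(frac=1, random_state=0), test_size=0.4)

print("contiguous:", exact_overlap_pct(train_c, test_c))  # contiguous: 0.0
print("shuffled:  ", exact_overlap_pct(train_s, test_s))
```

An order-preserving split only avoids overlap when the cut falls between runs, as it does in this toy data; in general, the rows just before and after the cut can still be near-duplicates.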
