How to split a dataset into overlapping (non-disjoint) train-test sets using k-fold validation in Python

Asked Mar 09 '22 at 23:34

Active Mar 09 '22 at 23:34

Viewed 362 times

I am working on an unusual problem where I need to split a dataset into overlapping (non-disjoint) train and test sets using k-fold validation in Python. Is there any way to do that?

asked Mar 09 '22 at 23:34

Shaykh_Python

Have you looked at the scikit-learn documentation? You could, for example, use https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html or https://scikit-learn.org/0.15/modules/generated/sklearn.cross_validation.KFold.html#sklearn.cross_validation.KFold – Baptiste Prevot Mar 09 '22 at 23:42
Yes, I did look at this documentation, including all the parameters, but I don't see any option that would help me split the data into non-disjoint train-test splits. – Shaykh_Python Mar 09 '22 at 23:52
Ah, my bad @Shaykh_Python, then how about https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html – Baptiste Prevot Mar 10 '22 at 07:36
"Note: contrary to other cross-validation strategies, random splits do not guarantee that all folds will be different, although this is still very likely for sizeable datasets." I have a very large dataset so it is not very likely to produce overlapping train-test splits. Do you know any other way that will guarantee the overlapping split? – Shaykh_Python Mar 11 '22 at 06:59

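For what it's worth, within a single ShuffleSplit iteration the train and test indices are still disjoint; the note quoted above is about different iterations possibly repeating. If "overlapping" means that train and test should share samples inside each fold, one option is to build it by hand on top of KFold. This is only a minimal sketch under that assumption: `overlapping_kfold` and its `overlap` parameter are hypothetical names, not part of scikit-learn, and copying a fixed fraction of each test fold into the training indices is just one possible way to define the overlap.

```python
import numpy as np
from sklearn.model_selection import KFold


def overlapping_kfold(n_samples, n_splits=5, overlap=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs whose train and test sets share samples.

    Hypothetical helper, not part of scikit-learn.
    """
    rng = np.random.default_rng(seed)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # KFold only needs the number of rows, so a placeholder array is enough.
    placeholder = np.zeros((n_samples, 1))
    for train_idx, test_idx in kf.split(placeholder):
        # Copy a fraction of the test fold into the training indices
        # (at least one sample, so the overlap is guaranteed).
        n_shared = max(1, int(round(overlap * len(test_idx))))
        shared = rng.choice(test_idx, size=n_shared, replace=False)
        train_with_overlap = np.sort(np.concatenate([train_idx, shared]))
        yield train_with_overlap, test_idx


for fold, (train_idx, test_idx) in enumerate(overlapping_kfold(100)):
    n_common = len(np.intersect1d(train_idx, test_idx))
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)} shared={n_common}")
```

The yielded index arrays can be used to slice your own data, or the generator can be passed wherever scikit-learn accepts an iterable of (train, test) splits, such as the `cv` argument of `cross_validate`. Keep in mind that any score computed on samples that were also used for training will be optimistic.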

0 Answers