I am working on a problem where I need to split a dataset into overlapping (non-disjoint) train-test sets using k-fold cross-validation in Python. Is there any way to do that?
How to split a dataset into overlapping (non-disjoint) train-test sets using k-fold validation in Python
Have you looked at the scikit-learn documentation? You could, for example, use https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html or https://scikit-learn.org/0.15/modules/generated/sklearn.cross_validation.KFold.html#sklearn.cross_validation.KFold – Baptiste Prevot Mar 09 '22 at 23:42
Yes, I did look at that documentation, including all the parameters, but I don't see any option that would split the data into non-disjoint train-test splits. – Shaykh_Python Mar 09 '22 at 23:52
Ah, my bad @Shaykh_Python, then how about https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html – Baptiste Prevot Mar 10 '22 at 07:36
"Note: contrary to other cross-validation strategies, random splits do not guarantee that all folds will be different, although this is still very likely for sizeable datasets." I have a very large dataset, so ShuffleSplit is unlikely to produce overlapping train-test splits. Do you know of any other way that guarantees overlapping splits? – Shaykh_Python Mar 11 '22 at 06:59
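One possible way to guarantee the overlap is to build the index pairs by hand instead of relying on random splits. The sketch below assumes that "overlapping" means the test sets of consecutive folds share samples, while train and test stay disjoint within each fold; the thread does not confirm that reading. `overlapping_kfold` and its `overlap` parameter are hypothetical names, not part of scikit-learn, and the function uses only the standard library.

```python
import math
import random


def overlapping_kfold(n_samples, n_splits, overlap=0.5, shuffle=True, seed=0):
    """Yield (train_idx, test_idx) index lists whose test folds overlap.

    Hypothetical helper, not a scikit-learn API. Each test fold is a
    window over the (optionally shuffled) index order, and consecutive
    windows share roughly `overlap` of their samples.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")

    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)

    # Pick window size and stride so that n_splits windows cover all samples:
    # size + (n_splits - 1) * stride == n_samples, stride == size * (1 - overlap)
    size = n_samples / (1 + (n_splits - 1) * (1 - overlap))
    stride = size * (1 - overlap)

    for fold in range(n_splits):
        start = math.floor(fold * stride)
        stop = min(n_samples, math.ceil(fold * stride + size))
        test_idx = indices[start:stop]
        # Keep train and test disjoint within this fold (an assumption).
        test_set = set(test_idx)
        train_idx = [i for i in indices if i not in test_set]
        yield train_idx, test_idx


if __name__ == "__main__":
    for train_idx, test_idx in overlapping_kfold(20, 4, overlap=0.5, shuffle=False):
        print("test:", test_idx)
```

Since scikit-learn's `cv` argument also accepts an iterable of (train, test) index pairs, something like `cross_validate(model, X, y, cv=list(overlapping_kfold(len(X), 5)))` should work with these splits. If you instead want the train and test sets of the same fold to share samples, you could skip the filtering step, though that leaks test data into training.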