I would like to split a dataset into train and test sets in an 80:20 ratio. However, I do not want the rows for any single S_Id value to be divided between the two sets, with some data points in train and the rest in test.
My dataset looks like this:
S_Id Datetime Item
1 29-06-2018 03:23:00 654
1 29-06-2018 04:01:00 452
1 29-06-2018 04:25:00 101
2 30-06-2018 05:17:00 088
2 30-06-2018 05:43:00 131
3 30-06-2018 10:36:00 013
3 30-06-2018 11:19:00 092
I would like a clean split, something like this:
Train:
S_Id Datetime Item
1 29-06-2018 03:23:00 654
1 29-06-2018 04:01:00 452
1 29-06-2018 04:25:00 101
2 30-06-2018 05:17:00 088
2 30-06-2018 05:43:00 131
Test:
S_Id Datetime Item
3 30-06-2018 10:36:00 013
3 30-06-2018 11:19:00 092
All rows with the same S_Id must end up in the same set. Can this be done with a simple 'groupby'?
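To make the requirement concrete, here is a minimal sketch of one possible group-aware split. It assumes pandas and scikit-learn are available and uses scikit-learn's `GroupShuffleSplit` rather than a plain 'groupby'. The DataFrame is rebuilt by hand from the sample above, with `Item` kept as strings to preserve the leading zeros (an assumption about the real data), and the `random_state` value is arbitrary.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Sample data from the question; Datetime and Item are kept as strings
# here (an assumption, the real column types may differ).
df = pd.DataFrame({
    "S_Id": [1, 1, 1, 2, 2, 3, 3],
    "Datetime": [
        "29-06-2018 03:23:00", "29-06-2018 04:01:00", "29-06-2018 04:25:00",
        "30-06-2018 05:17:00", "30-06-2018 05:43:00",
        "30-06-2018 10:36:00", "30-06-2018 11:19:00",
    ],
    "Item": ["654", "452", "101", "088", "131", "013", "092"],
})

# test_size is applied to the number of distinct groups (S_Id values),
# not to the number of rows.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

# split() yields positional index arrays; every row of a given S_Id
# lands on the same side of the split.
train_idx, test_idx = next(splitter.split(df, groups=df["S_Id"]))

train = df.iloc[train_idx]
test = df.iloc[test_idx]

print("Train:")
print(train.to_string(index=False))
print("Test:")
print(test.to_string(index=False))
```

Because the ratio is applied to groups, the row-level split is only approximately 80:20 when groups differ in size. Which S_Id ends up in the test set depends on `random_state`.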
Thank you for your help!