I have a training dataset with a multi-class target variable, category:
train.groupby('category').size()
0 2220
1 4060
2 760
3 1480
4 220
5 440
6 23120
7 1960
8 64840
I would like to create a new validation dataset from the training set by taking the same percentage of rows from each class (say 20%), so that no class is missing from the validation set and the model is not spoiled. The desired output is a DataFrame with the same structure and columns as the training set, but with class counts like these:
0 444
1 812
2 152
3 296
4 44
5 88
6 4624
7 392
8 12968
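The target counts above are exactly 20% of each class count in the training set. A minimal sketch of that arithmetic, with the class counts copied from the groupby output (the names train_counts and valid_counts are illustrative only):

```python
import pandas as pd

# Class counts copied from train.groupby('category').size() above
train_counts = pd.Series(
    [2220, 4060, 760, 1480, 220, 440, 23120, 1960, 64840],
    index=pd.RangeIndex(9, name="category"),
)

# Desired validation size per class: 20% of each class, rounded to whole rows
valid_counts = (train_counts * 0.2).round().astype(int)
print(valid_counts.tolist())  # [444, 812, 152, 296, 44, 88, 4624, 392, 12968]
```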
Is there a straightforward way to do this in pandas?
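One possible approach, sketched here under a few assumptions: pandas 1.1 or later (which added DataFrameGroupBy.sample), and a training DataFrame with a unique index (needed because the remaining rows are obtained with df.drop). The helper name stratified_split and the toy data are hypothetical, for illustration only. The idea is to sample the same fraction independently within each class, then drop the sampled rows from the training set.

```python
import pandas as pd


def stratified_split(df, label_col, frac=0.2, random_state=None):
    """Split df into (rest, valid), where valid holds `frac` of each class.

    Assumes df has a unique index, since `rest` is built with df.drop().
    """
    # Sample the same fraction independently within every class
    valid = df.groupby(label_col).sample(frac=frac, random_state=random_state)
    # Every row not picked for validation stays in the training set
    rest = df.drop(valid.index)
    return rest, valid


# Hypothetical toy data: 3 imbalanced classes with 10, 20 and 50 rows
toy = pd.DataFrame({
    "feature": range(80),
    "category": [0] * 10 + [1] * 20 + [2] * 50,
})
rest, valid = stratified_split(toy, "category", frac=0.2, random_state=42)
print(valid.groupby("category").size().tolist())  # [2, 4, 10]
print(rest.groupby("category").size().tolist())   # [8, 16, 40]
```

Applied to the data in the question, stratified_split(train, 'category', frac=0.2, random_state=0) should give a validation set with the per-class counts listed above, since every class count is a multiple of 5. For class sizes where 20% is not a whole number, the per-group sample size is rounded. Fixing random_state makes the split reproducible.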