How do I split EMNIST datasets while keeping them balanced?

Question

Here is my current code.

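import torchvision
from torchvision import transforms
from torchvision.datasets import EMNIST

affine = transforms.RandomAffine([-15, 15], scale=(0.8, 1.2))  # rotation and resizing
normalize = transforms.Normalize((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))  # set the mean to 0 and the standard deviation to 1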

to_tensor = transforms.ToTensor()

transform_hyouji=transforms.Compose([to_tensor,affine])

# Defined before first use; splits[-2] is 'digits'.
splits = ('byclass', 'bymerge', 'balanced', 'letters', 'digits', 'mnist')

emnist_data = EMNIST(root='./EMNIST_1st', split=splits[-2],
                       train=False,download=True,
                       transform=torchvision.transforms.ToTensor())

EMNIST_train = EMNIST(root='./EMNIST_1st', split=splits[-2], 
                       train=True,download=True,
                       transform=torchvision.transforms.ToTensor())

EMNIST_test = EMNIST(root='./EMNIST_1st', split=splits[-2], 
                       train=False,download=True,
                       transform=torchvision.transforms.ToTensor())

However, the sizes of these datasets are:

len(EMNIST_train), len(EMNIST_test)
# (240000, 40000)

I would like to change these sizes while keeping the datasets balanced, i.e. with almost the same proportion of samples for each label.

I think this question (How do you alter the size of a Pytorch Dataset?) is helpful, but I don't think its approach keeps the labels balanced.

I would appreciate it if you could tell me how to do this.

Could you elaborate more about the meaning of balanced size? It would be helpful if you gave an example of balanced size — Anwarvic, Oct 01 '21 at 00:49
Do you want to "cut down" your dataset such that it contains the same number of instances per class? **Or** use a sampler that draws from the dataset in a balanced way w.r.t. each class? — Ivan, Oct 01 '21 at 17:27
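
The two readings in the comment above can be sketched in plain Python. This is only a minimal illustration on toy labels, not a tested EMNIST solution. The helper names `balanced_indices` and `inverse_frequency_weights` are hypothetical, and the toy label list stands in for something like `EMNIST_train.targets.tolist()`, which is an assumption.

```python
import random
from collections import Counter, defaultdict


def balanced_indices(labels, per_class, seed=0):
    """Pick `per_class` random indices for every label value (hypothetical helper)."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[int(label)].append(idx)

    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_label):
        pool = by_label[label]
        if per_class > len(pool):
            raise ValueError(f"label {label} has only {len(pool)} samples")
        # Sample without replacement so no index is picked twice.
        chosen.extend(rng.sample(pool, per_class))
    rng.shuffle(chosen)  # avoid returning the indices grouped by label
    return chosen


def inverse_frequency_weights(labels):
    """Per-sample weights equal to 1 / (size of the sample's class)."""
    counts = Counter(int(label) for label in labels)
    return [1.0 / counts[int(label)] for label in labels]


# Toy, deliberately unbalanced labels (assumption: stand-in for real targets).
toy_labels = [0] * 50 + [1] * 30 + [2] * 20

# Reading 1: cut the dataset down to the same number of samples per class.
subset = balanced_indices(toy_labels, per_class=10)
subset_counts = Counter(toy_labels[i] for i in subset)

# Reading 2: keep every sample, but weight each one for a balanced sampler.
weights = inverse_frequency_weights(toy_labels)
```

For the first reading, wrapping the result as `torch.utils.data.Subset(EMNIST_train, balanced_indices(EMNIST_train.targets.tolist(), per_class=1000))` should give a smaller dataset with the same number of images for each label. For the second, the weights could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights))` and used as the `sampler=` argument of a `DataLoader`. Neither line has been run against EMNIST here, and the `per_class=1000` value is only an example.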
