How to do image augmentation on the actual dataset, so that I don't need to add label for every augmented image

Question

I want to augment a dataset whose images are stored as a NumPy array in X_train, with the corresponding labels in y_train. The shapes are as follows:

print(X_train.shape)
print(y_train.shape)

Output:

(1100, 22, 64, 64)
(1100,)

A single image looks like this:

plt.imshow(X_train[0][0])

How do I augment this dataset so that I don't need to add the label by hand for every augmented image?

score 0 · Accepted Answer · answered May 31 '22 at 05:10

One option is to use a generator:

import numpy as np

def get_augmented_sample(X_train, y_train):
    # iterate over samples and labels together, so each augmented sample keeps its label
    for x, y in zip(X_train, y_train):
        # data augmentation applied to x, e.g. adding Gaussian noise (std 20; adjust to your pixel range)
        x_augmented = x + np.random.normal(0, 20, x.shape)
        yield x_augmented, y

data_generator = get_augmented_sample(X_train, y_train)

# get an augmented sample
x, y = next(data_generator)

# original
plt.imshow(X_train[0][0])

# augmented
plt.imshow(x[0])
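
If you would rather build the augmented arrays once instead of drawing samples one at a time, the same idea can be sketched as follows. This is only a minimal sketch and not part of the original answer: the function name augment_dataset and its parameters copies, noise_std, and seed are hypothetical, and Gaussian noise is used only because the generator above uses it. The point it illustrates is that the label array is reused unchanged for every noisy copy, so no label has to be added by hand.

```python
import numpy as np

def augment_dataset(X, y, copies=2, noise_std=20.0, seed=0):
    """Return the original samples plus `copies` noisy variants of each one.

    Hypothetical helper: labels are carried over unchanged for every copy.
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        # Additive Gaussian noise does not change the class, so y is reused as-is
        noise = rng.normal(0.0, noise_std, size=X.shape)
        X_parts.append(X + noise)
        y_parts.append(y)
    return np.concatenate(X_parts, axis=0), np.concatenate(y_parts, axis=0)

# Small stand-in arrays with the same layout as in the question (N, 22, 64, 64)
X_demo = np.zeros((4, 22, 64, 64), dtype=np.float32)
y_demo = np.array([0, 1, 0, 1])

X_aug, y_aug = augment_dataset(X_demo, y_demo, copies=2)
print(X_aug.shape, y_aug.shape)  # (12, 22, 64, 64) (12,)
```

With the real data this would be called as augment_dataset(X_train, y_train). Note that it keeps every copy in memory, so for large arrays the generator approach may be the better fit.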

answered May 31 '22 at 05:10 by tripp

Hi @tripp, it worked out. I just tweaked the code a little bit to my need. Thanks! – Ritesh Prasad Singh Jun 02 '22 at 15:46
@RiteshPrasadSingh glad to hear it! please accept this answer and upvote it if it was helpful – tripp Jun 03 '22 at 05:01
