
I am training a U-Net for semantic segmentation, but I only have 200 labeled images. With a dataset this small, I will definitely need some data augmentation.

I have a question about the test and validation sets.

I have a custom data generator that keeps feeding data from a folder to train the model.

So what I plan to do is:

  • augment the training set and keep all of the resulting images in the same folder

  • "randomly" move some of that data into the test and validation sets (before training, of course).

I am not sure if this is fine, since the augmentation is only simple processing (flipping, transposing, adjusting brightness).

Would it be better to split the data first and then augment only the data that remains in the training folder?

Paul92
Pro_gram_mer
  • Possible duplicate of [Data augmentation in test/validation set?](https://stackoverflow.com/questions/48029542/data-augmentation-in-test-validation-set) – Paul92 Apr 28 '19 at 12:07
  • @Paul92 I think you misunderstood me. My question is whether to augment the data first and then pick some of it for the test set (before training), or to split the data first (for example, 50 for training and 25 for testing) and then do the augmentation. – Pro_gram_mer Apr 28 '19 at 13:20
  • That is exactly the point of that question. As the first answer indicates, it's better if you keep some examples for validation and perform augmentation only on the training data. You don't need to perform augmentation on the validation data. – Paul92 Apr 28 '19 at 14:11
  • @Paul92 Thanks, but I still need help; I don't get it. For example, I have an image and I apply three types of flip, then use two of the results for the training set and the remaining one for the test set. Since the one in the test set will never be seen during training, why can't I do that? – Pro_gram_mer Apr 28 '19 at 16:15
  • I think you got it wrong. Let's say you have 10 images. You keep 2 for testing. The remaining 8 you flip/translate/rotate and get, let's say, 80 images after this augmentation. You train the network using these 80 and test it using the 2 you kept initially. – Paul92 Apr 28 '19 at 18:46
  • You can't split after the augmentation because your network might learn to identify the flip/rotation/translation, not the actual thing in the image. And you want your testing data to be as similar as possible to the real data. If your network can't recognize one image, it might not be able to recognize its translation either. So, testing it with the translations of the same image is not really relevant for the performance of the network with real data. – Paul92 Apr 28 '19 at 18:48
  • @Paul92 OK, it's clear to me now. Thank you so much! – Pro_gram_mer Apr 29 '19 at 02:26
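
The split-first workflow described in the comments could look roughly like the sketch below. It is only an illustration under stated assumptions, not the asker's actual generator: the 140/30/30 split, the random placeholder arrays, and the helper names `split_indices` and `augment_pair` are all hypothetical, and images are assumed to be square float arrays in [0, 1] with integer masks.

```python
import numpy as np

def split_indices(n_items, n_val, n_test, seed=0):
    """Shuffle the ORIGINAL image indices once and cut them into three disjoint sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test

def augment_pair(image, mask):
    """Yield the original pair plus four augmented copies.

    Geometric transforms are applied to both image and mask so the labels
    stay aligned; the brightness change touches the image only.
    """
    yield image, mask
    yield np.fliplr(image), np.fliplr(mask)                    # horizontal flip
    yield np.flipud(image), np.flipud(mask)                    # vertical flip
    yield np.swapaxes(image, 0, 1), np.swapaxes(mask, 0, 1)    # transpose (square images assumed)
    yield np.clip(image * 1.2, 0.0, 1.0), mask                 # brighter image, same mask

# Placeholder data standing in for the 200 labeled images (hypothetical shapes).
data_rng = np.random.default_rng(42)
images = data_rng.random((200, 8, 8, 3))
masks = data_rng.integers(0, 2, size=(200, 8, 8))

# 1) Split first, on the untouched originals.
train_idx, val_idx, test_idx = split_indices(200, n_val=30, n_test=30)

# 2) Augment only the training portion.
train_pairs = [pair for i in train_idx for pair in augment_pair(images[i], masks[i])]

# 3) Validation and test sets stay as the original, unaugmented images.
val_pairs = [(images[i], masks[i]) for i in val_idx]
test_pairs = [(images[i], masks[i]) for i in test_idx]
```

Because the split happens on the original indices, no flipped or brightened copy of a test image can end up in the training set, which is the leakage the comments warn about.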

0 Answers