I'm training a U-Net CNN in Keras, and one of the image classes is significantly under-represented in the training dataset. I'm using a class-weighted loss function to account for this, but my worry is that with such a small batch size and so few instances of the class, only about 1 in 10 batches is likely to include an image of this class. So even though the class is weighted, the network rarely sees it during training. Would it therefore be bad practice to force the data generator to include at least one instance of this class while it's selecting random pieces of data for the batch? That would avoid a situation where most training steps have no access to a class of data that's vital to overall task accuracy.

Tom Halmos
  • Are you using a text file containing a series of commands which are executed by the command interpreter on MS-DOS, IBM OS/2, or Microsoft Windows systems? Please read the summary for each of your tags and verify that they are pertinent to your question before spamming other people with them. [[tag:batch-file]] tag removed. – Compo Jul 15 '20 at 11:59
  • nope. definitely not. – Asif Mohammed Jul 15 '20 at 12:07
  • Yes thank you, it's clear the tag was a mistake, apologies... – Tom Halmos Jul 15 '20 at 12:15
  • Nope, definitely not (if your dataset is balanced). Since you say your dataset is imbalanced, I would suggest augmenting the data for the under-represented classes. – Asif Mohammed Jul 15 '20 at 12:16

1 Answer

I would recommend three possible techniques to handle this kind of problem:

  • Equalize the probability of drawing an image from each class: for example this for PyTorch (I don't know which framework you are using, please state it). (Easy, but the least efficient.)
  • Adapt the loss by giving more weight to the under-represented classes. (Also easy, and in expectation it should give a similar result to the previous method, so start with whichever of the two is easier for you to implement.)
  • Do some data augmentation. (Harder, but nowadays a lot of libraries provide efficient ways to do this.)
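As a rough illustration of the first option, here is a minimal, framework-agnostic sketch of class-balanced sampling. It assumes you can assign one class label per training sample (for segmentation masks, how you choose that label is up to you); the function name `balanced_batch_indices` and its parameters are hypothetical, not part of Keras.

```python
import random
from collections import defaultdict


def balanced_batch_indices(labels, batch_size, num_batches, seed=None):
    """Yield lists of sample indices where every class is equally likely.

    `labels` is assumed to hold one class label per training sample.
    Samples are drawn with replacement, so rare-class samples repeat often.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)

    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            cls = rng.choice(classes)                 # pick a class uniformly
            batch.append(rng.choice(by_class[cls]))   # then a sample of that class
        yield batch


# Example: 90 samples of class 0 and 10 samples of class 1.
labels = [0] * 90 + [1] * 10
for batch in balanced_batch_indices(labels, batch_size=8, num_batches=2, seed=0):
    print(batch, [labels[i] for i in batch])
```

The yielded indices can then be used to slice your image and mask arrays inside your own data generator. Because rare samples are repeated often, this is usually worth combining with augmentation so the network does not see identical copies.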

EDIT: Sorry, I missed that you are using Keras. A few useful links: for data augmentation, class balancing, and loss adaptation.
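For the loss-weighting option, one common heuristic is inverse-frequency weights. The sketch below is an assumption about how you might compute them, not something taken from the links above; the helper name `inverse_frequency_weights` is hypothetical. The resulting dictionary has the shape that the `class_weight` argument of Keras `Model.fit` accepts for per-sample labels; for per-pixel segmentation targets it may not apply directly, and you may need to fold the weights into a custom loss instead.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, with weight = n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Example: class 1 is nine times rarer than class 0, so it is weighted nine times higher.
weights = inverse_frequency_weights([0] * 90 + [1] * 10)
print(weights)
```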

Doe Jowns