
What is the concept of a mini-batch when we are sending one image to an FCN for semantic segmentation?

The default value in data layers is `batch_size: 1`. That means that on every forward and backward pass, one image is sent through the network. So what is the mini-batch size in that case? Is it the number of pixels in an image?

The other question is: what if we send a few images through the net together? Does it affect convergence? In some papers, I see batch sizes of 20 images.

Thanks

S.EB

1 Answer


The batch size is the number of images sent through the network in a single training operation. The gradient is calculated for all the samples in one swoop, resulting in large performance gains through parallelism when training on a graphics card or a CPU cluster.

The batch size has multiple effects on training. First, it provides more stable gradient updates by averaging the gradients within the batch. This can be both beneficial and detrimental. In my experience it was more beneficial than detrimental, but others have reported different results.
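To illustrate the averaging effect, here is a minimal NumPy sketch (not code from any framework; the linear model and data are made up for illustration). It compares the spread of gradient estimates from size-1 batches against size-32 batches on the same data:

```python
import numpy as np

# Sketch: per-sample vs. batch-averaged gradients for a linear
# model y = w * x under squared error. Larger batches average
# more per-sample gradients, so the estimates vary less.
rng = np.random.default_rng(0)
w_true, w = 2.0, 0.0
x = rng.normal(size=1000)
y = w_true * x + rng.normal(scale=0.1, size=1000)

def batch_gradient(w, xb, yb):
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return np.mean(2.0 * (w * xb - yb) * xb)

# Gradient estimates from size-1 batches...
g1 = [batch_gradient(w, x[i:i+1], y[i:i+1]) for i in range(992)]
# ...and from size-32 batches drawn from the same data.
g32 = [batch_gradient(w, x[i:i+32], y[i:i+32]) for i in range(0, 992, 32)]

print(np.std(g1) > np.std(g32))  # batch averaging reduces variance
```

Averaging 32 per-sample gradients cuts the variance of the estimate by roughly a factor of 32, which is the "more stable updates" effect described above.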

To exploit parallelism, the batch size is usually a power of 2, e.g. 8, 16, 32, 64, or 128. Finally, the batch size is limited by the VRAM of the graphics card. The card needs to store the images, the activations at all the nodes of the graph, and additionally all the gradients.

This can blow up very quickly. In that case you need to reduce the batch size or the network size.
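A back-of-the-envelope sketch of why this blows up (the layer shape and batch size here are illustrative assumptions, not taken from any specific network): activation memory scales linearly with batch size, and gradients roughly double it.

```python
# Rough VRAM estimate for one stored feature map, assuming
# 4-byte floats. Real frameworks keep many such maps plus weights.
def feature_map_bytes(batch, channels, height, width, dtype_bytes=4):
    return batch * channels * height * width * dtype_bytes

# Hypothetical example: a 64-channel 500x500 feature map in an FCN,
# stored once for the forward activations and once for the gradients.
batch = 20
per_layer = 2 * feature_map_bytes(batch, 64, 500, 500)
print(per_layer / 1024**3)  # ~2.38 GiB for this single layer
```

At batch size 20, a single such layer already costs on the order of gigabytes, and a deep network has many layers, which is why reducing the batch size is the usual first fix when you run out of memory.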

Thomas Pinetz
  • Thanks for your response. Is there any difference in the mini-batch concept between image `segmentation` and `classification`? – S.EB Mar 09 '17 at 13:27
  • Not really. In my opinion it is more important in segmentation, because the intra-class differences are often larger. But conceptually there is no difference. – Thomas Pinetz Mar 09 '17 at 14:12