I am training an autoencoder on MNIST, and I noticed that increasing the batch size beyond 128 starts to take more computation time, even though the dataset size is fixed.
I am using tensorflow-gpu and a GeForce GTX 1070.
I ran a couple of tests on a fixed training set of 5000 samples (784-dimensional) for 10 epochs. The batches are consecutive batch-size chunks of the 5000 training samples, so the number of iterations per epoch effectively depends on the batch size.
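For context, the experiment is roughly equivalent to the sketch below; the autoencoder architecture and the optimizer are simplified placeholders here, not my exact code:

```python
import time
import tensorflow as tf

# Fixed subset of 5000 flattened 784-dim MNIST samples.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_train = x_train[:5000]

def build_autoencoder():
    # Placeholder architecture; the real model differs, but the idea is the same.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(784, activation="sigmoid"),
    ])

for batch_size in [512, 256, 128, 64, 32, 16, 8]:
    model = build_autoencoder()
    model.compile(optimizer="adam", loss="mse")
    start = time.time()
    # shuffle=False keeps the batches as consecutive chunks of the 5000 samples,
    # so there are ceil(5000 / batch_size) updates per epoch.
    history = model.fit(x_train, x_train, epochs=10, batch_size=batch_size,
                        shuffle=False, verbose=0)
    print(batch_size, history.history["loss"][-1], time.time() - start)
```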
I tracked the loss on this data, the execution time, and the GPU memory usage of the Python process (from the nvidia-smi output):
5000 data points, 10 epochs:

batch size    loss       execution time    GPU memory
512           53.7472    13.787 s          4281 MiB
256           48.1941     4.973 s           695 MiB
128           42.7486     3.350 s           439 MiB
64            40.0781     4.191 s           439 MiB
32            37.7348     6.487 s           441 MiB
16            36.6291    12.102 s           441 MiB
8             nan        23.115 s           441 MiB
When I try minibatch sizes larger than 512, I get Out Of Memory errors.
I guess it makes sense for the smaller batches to take longer to execute, as there are more updates in sequence on the same data. However, I am not sure why the computation time increases once the minibatch is larger than 128 samples, instead of decreasing further.
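For reference, the number of weight updates per epoch is ceil(5000 / batch_size), which is what drives the long runtimes at the small-batch end:

```python
import math

# Updates per epoch for each batch size (5000 samples, consecutive chunks).
for bs in [512, 256, 128, 64, 32, 16, 8]:
    print(bs, math.ceil(5000 / bs))
# 512 -> 10, 256 -> 20, 128 -> 40, 64 -> 79, 32 -> 157, 16 -> 313, 8 -> 625
```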
My assumption is that it has to do with the GPU getting full and being unable to parallelise any further, but I couldn't find anything about this online.
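One check I'm considering to test this (a sketch, not something I have run yet): time a single training step at each batch size and see whether the per-step time stays roughly constant up to about 128 and then starts growing, which would point at the GPU's parallel capacity being saturated.

```python
import time
import numpy as np
import tensorflow as tf

# Same placeholder autoencoder as in the sketch above.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(512, 784).astype("float32")  # dummy data is enough for timing

for bs in [8, 16, 32, 64, 128, 256, 512]:
    batch = x[:bs]
    model.train_on_batch(batch, batch)  # warm-up (graph build / memory allocation)
    start = time.time()
    for _ in range(50):
        model.train_on_batch(batch, batch)
    print(f"batch {bs}: {(time.time() - start) / 50 * 1000:.2f} ms per step")
```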