
Is it possible to reuse the GPU's memory when training a network? I am following the official instructions to build an SSD (https://gluon-cv.mxnet.io/build/examples_detection/train_ssd_voc.html#sphx-glr-build-examples-detection-train-ssd-voc-py). When I try to train on a GPU, I find that the batch size is limited by the video memory. There are guidelines on how to use multiple GPUs (http://zh.gluon.ai.s3-website-us-west-2.amazonaws.com/chapter_computational-performance/multiple-gpus.html). Obviously, with enough money I could simply buy more GPUs, but with a cheap GPU and little memory I can never use a large batch size, and the problem with a small batch is that training may never converge.

Note that the parameters of a neural network are not all used at the same time. We could move the parameters that are currently in use onto the GPU and move the others out. This idea is common: games reuse memory in the same way, and no game loads all of its assets into GPU memory at once. I suppose this strategy would slow the GPU down, but it should still be faster than using the CPU alone, and it would allow a larger batch size.
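
To make the idea concrete: a minimal sketch, assuming MXNet/Gluon, of what moving one layer's parameters in and out of GPU memory could look like. It is forward-only and purely illustrative (the layer sizes are made up); a real implementation would also have to manage the intermediate activations needed for backpropagation.

    import mxnet as mx
    from mxnet.gluon import nn

    ctx_gpu, ctx_cpu = mx.gpu(0), mx.cpu()

    # A toy network; in_units is given so the parameters are allocated immediately.
    net = nn.HybridSequential()
    net.add(nn.Dense(1024, activation='relu', in_units=512),
            nn.Dense(1024, activation='relu', in_units=1024),
            nn.Dense(10, in_units=1024))
    net.initialize(ctx=ctx_cpu)                      # all parameters start in host RAM

    x = mx.nd.random.uniform(shape=(8, 512), ctx=ctx_gpu)
    for i in range(len(net)):
        layer = net[i]
        layer.collect_params().reset_ctx(ctx_gpu)    # copy this layer's weights onto the GPU
        x = layer(x)                                 # forward pass on the GPU
        layer.collect_params().reset_ctx(ctx_cpu)    # copy them back out to free GPU memory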

Blue Bird

1 Answer


So what you are asking is basically: "Can we move parts of the activations back to RAM in order to compute the remaining samples in the batch?"

If so, the answer is "probably yes, but at the cost of a lot of speed", since the copy from RAM to GPU is very expensive.
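
To get a feel for that cost, here is a rough sketch (assuming MXNet is installed and a GPU is available; the tensor size is arbitrary) that times a host-to-device and a device-to-host copy of a batch-sized array:

    import time
    import mxnet as mx

    a_cpu = mx.nd.ones((256, 3, 512, 512), ctx=mx.cpu())   # ~768 MB of float32 "activations"

    start = time.time()
    a_gpu = a_cpu.copyto(mx.gpu(0))        # host -> device
    mx.nd.waitall()                        # MXNet is asynchronous; wait for the copy to finish
    print('CPU -> GPU: %.3f s' % (time.time() - start))

    start = time.time()
    a_back = a_gpu.copyto(mx.cpu())        # device -> host (the extra copy swapping would add)
    mx.nd.waitall()
    print('GPU -> CPU: %.3f s' % (time.time() - start))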

The reason is that you would then also have to copy the batch back again for the backpropagation (at least, that is my assumption about how backpropagation works on GPUs). It could therefore be faster to simply compute your batch on the CPU: that might not be much slower anyway, and you save the costly copy operations. Also, SGD with a smaller batch size can actually be beneficial to convergence, so I don't see why you would argue the opposite (although with neural networks you never quite know, and it may well depend on your task...).
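
Not something the answer itself proposes, but a common workaround for the "small batch may not converge" concern is gradient accumulation: the per-step memory footprint stays that of a small batch, while the optimizer sees the statistics of a larger effective batch. A hedged Gluon sketch with a tiny stand-in model (replace the model, loss, and synthetic data with your own):

    import mxnet as mx
    from mxnet import autograd, gluon
    from mxnet.gluon import nn

    ctx = mx.gpu(0)
    accumulate = 4                              # 4 small batches ~ 1 large effective batch

    # Tiny stand-in model and loss; replace with your SSD network and its loss.
    net = nn.Dense(10, in_units=512)
    net.initialize(ctx=ctx)
    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

    for p in net.collect_params().values():
        p.grad_req = 'add'                      # accumulate gradients instead of overwriting them
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

    for i in range(100):                        # stands in for iterating over a DataLoader
        data = mx.nd.random.uniform(shape=(8, 512), ctx=ctx)
        label = mx.nd.random.randint(0, 10, shape=(8,), ctx=ctx)
        with autograd.record():
            loss = loss_fn(net(data), label)
        loss.backward()
        if (i + 1) % accumulate == 0:
            trainer.step(accumulate * data.shape[0])   # normalize by the effective batch size
            net.collect_params().zero_grad()           # reset the accumulated gradients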

dennlinger
  • As you would expect, there is a tool called "memonger" which is supposed to save memory; a professor suggested it in https://discuss.mxnet.io/t/is-it-possible-to-reuse-gpus-memory-when-training-a-network/1586. Unfortunately, it had no effect on my machine. – Blue Bird Aug 10 '18 at 15:08