I made a very simple network using mxnet(two fc layers with dim of 512).
By changing the ctx = mx.cpu() or ctx = mx.gpu(0), I run the same code on both CPU and GPU.
The memory cost of GPU is much bigger than CPU version.(I checked that using 'top' instead of 'nvidia-smi').
It seems strange, as the GPU version also has memory on GPU already, why GPU still need more space on memory?
(line 1 is CPU program / line 2 is GPU program)