Is there a way to limit the memory allocation used by this model, to allow concurrent models to run?

I'm currently using InsightFace, which is built on MXNet.

After loading the first model, the GPU memory stats read:

```
utilization.gpu 74 %   utilization.memory 0 %   memory.free 13353 MiB   memory.used 2777 MiB   memory.total 16130 MiB
```

After running the first inference through, memory usage balloons, but GPU utilization remains very low at 3%:

```
utilization.gpu 3 %    utilization.memory 0 %   memory.free 9789 MiB    memory.used 6341 MiB   memory.total 16130 MiB
```
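(For context, these stats look like the output of an nvidia-smi GPU query; the exact invocation below is my reconstruction, not from the original post.)

```python
import subprocess

# Hypothetical reconstruction of how the stats above were gathered.
print(subprocess.run(
    ["nvidia-smi",
     "--query-gpu=utilization.gpu,utilization.memory,memory.free,memory.used,memory.total",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True).stdout)
```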

This makes me think that we should be able to load more models onto the same GPU, but unfortunately the memory is already allocated to MXNet's pool.


Solutions Tried:

  1. Calling ctx.empty_cache() between calls to the model - https://mxnet.apache.org/api/python/docs/api/mxnet/context/index.html#mxnet.context.Context.empty_cache
  2. Setting MXNET_GPU_MEM_POOL_RESERVE=60 - https://discuss.mxnet.io/t/how-to-limit-gpu-memory-usage/6304/3
  3. Calling gc.collect(), as suggested in "Reset GPU memory using Keras 1.2.2 with MXNet backend"

But none of these worked; a rough sketch of what each attempt looked like is below.
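For reference, a minimal sketch of these attempts (the model-loading step is a placeholder, and MXNET_GPU_MEM_POOL_RESERVE has to be set before mxnet is first imported, or it is ignored):

```python
import gc
import os

# Attempt 2: reserve 60% of GPU memory so MXNet's pool cannot grow past ~40%.
# Must be set before the first `import mxnet`.
os.environ['MXNET_GPU_MEM_POOL_RESERVE'] = '60'

import mxnet as mx

ctx = mx.gpu(0)
# ... load the InsightFace model on `ctx` and run an inference here ...

# Attempt 1: release cached-but-unused blocks back to the device.
ctx.empty_cache()

# Attempt 3: drop dangling Python references and force garbage collection.
gc.collect()
```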

1 Answer

Looking at the environment variables of MXNet, it appears that the answer is no.

You can try setting MXNET_MEMORY_OPT=1 and MXNET_BACKWARD_DO_MIRROR=1, which are documented in the "Memory Optimizations" section of MXNet's environment variables documentation.

Also, make sure that min(MXNET_EXEC_NUM_TEMP, MXNET_GPU_WORKER_NTHREADS) = 1 holds for you, which it should if you kept the default values of these environment variables. A minimal sketch of wiring all of this up follows.
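(This sketch assumes an MXNet build recent enough to honour MXNET_MEMORY_OPT; as above, the variables must be set before mxnet is imported.)

```python
import os

# Memory-optimization flags from the "Memory Optimizations" section of
# MXNet's environment variables documentation.
os.environ['MXNET_MEMORY_OPT'] = '1'
os.environ['MXNET_BACKWARD_DO_MIRROR'] = '1'

# MXNET_EXEC_NUM_TEMP defaults to 1 and MXNET_GPU_WORKER_NTHREADS to 2,
# so min(MXNET_EXEC_NUM_TEMP, MXNET_GPU_WORKER_NTHREADS) = 1 already
# holds unless you raised both.
num_temp = int(os.environ.get('MXNET_EXEC_NUM_TEMP', '1'))
workers = int(os.environ.get('MXNET_GPU_WORKER_NTHREADS', '2'))
assert min(num_temp, workers) == 1

import mxnet as mx
```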
