Is there a way that we can limit the Memory Allocation used by this Model to allow for Concurrent Models to Run?
I'm currently using InsightFace which is built on MXNet.
After loading the first model, the GPU memory stats reflect (memory values in MiB):

```
utilization.gpu 74  utilization.memory 0  memory.free 13353  memory.used 2777  memory.total 16130
```
After running the first inference through, memory usage balloons, but GPU utilization is still very low at 3%:

```
utilization.gpu 3  utilization.memory 0  memory.free 9789  memory.used 6341  memory.total 16130
```
This makes me think we should be able to load more models onto the same GPU, but unfortunately the memory has already been claimed by MXNet.
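For context, the model is loaded and memory is observed roughly like this (a minimal sketch, assuming the MXNet-era insightface 0.1.x API; the image path is illustrative):

```python
import subprocess

import cv2
import insightface

# Load the detection + recognition models onto GPU 0
# (API as in the MXNet-based insightface 0.1.x releases).
model = insightface.app.FaceAnalysis()
model.prepare(ctx_id=0)

def gpu_stats():
    """Query the same fields shown above (memory values in MiB)."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=utilization.gpu,utilization.memory,"
        "memory.free,memory.used,memory.total",
        "--format=csv,noheader",
    ])
    return out.decode().strip()

print(gpu_stats())            # ~2.7 GB used after loading
img = cv2.imread("face.jpg")  # illustrative input image
faces = model.get(img)        # first inference
print(gpu_stats())            # ~6.3 GB used afterwards
```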
Solutions Tried (a sketch of these attempts follows the list):

- Calling `ctx.empty_cache()` between calls to the model: https://mxnet.apache.org/api/python/docs/api/mxnet/context/index.html#mxnet.context.Context.empty_cache
- Setting `MXNET_GPU_MEM_POOL_RESERVE=60`: https://discuss.mxnet.io/t/how-to-limit-gpu-memory-usage/6304/3
- Calling `gc.collect()`, as suggested in "Reset GPU memory using Keras 1.2.2 with MXnet backend"

But none of these worked.
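For reference, here is roughly how the three attempts fit together; a minimal sketch, assuming the reserve variable has to be set before `mxnet` is imported (the model calls and dummy image are illustrative):

```python
import os
import gc

import numpy as np

# Attempt 2: ask MXNet's memory pool to keep 60% of GPU memory free
# for other processes. Set before importing mxnet, since the pool
# reads this variable when it initializes.
os.environ["MXNET_GPU_MEM_POOL_RESERVE"] = "60"

import mxnet as mx
import insightface

model = insightface.app.FaceAnalysis()
model.prepare(ctx_id=0)

img = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy BGR image
faces = model.get(img)  # memory balloons after this first inference

# Attempt 1: release memory cached by MXNet's pool between calls.
mx.gpu(0).empty_cache()

# Attempt 3: force Python garbage collection in case dead NDArrays
# are still pinning GPU buffers.
gc.collect()
```

In all three cases nvidia-smi keeps reporting the ballooned figure shown above.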