cudaMallocPitch fails when multiple GPUs are controlled by separate CPU processes, even though enough memory is available

Question

I'm getting an 'out of memory' error from the cudaMallocPitch API on GeForce GTX 1080 Ti and/or GeForce GTX 1080 GPUs. They are part of a PC server that has 4 GPUs (one 1080 Ti and three 1080s) and two CPUs.

Each GPU is controlled by a dedicated CPU thread, which calls cudaSetDevice with the correct device index at the beginning of its run.

The application reads a configuration file to determine how many CPU threads to create.

I can also run my application several times as separate processes, each one controlling a different GPU.

I'm using OpenCV 3.2 to perform image background subtraction.

First, you create the BackgroundSubtractorMOG2 object with cv::cuda::createBackgroundSubtractorMOG2, and then you call its apply method.

The first time the apply method is called, all the required memory is allocated in one go.

My image is 10000 columns by 7096 rows, and each pixel is 1 byte (grayscale).

When I run my application as a single process with several threads (one per GPU), everything works fine. But when I run it 4 times as separate processes (one per GPU), the OpenCV apply function starts to fail because cudaMallocPitch reports 'not enough memory'.

For all GPUs, I verified that enough memory was available before apply was called for the first time. About 5.5GB is reported free on each 1080 and about 8.3GB on the 1080 Ti, while the requested allocation is 120000 bytes wide by 21288 rows high, i.e. about 2.4GB.

Please advise.

This question is my real problem. While investigating it, I tried to reproduce it with simpler logic, which is what my previous question describes. By mistake, I didn't notice that that test was actually working fine. Once I realized the test was OK, I decided to ask about my real problem. — OronG, Sep 30 '17 at 17:05
So you would like us to help you debug your words? How could anyone possibly say what might be going wrong without seeing the code in question? — talonmies, Oct 02 '17 at 03:44
No! From my point of view, I provided all the required information. — OronG, Oct 02 '17 at 09:54
I provided part of the code (createBackgroundSubtractorMOG2), and OpenCV is open source. I also gave my frame size and a high-level description of my software architecture. I thought this level of detail was enough; if not, I will be glad to provide more. — OronG, Oct 02 '17 at 10:04
I can add the values passed to createBackgroundSubtractorMOG2: history = 20, varThreshold = 16, detectShadows = true. Thanks for your help. — OronG, Oct 02 '17 at 10:06
OK, I understand that my question was written very badly, and I really want to fix it and learn from my mistakes for the next time I ask a question or answer someone else's. I'm new here and am carefully learning all the rules. For example, could you please point me to the place that explains how to edit an existing question or change it completely? Thanks — OronG, Oct 02 '17 at 18:48

score 1 · Answer 1 · edited Mar 31 '19 at 17:52

I found the source of the problem:

cudaMallocPitch returned cudaErrorMemoryAllocation because there was no OS virtual memory left. The OS uses this virtual memory when the process performs read/write accesses to GPU physical memory.

Because of that, the CUDA driver fails every kind of GPU physical memory allocation.

The hard part was figuring out why this API fails even though enough GPU physical memory is available (as checked with the cudaMemGetInfo API).

I analyzed two points:

Why don't I have enough virtual memory on my PC? Following the instructions at this link, I increased the paging file size and the problem disappeared: https://www.online-tech-tips.com/computer-tips/simple-ways-to-increase-your-computers-performace-configuring-the-paging-file/
Why does my process consume so much OS virtual memory? I had found earlier that, for better performance during processing, I should allocate all the required GPU physical memory only once at startup, because an allocation takes a long time that grows with the requested size. Since my frames are about 70MB and my processing logic needs a huge number of auxiliary buffers, very large GPU and CPU memory areas had to be allocated, which exhausted the OS's available virtual memory.
