
I have a dual-GPU card, the Titan Z, and I'm using MATLAB R2016a to solve a sparse Ax=b system for several different `b` vectors. The Titan Z has two GPUs with 6 GB of RAM each.

Here is the problem.

  1. If I solve an Ax=b problem on one GPU with, say, a 2 GB `A` matrix, MATLAB copies this matrix to the VRAM of each GPU. GPU-Z reports 2 GB of data on each GPU, but only one GPU is working.
  2. If I solve two Ax=b problems on two GPUs, again with a 2 GB `A` matrix, MATLAB copies the matrix to each GPU's VRAM twice. Now GPU-Z reports 4 GB of data on each GPU, and both GPUs are working simultaneously.
  3. If I try to solve a 4 GB problem on both GPUs simultaneously, the NVIDIA driver crashes due to insufficient VRAM. I can solve it on one GPU, but not on two GPUs at the same time.

The problem is that MATLAB copies these matrices twice when there is no need to. More interestingly, it does this even though both GPUs use exactly the same `A` matrix, only with different `b` vectors.

How can I solve this problem?

piyushj
coercion
  • Could you share your code? Are you using `spmd` or similar? – Edric Jul 27 '16 at 07:54
  • It is pretty simple, actually. I'm using gpuArray to allocate and transfer the data, and after that I use a parfor loop (i=1:2) to solve the equation set on the GPU. – coercion Jul 27 '16 at 12:38
  • So you are not building the `gpuArray` inside the `parfor` loop? What happens if you do build the `gpuArray`s inside `parfor`? – Edric Jul 28 '16 at 07:27
  • I also tried building it inside. I even called gpuDevice inside the loop to make sure the data goes to the correct GPU, and I tried many other things. I suspect that the NVIDIA driver mirrors any data sent to one GPU, since this is a dual-GPU card. Next I will plug in another GPU to see whether it copies the data to the third one too. Again, I must remind you that both GPUs need the same matrix but a different vector `b`. – coercion Jul 28 '16 at 10:27

1 Answer


This is a rather late reply to my own question, but here is the solution: disabling multi-GPU mode in the NVIDIA Control Panel solved the problem. This disables SLI, so the two GPUs can run independently. It was as simple as that.

coercion