
In OpenCL, is it possible for a system consisting of multiple GPUs to implicitly divide a job, without the programmer explicitly dividing the workload?

For example, say I have a GPU with 1 SM of 192 cores and run a matrix multiplication, which works normally. Now I add another identical GPU. Will OpenCL use both GPUs to calculate the matrix multiplication on its own, rather than the programmer splitting the workload between the GPUs?

pradyot

2 Answers


I don't think OpenCL can do that automatically (at least in 1.2), but there are OpenCL wrappers that can handle multiple compute devices automatically. I have not used OpenCL CodeBench, but they claim it supports load balancing across multiple compute devices.

Mandar
  • So in that case, with two GPU cards each having 1 SM, will the system identify a total of 2 SMs? – pradyot May 14 '16 at 04:38

You can unify only the memories of the devices, and only with OpenCL version 2.0 and upwards.

Kernels are enqueued in command queues, and command queues are created with (and bound to) a single device, so a kernel launch can work only on a single device. But multiple command queues can serve in a common context, which can take advantage of implicit buffer synchronization.

Splitting the work can't be done implicitly, since the runtime can't know which work-item accesses which memory address.

Once you write a working single-device wrapper, adding multi-GPU support isn't much of a hassle.

huseyin tugrul buyukisik