Parallel Matrix Multiplication using multi GPU

Question

I have installed two GPUs (2x Nvidia Quadro 410) in my system in different PCI slots. To solve matrix multiplication on both of these GPUs, how can I split the input matrices so that each GPU computes a part of the output matrix and then returns it? For example, for two matrices A and B, each of order 10x10, to compute the output matrix C = A x B, out of the 100 elements (10 x 10), 50 elements should be calculated on the 1st GPU and the other half, i.e. 50, computed on the 2nd GPU. I am trying to implement this in OpenCL, but any algorithm that helps me come up with a solution is welcome.

score 2 · Accepted Answer · answered May 05 '16 at 05:07


In general, if you have matrices X (of size axb, rows first) and Y (of size bxc),

X * Y = vcat(X[0:a/2,0:b] * Y, X[a/2:a,0:b] * Y)

In this pseudocode, vcat is vertical concatenation (putting one matrix on top of another, e.g. a 4x3 matrix concatenated with a 2x3 matrix will produce a 6x3 matrix), : denotes ranges and [] is indexing.

The two arguments to vcat can be computed on different GPUs, and the concatenation can be achieved just by pointing each output to a different sub-region of the output buffer (assuming we have C-ordered arrays, where each block of rows is contiguous in memory). The initial splitting of X can be achieved the same way, just by using different sub-regions (since it is split at a row boundary).
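To make the sub-region idea concrete, here is a minimal CPU-only sketch in Python. It is not OpenCL: the names matmul_rows and split_matmul are hypothetical, and the two calls to matmul_rows only stand in for work that would be sent to two separate GPUs. The matrices are flat, row-major buffers, and memoryview slices are used so that each half writes directly into its own part of one shared output buffer, without copying.

```python
from array import array


def matmul_rows(x, y, out, rows, b, c):
    """Compute out = x * y for flat row-major buffers (x: rows-by-b, y: b-by-c)."""
    for i in range(rows):
        for j in range(c):
            acc = 0.0
            for k in range(b):
                acc += x[i * b + k] * y[k * c + j]
            out[i * c + j] = acc


def split_matmul(x, y, a, b, c):
    """Multiply an a-by-b matrix by a b-by-c matrix in two row blocks."""
    x_view = memoryview(x)
    y_view = memoryview(y)
    result = array("d", [0.0]) * (a * c)  # single shared output buffer
    out_view = memoryview(result)
    half = a // 2  # integer division; the second block takes the extra row if a is odd

    # Stand-in for device 0: rows [0, half) of X, written to the top of the output.
    matmul_rows(x_view[:half * b], y_view, out_view[:half * c], half, b, c)
    # Stand-in for device 1: rows [half, a) of X, written to the bottom of the output.
    matmul_rows(x_view[half * b:], y_view, out_view[half * c:], a - half, b, c)
    return result


# The 10x10 case from the question: each block produces 5 rows, i.e. 50 elements.
a = b = c = 10
x = array("d", [float(i) for i in range(a * b)])
# Identity matrix for Y, so the product should equal X.
y = array("d", [1.0 if i // c == i % c else 0.0 for i in range(b * c)])
result = split_matmul(x, y, a, b, c)
```

Note that both blocks need all of Y, so in a real multi-GPU setup Y would presumably have to be available on both devices, while each device only needs its own rows of X.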

answered May 05 '16 at 05:07

fjarri


Thank you. I will start implementing this and let you know about the progress. Also, do you think this is the most efficient way to do this, as it is not dividing all the elements in half? – pradyot May 05 '16 at 09:29
I am not sure I understand what you mean. This division is purely virtual, in reality you just point your matrix multiplication routine at different parts of the array for different GPUs. – fjarri May 05 '16 at 10:19
