ArrayFire parallel block sum

Question

I have an array that is "expanded" along the first (row) dimension. For example, for an image of 1080 rows and 1920 columns, the expanded array has (8*1080) rows and 1920 columns, where 8 is the "row block" size. I want to produce a new array of size 8x1 whose i-th element (i = 0 to 7) holds the sum of the i-th element of every block.

In the above example, the first element of the new array (i=0) will be the sum of these pixels in the expanded array (linear indices, column wise):

0, 8 (because 8 is the first element of the second block), 16 (first element of the third block), ...

Another example is the second element:

1, 9, 17,...

I think this can be parallelized, but I have not managed to do it. I tried gfor and could not find a way to make it work. Is this not possible with ArrayFire? Any help is appreciated!

Here is some code that I tried: rx is the 8x1 array (p_squared_1 = 8) and rx_all is the expanded (p_squared_1*rows, columns) array. Note that I am calling the seq "+" operator explicitly because writing "i+p_squared_1" gives an ambiguity error. I think this is a mistake on my part, but I could not find another way to add a value to a seq object.

af::array rx(p_squared_1, 1);
gfor(af::seq i, rows*cols*(p_squared_1-1)) {
    rx(i) = af::sum<float>(rx_all(i.operator+( (const int)p_squared_1)));
}
af::eval(rx);
cout << af::sum<float>(rx);

I expect to get an 8x1 array where each i-th element is the sum of the i-th elements of each block in the expanded array.

Umar Arshad · Accepted Answer · 2019-06-25T15:14:59.177

3

I think you can achieve this by performing an af::moddims followed by an af::sum.

array img_expanded(1080*8, 1920);

array img_expanded_reshaped = moddims(img_expanded, 8, 1920*1080);
array result = sum(img_expanded_reshaped, 1);

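The moddims call reshapes the array into an 8x(1920*1080) array without moving any data. Because ArrayFire stores arrays in column-major order, linear indices 0, 8, 16, ... end up in row 0, indices 1, 9, 17, ... in row 1, and so on. Summing across the second dimension (dim 1) then gives the 8x1 result.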
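To check the result on small inputs, the same index mapping can be written as a plain loop. This is only a CPU reference sketch in standard C++ with no ArrayFire calls; the function name block_sums_interleaved and the use of a double accumulator are my own choices for illustration, not part of the answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the moddims + sum idea (hypothetical helper, plain C++).
// `data` is the expanded array in column-major (linear) order. After a reshape
// to block x (size / block), linear index k sits in row (k % block).
std::vector<double> block_sums_interleaved(const std::vector<float>& data,
                                           std::size_t block) {
    assert(block > 0 && data.size() % block == 0);
    std::vector<double> sums(block, 0.0);
    for (std::size_t k = 0; k < data.size(); ++k) {
        sums[k % block] += data[k];  // double accumulator limits rounding error
    }
    return sums;
}
```

For the array in the question you would call it with block = 8 on the host copy of the data and compare against the 8x1 array returned by sum.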

Optimized Layout

You could get better performance by treating the 1920 side as the leading dimension. This matches the layout of the image in CPU memory, so it avoids a transpose on transfers to and from the GPU. The reshaped array also has a larger first dimension, which gives better GPU utilization.

array img_expanded(1920, 1080*8);

array img_expanded_reshaped = moddims(img_expanded, 1920*1080, 8);
array result = sum(img_expanded_reshaped, 0);

Here the result is a 1x8 array rather than 8x1, and the different layout means you will need to refactor more than just this part of the code.
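To make the layout change concrete: with moddims(img_expanded, 1920*1080, 8), column j of the reshaped array covers linear indices j*N to (j+1)*N - 1 with N = 1920*1080, so each of the 8 sums runs over one contiguous chunk instead of every 8th element. A minimal CPU sketch of that mapping, again in plain C++ with a hypothetical function name (block_sums_contiguous) and no ArrayFire calls:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the block-contiguous layout (hypothetical helper).
// `data` is in column-major (linear) order; chunk j occupies the index
// range [j * n, (j + 1) * n), where n = data.size() / block.
std::vector<double> block_sums_contiguous(const std::vector<float>& data,
                                          std::size_t block) {
    assert(block > 0 && data.size() % block == 0);
    const std::size_t n = data.size() / block;
    std::vector<double> sums(block, 0.0);
    for (std::size_t j = 0; j < block; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            sums[j] += data[j * n + i];  // walk one contiguous chunk
        }
    }
    return sums;
}
```

This only matches the original question if the code that builds the expanded array writes the i-th values of all blocks into chunk i, which is the refactoring mentioned above.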

edited Jun 25 '19 at 15:14

answered Jun 25 '19 at 15:06


Thank you very much, this helped a lot. I hadn't thought of doing it with moddims; this is a very handy function. – eikonoules Jun 25 '19 at 17:22
One question though: is it possible that the above code won't sum with 100% accuracy? The total sum is correct, but each of the 8 sums is slightly different, so I don't think it's about precision and accuracy (I am using floats but there is no decimal part at all; the numbers are like 1268.0, 650.0 etc.) – eikonoules Jun 25 '19 at 20:43
There are always going to be rounding errors with floating point operations. Because the order in which the operations are performed on the GPU is undefined, the values are going to be slightly different. Check the type of the `af::array` that is performing the sum. It may be working on integer values that are then converted to float. The print function may also be truncating the values. – Umar Arshad Jun 26 '19 at 13:20
OK, thanks for the clarification! I have another question regarding the gfor construct. Can I make a new post? – eikonoules Jun 27 '19 at 06:13
It's better to handle these sorts of questions on our Slack channel. https://join.slack.com/t/arrayfire-org/shared_invite/MjI4MjIzMDMzMTczLTE1MDI5ODg4NzYtN2QwNGE3ODA5OQ – Umar Arshad Jun 28 '19 at 01:45
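On the rounding point raised in the comments: even floats with no fractional part can sum differently depending on order, because a float accumulator can only represent every integer up to 2^24 = 16777216. The sketch below is plain C++ and not ArrayFire's reduction; it assumes IEEE 754 single precision with the default round-to-nearest mode and no fast-math flags, and the function names are made up for illustration.

```cpp
#include <vector>

// Sum in storage order with a float accumulator (hypothetical helper).
float sum_float(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) acc += x;
    return acc;
}

// Same values, but accumulated in double.
double sum_double(const std::vector<float>& v) {
    double acc = 0.0;
    for (float x : v) acc += x;
    return acc;
}

// Under the assumptions above:
//   sum_float({16777216.0f, 1.0f, 1.0f})  gives 16777216 (each +1 is rounded away)
//   sum_float({1.0f, 1.0f, 16777216.0f})  gives 16777218
//   sum_double on either ordering          gives 16777218
```

A parallel reduction on the GPU adds partial sums in an order you do not control, so per-block results can differ slightly from a sequential CPU loop in the same way.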
