PyOpenCL, array filter: copy_if vs my own atomic-based implementation

Question

I have an array of random integers, for example [132, 2, 31, 49, 15, 6, 70, 18 ... , 99, 1001]. I want to produce an array of all the numbers greater than some threshold (100, for example) and get the size of that array.

There are two ways:

1. copy_if, a new feature of PyOpenCL. It is based on GenericScanKernel and, going deeper, on prefix sums.
2. A pure OpenCL solution that uses atomics.
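To see why a prefix-sum filter needs no atomics, here is a sequential NumPy emulation of scan-based stream compaction, the general technique that copy_if builds on. It is a sketch for illustration only, not PyOpenCL's actual kernel code, and the function name scan_compact is hypothetical. On a GPU each of the three steps runs in parallel; here NumPy runs them on the host.

```python
import numpy as np

def scan_compact(values, predicate):
    """Sequential emulation of scan-based stream compaction (hypothetical helper)."""
    values = np.asarray(values)
    # Step 1: every element evaluates the predicate independently (0 or 1).
    flags = predicate(values).astype(np.int64)
    # Step 2: an exclusive prefix sum of the flags gives each kept element
    # its own output slot; the last inclusive sum is the output size.
    inclusive = np.cumsum(flags)
    exclusive = inclusive - flags
    count = int(inclusive[-1]) if len(values) else 0
    # Step 3: scatter. Each kept element writes to a distinct slot, so no
    # two writers collide and no atomic counter is needed. Input order is kept.
    keep = flags.astype(bool)
    out = np.empty(count, dtype=values.dtype)
    out[exclusive[keep]] = values[keep]
    return out, count

data = np.array([132, 2, 31, 49, 15, 6, 70, 18, 99, 1001])
out, count = scan_compact(data, lambda a: a > 100)
# out == [132, 1001], count == 2
```

In PyOpenCL itself the equivalent call is `pyopencl.algorithm.copy_if(ary, "ary[i] > 100", queue=queue)`, which returns `(out, count, event)`; `count` is an on-device scalar that you fetch with `count.get()`. Because every output position is computed deterministically from the scan, the result does not depend on how work-items are scheduled.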

Does copy_if always work properly? As far as I can see, copy_if doesn't use atomics. Could I run into trouble using copy_if?

How does the performance of copy_if compare to the atomic approach?

What would you choose and why?

Both will work, but the atomic approach largely nullifies the parallelization. The whole point of prefix sums is to keep the GPU running at full speed without resorting to slow atomics. In general, needing atomics indicates that you are using either the wrong tool for the job (the problem is not a good fit for parallel computing) or the wrong algorithm for the job (probably the case here). — Thomas, Jan 29 '13 at 19:11
But maybe the overhead of parallelization is too big. Do you know of any benchmarks that compare this approach with the atomic one? — petRUShka, Jan 29 '13 at 20:12
How big is your dataset? For data sets small enough that you would even be able to measure the overhead, you're probably better off doing the calculation in native Python. Also, since you have implementations of both methods, why not benchmark them yourself? — Thomas, Jan 29 '13 at 20:25
The dataset is quite big. The reason I'm asking is that I don't want to implement both of them :) But if I don't find any benchmarks, I will run them myself. — petRUShka, Jan 29 '13 at 20:40
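Following the suggestion to benchmark both approaches, a minimal timing harness might look like the sketch below. It times a host-side NumPy baseline; the same best_time helper could wrap a copy_if call or an atomic kernel. The names best_time and host_filter, the random data, and its size are illustrative assumptions, not part of PyOpenCL.

```python
import time
import numpy as np

def best_time(fn, repeats=5):
    """Best wall-clock time in seconds over `repeats` calls of fn().

    For a GPU filter, fn must block until the device work is finished
    (for example by calling .get() on the result, or queue.finish());
    otherwise only the kernel launch is measured. The first GPU call may
    also include kernel compilation, which taking the best run filters out.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def host_filter(a, threshold=100):
    """Host-side baseline: elements greater than threshold, plus their count."""
    kept = a[a > threshold]
    return kept, kept.size

rng = np.random.default_rng(0)
data = rng.integers(0, 2000, size=1_000_000)
seconds = best_time(lambda: host_filter(data))
print(f"host filter: {seconds * 1e3:.2f} ms")
```

Comparing this number with the GPU timings (including the transfer back to the host, if you need the result there) shows whether the dataset is large enough for the device to pay off.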

linhares · Answer 1 · 2015-05-15T05:05:00.747

I have never seen an error with copy_if: it always gives the same results and seems very robust. (I haven't written unit tests, though.)
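If you want such unit tests, one option is to compare the device result with a NumPy reference on random data. The helper below is a hypothetical sketch; its docstring assumes the copy_if outputs have already been fetched to the host with `out.get()` and `count.get()`.

```python
import numpy as np

def matches_reference(host_input, device_out, device_count, threshold=100, ordered=True):
    """Check a filter result (already copied to the host) against NumPy.

    host_input:   the original data as a NumPy array
    device_out:   the output array fetched from the device, e.g. out.get()
    device_count: number of valid output elements, e.g. int(count.get())
    ordered:      True for scan-based filters, which keep input order;
                  False for atomic filters, whose output order can vary
    """
    expected = host_input[host_input > threshold]
    if device_count != expected.size:
        return False
    # Only the first device_count entries of the output are meaningful.
    got = np.asarray(device_out)[:device_count]
    if not ordered:
        got, expected = np.sort(got), np.sort(expected)
    return bool(np.array_equal(got, expected))
```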

As for performance, copy_if should be much faster, especially if your GPU is fast. As others have said, atomics and GPUs are a bad combination (I have suffered too much to learn this...)
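The atomic alternative can be emulated on the host in the same spirit. In the sketch below (the name atomic_compact is hypothetical), every matching work-item bumps one shared counter to claim an output slot, mirroring OpenCL's `atomic_inc`, which returns the old value. On real hardware every match contends for that single memory location, and the output order depends on scheduling, which the shuffle stands in for. The emulation does not model speed; it only shows these two structural differences from the scan approach.

```python
import random

def atomic_compact(values, predicate, seed=None):
    """Host emulation of an atomic-counter filter (hypothetical helper)."""
    order = list(range(len(values)))
    random.Random(seed).shuffle(order)  # stand-in for unspecified GPU scheduling
    out = [None] * len(values)
    counter = 0                         # the single shared counter
    for gid in order:                   # gid plays the role of get_global_id(0)
        if predicate(values[gid]):
            slot = counter              # atomic_inc returns the old value...
            counter += 1                # ...so each match serializes on this address
            out[slot] = values[gid]
    return out[:counter], counter

data = [132, 2, 31, 49, 15, 6, 70, 18, 99, 1001]
result, n = atomic_compact(data, lambda v: v > 100, seed=1)
```

Depending on the seed, the two matches come out as [132, 1001] or [1001, 132]; the set and the count are always the same, which is why results from an atomic kernel must be compared order-insensitively.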

And if the number of expected results is small relative to your dataset, I have proposed a sparse_copy_if() method here, where you can also find a copy_if example.

Fork my code and the following should work:

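import my_pyopencl_algorithm  # module from the forked repository
# array_gpu is a pyopencl.array.Array on the device; queue is a pyopencl.CommandQueue
final_gpu, evt = my_pyopencl_algorithm.sparse_copy_if(array_gpu, "ary[i] > 100", queue=queue)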
