I'm looking for a fast implementation of scan (prefix sum) in OpenCL. The best one I have found is in the Nvidia SDK, but it's old (2010). Does anyone know of any other implementation of scan in OpenCL?
2 Answers
1
There are several open-source implementations of the scan operation in OpenCL:
- CLOGS, a library for higher-level operations on top of the OpenCL C++ API.
- Boost.Compute, a C++ GPU Computing Library for OpenCL.
- VexCL, a C++ vector expression template library for OpenCL/CUDA.
- Bolt, a C++ template library optimized for GPUs.
The author of CLOGS wrote a paper comparing the performance of the scan (and sort) operations in these implementations.

ddemidov
0
If your device supports OpenCL 2.0, use the built-in work-group scan functions (for example, work_group_scan_inclusive_add or work_group_scan_exclusive_add).
https://stackoverflow.com/a/32394920/4877550
http://developer.amd.com/community/blog/2014/11/17/opencl-2-0-device-enqueue/
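
The work-group scan built-ins only scan within one work-group, so a buffer larger than a work-group needs additional passes. One common way to combine them is a three-pass scheme: scan each block, scan the block totals, then add each block's offset back. Below is a minimal CPU-side sketch of that scheme in plain C++, not an OpenCL implementation. The function name blocked_inclusive_scan and the parameter group_size are hypothetical, and the inner loop of pass 1 stands in for what a work-group scan would compute on the device.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// CPU stand-in for a three-pass, multi-work-group inclusive scan.
// `group_size` plays the role of the work-group size (hypothetical name);
// a value of 0 is treated as 1 to keep the sketch well defined.
std::vector<int> blocked_inclusive_scan(const std::vector<int>& in,
                                        std::size_t group_size) {
    std::vector<int> out(in.size());
    if (in.empty()) return out;
    group_size = std::max<std::size_t>(group_size, 1);

    const std::size_t n = in.size();
    const std::size_t groups = (n + group_size - 1) / group_size;
    std::vector<int> group_offset(groups, 0);

    // Pass 1: each block is scanned independently. On a device this is
    // where one work-group would scan its own elements.
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(begin + group_size, n);
        int running = 0;
        for (std::size_t i = begin; i < end; ++i) {
            running += in[i];
            out[i] = running;
        }
        group_offset[g] = running;  // block total, turned into an offset below
    }

    // Pass 2: exclusive scan over the block totals, so each entry becomes
    // the sum of all elements in the preceding blocks.
    int offset = 0;
    for (std::size_t g = 0; g < groups; ++g) {
        const int total = group_offset[g];
        group_offset[g] = offset;
        offset += total;
    }

    // Pass 3: add each block's offset to every element of that block.
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(begin + group_size, n);
        for (std::size_t i = begin; i < end; ++i) {
            out[i] += group_offset[g];
        }
    }
    return out;
}
```

For example, calling blocked_inclusive_scan({1, 2, 3, 4, 5}, 2) yields {1, 3, 6, 10, 15}, the same result as a sequential inclusive scan. On a GPU, each pass would typically be a separate kernel launch, and pass 2 may itself need to recurse when there are more blocks than fit in one work-group.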

eclipse0922
Built-in operations are limited to one workgroup (a few dozen threads) – Bulat Aug 26 '16 at 07:30