
I am looking for a global-memory implementation of the prefix sum (scan) algorithm in CUDA or OpenCL. All the implementations I have found use local memory. Can anyone help me with the algorithm and how I should proceed?
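The following is a minimal sketch, not code from any linked library. It assumes each scan step is issued as its own kernel launch over two `__global` buffers, which is the Hillis-Steele scan chosen here for illustration. It is written as host-side C++ that simulates the launches, so it runs anywhere. The body of the inner loop is what one work-item would execute, with `i` standing in for `get_global_id(0)`. The names `scan_pass` and `inclusive_scan_global` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simulates ONE kernel launch. Each work-item i reads only from `in` and
// writes only to `out`, so no barrier and no __local memory are needed
// inside the pass. On a GPU both buffers would be __global.
static void scan_pass(const std::vector<int>& in, std::vector<int>& out,
                      std::size_t offset) {
    for (std::size_t i = 0; i < in.size(); ++i) {  // i ~ get_global_id(0)
        out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
    }
}

// Inclusive Hillis-Steele scan: about log2(n) passes that ping-pong between
// two global buffers. The boundary between launches acts as a global barrier.
std::vector<int> inclusive_scan_global(std::vector<int> data) {
    std::vector<int> tmp(data.size());
    for (std::size_t offset = 1; offset < data.size(); offset *= 2) {
        scan_pass(data, tmp, offset);
        std::swap(data, tmp);  // like swapping kernel arguments on the host
    }
    return data;
}
```

For example, `inclusive_scan_global({1, 2, 3, 4, 5})` yields `{1, 3, 6, 10, 15}`. The trade-off is O(n log n) additions and one launch per step, in exchange for never needing work-group shared storage.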

Asked by Luniam; edited by talonmies
  • For CUDA, a fast prefix sum is available in [thrust](http://thrust.github.io/doc/group__prefixsums.html) as well as [cub](http://nvlabs.github.io/cub/structcub_1_1_device_scan.html). It's possible to force cub not to use shared memory for its temp storage, although I'm not sure why you'd want to do that. If you want to write the code yourself (not recommended), you could start [here](http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html). Moving the temp storage from shared to global should be a trivial modification. – Robert Crovella Apr 21 '14 at 18:44
  • I want to do that because my machine does not support local memory: it cannot access variables declared in local memory with `__local`. So I want to run this using global memory. – Luniam Apr 21 '14 at 18:53
  • @Luniam: So you are working with OpenCL on a platform without local memory? Is there really such a thing? – talonmies Apr 22 '14 at 04:15
  • AFAIK OpenCL requires support for `__local` memory. Perhaps it is a reference to some HD 4xxx GPUs which "emulate" `__local` memory, and performance may suffer. I don't think you can have an OpenCL compliant platform that actually *does not support* local memory. – Robert Crovella Apr 22 '14 at 05:44
  • I get these errors when I use local memory:
    In initial.cpp***
    [14100.684249] Mali: In file: /home/jbmaster/work/01.LPD_OpenCL_RFS/01.arm_work_3_0_31/SEC_All_EVT0_TX013-BU-00001-r2p0-00rel0/TX013-BU-00001-r2p0-00rel0/driver/product/kernel/drivers/gpu/arm/t6xx/kbase/src/common/mali_kbase_mmu.c line: 1240 function:kbase_mmu_report_fault_and_kill
    [14100.709724] Unhandled Page fault in AS0 at VA 0x00000002000EC1A0
    [14100.709728] raw fault status 0x500003C3
    [14100.709730] decoded fault status: SLAVE FA
    [14100.709738] source id 0x5000
    – Luniam Apr 22 '14 at 23:56
  • @Luniam: That sounds like a ready bug report to your OpenCL vendor. I've removed the CUDA tag from the question, it isn't actually related to CUDA programming at all. – talonmies Apr 25 '14 at 05:47
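This is a hedged sketch of Robert Crovella's suggestion. It shows the work-efficient up-sweep/down-sweep (Blelloch) exclusive scan discussed in the GPU Gems 3 chapter, with the shared-memory scratch array replaced by a single buffer that would live in `__global` memory. This is not the chapter's code. It is host-side C++ that simulates the launches, and the function name `exclusive_scan_blelloch` is hypothetical. It assumes a power-of-two length, and other sizes would need zero padding. Each tree level is treated as a separate kernel launch, so the launch boundary stands in for `barrier(CLK_LOCAL_MEM_FENCE)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work-efficient (Blelloch) exclusive scan, in place on one buffer that
// would be __global on a GPU. Each outer-loop iteration models one launch.
// Assumption: a.size() is a power of two (pad with zeros otherwise).
void exclusive_scan_blelloch(std::vector<int>& a) {
    const std::size_t n = a.size();
    if (n < 2) {
        if (n == 1) a[0] = 0;  // exclusive scan of one element is {0}
        return;
    }
    assert((n & (n - 1)) == 0 && "length must be a power of two");

    // Up-sweep (reduce). Work-item k updates index (k+1)*2*stride - 1 and
    // reads index (k+1)*2*stride - 1 - stride; within one pass the written
    // and read indices are disjoint, so in-place update is safe.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            a[i] += a[i - stride];
        }
    }

    // Clear the root, then down-sweep. Each work-item touches only its own
    // pair (i - stride, i), so pairs never overlap within a pass.
    a[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            int left = a[i - stride];
            a[i - stride] = a[i];
            a[i] += left;
        }
    }
}
```

For `{3, 1, 7, 0, 4, 1, 6, 3}` this produces `{0, 3, 4, 11, 11, 15, 16, 22}`. Compared with a ping-pong scan, it needs only one global buffer and O(n) additions, but about 2·log2(n) launches.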

0 Answers