How can an OpenCL kernel read __global char* data efficiently?

Asked Nov 26 '15 at 07:41

Active Nov 26 '15 at 09:51

Viewed 173 times

I am doing motion compensation between two images (3840*2160) with a block size of 16.
The kernel's work is divided as 3840 * 135 (135 = 2160/16), with a group size of 64*1 or 128*1 (basically no difference between the two).

My kernel reads global char data, but the read position imagepos = src + mv.xy is not aligned, so I have to read the chars one by one. I think the latency comes from these reads; CodeXL also shows that the kernel is not limited by GPRs. So I need a way to speed up the data reads. I would also like to know how to make use of local memory when each value is needed only once. Any suggestions will be appreciated.

edited Nov 26 '15 at 09:45

XinLiang

If I understood correctly, motion compensation is block matching, so there are multiple reads. Local memory can speed up your processing a lot. But without code it is difficult to give an answer... – DarkZeros Nov 26 '15 at 08:41
Hi DarkZeros, I want to use local memory, but I only use each value once, so copying from global to local and then from local to private is not efficient. Below is my main OpenCL code. Thanks! – XinLiang Nov 26 '15 at 09:27
In fact, my most important problem is how to speed up reads of global char data by making them aligned. – XinLiang Nov 26 '15 at 09:35
For optimum read speed of __global char * data, 4 or 8 concurrent work items should read adjacent and aligned data together. Study coalesced reads. – Dithermaster Nov 28 '15 at 14:42
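One way to picture the aligned-read idea from the comment above is the sketch below. It is not the asker's kernel (which is not shown here); it is a host-side illustration in plain C. It fetches an unaligned run of bytes using only 4-byte-aligned word loads and rebuilds each output word with shifts. The helper names `load_aligned_word` and `read_unaligned_row` are hypothetical. The sketch assumes a little-endian layout, a row length that is a multiple of 4, and a source buffer padded to a multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch the 32-bit word number `word_index` from `base`.
 * memcpy keeps this well-defined in host C; in a kernel this would be a
 * plain load through a 4-byte-aligned uint pointer (assumption). */
static uint32_t load_aligned_word(const unsigned char *base, size_t word_index)
{
    uint32_t w;
    memcpy(&w, base + 4 * word_index, sizeof w);
    return w;
}

/* Hypothetical helper: copy `count` bytes starting at an arbitrary byte
 * `offset` into `dst`, touching the source only in 4-byte words.
 * Assumptions: little-endian byte order, `count` is a multiple of 4, and
 * the source buffer length is padded to a multiple of 4 bytes. */
static void read_unaligned_row(const unsigned char *base, size_t offset,
                               unsigned char *dst, size_t count)
{
    assert(count % 4 == 0);

    size_t first = offset / 4;                   /* word holding the first byte */
    unsigned shift = (unsigned)(offset % 4) * 8; /* misalignment in bits */

    for (size_t i = 0; i < count / 4; ++i) {
        uint32_t lo = load_aligned_word(base, first + i);
        uint32_t out = lo;
        if (shift != 0) {
            /* The wanted 4 bytes straddle two words: take the upper bytes
             * of `lo` and the lower bytes of the next word. */
            uint32_t hi = load_aligned_word(base, first + i + 1);
            out = (lo >> shift) | (hi << (32 - shift));
        }
        memcpy(dst + 4 * i, &out, sizeof out);
    }
}
```

For a 16-pixel block row, `read_unaligned_row(src, offset, row, 16)` issues at most eight word loads instead of 16 single-byte reads. A real kernel would likely keep `hi` for the next iteration rather than loading it twice. Whether this beats byte reads on a given GPU has to be measured. Note also that OpenCL's `vload4`/`vload16` on a `char` pointer only require element (1-byte) alignment, so they are another option worth profiling.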

0 Answers