  1. Motion compensation between two images (3840*2160), block size 16.

  2. The kernel is divided into 3840 * 135 work-items (135 = 2160/16), with a group size of 64*1 or 128*1 (basically no difference in speed).
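As a quick check of the launch arithmetic in the list above, here is a small sketch in plain C. It assumes that "3840 * 135" is the global work size, meaning one work-item per pixel column of each 16-row strip. The constant and function names are made up for illustration and do not come from the original kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the question; the names are hypothetical. */
enum { IMG_W = 3840, IMG_H = 2160, BLOCK = 16 };

/* Work-items along y: one per 16-row strip of the image. */
static size_t global_size_y(void)
{
    return IMG_H / BLOCK;
}

/* Total work-groups for a group size of group_w x 1
   (assumes group_w divides the image width evenly). */
static size_t num_groups(size_t group_w)
{
    assert(group_w != 0 && IMG_W % group_w == 0);
    return (IMG_W / group_w) * global_size_y();
}
```

With these figures, both group sizes give thousands of work-groups, so either one supplies plenty of parallel work. That is at least consistent with the two sizes performing about the same.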

My kernel reads global char data, but the read position imagepos = src + mv.xy is not aligned, so I have to read the chars one by one. I think the latency comes from these reads; CodeXL also shows that the kernel is not limited by GPRs. So I need a way to speed up the data reads. I would also like to know how to use local memory when each value is needed only once. Any suggestions would be appreciated.

XinLiang
  • If I understood correctly, motion compensation is block matching, so there are multiple reads. Local memory can speed up your processing a lot. But without code it is difficult to give an answer... – DarkZeros Nov 26 '15 at 08:41
  • Hi DarkZeros, I want to use local memory, but in fact I only use each value once, so copying from global to local and then from local to private is not efficient. Below is my main OpenCL code. Thanks! – XinLiang Nov 26 '15 at 09:27
  • In fact, my most important problem is how to speed up reads of global char data by making them aligned. – XinLiang Nov 26 '15 at 09:35
  • For optimum read speed of __global char * data, 4 or 8 concurrent work items should read adjacent and aligned data together. Study coalesced reads. – Dithermaster Nov 28 '15 at 14:42
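
One way to picture the aligned-read idea from the comments, as a minimal sketch rather than a tested kernel: fetch the two aligned 32-bit words that cover an unaligned byte position, then shift and merge them to recover four pixels. It is written in plain C so it can be compiled on the host. The same arithmetic should carry over to an OpenCL kernel that reads the image through an aligned uint pointer. The helper names (pack_le_words, load4_unaligned, pixel_at) are hypothetical. The sketch assumes little-endian byte order within each word and a buffer padded by at least one extra word past the last position read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack bytes into 32-bit words, lowest byte first,
   mimicking how a little-endian device sees a char buffer through a
   uint pointer. n_bytes must be a multiple of 4. */
static void pack_le_words(const uint8_t *bytes, size_t n_bytes, uint32_t *words)
{
    assert(n_bytes % 4 == 0);
    for (size_t i = 0; i < n_bytes / 4; ++i) {
        words[i] = (uint32_t)bytes[4 * i]
                 | ((uint32_t)bytes[4 * i + 1] << 8)
                 | ((uint32_t)bytes[4 * i + 2] << 16)
                 | ((uint32_t)bytes[4 * i + 3] << 24);
    }
}

/* Hypothetical helper: return the 4 bytes starting at an arbitrary
   byte_offset using only aligned word loads. When the offset is not a
   multiple of 4, words[byte_offset / 4 + 1] must be readable. */
static uint32_t load4_unaligned(const uint32_t *words, size_t byte_offset)
{
    assert(words != NULL);
    size_t   idx   = byte_offset / 4;                 /* word holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8; /* misalignment in bits */
    uint32_t lo = words[idx];
    if (shift == 0)
        return lo;                                    /* already aligned: one load */
    uint32_t hi = words[idx + 1];
    return (lo >> shift) | (hi << (32 - shift));      /* merge the two halves */
}

/* Extract pixel k (0..3) from a packed group of four. */
static uint8_t pixel_at(uint32_t packed, unsigned k)
{
    assert(k < 4);
    return (uint8_t)(packed >> (8 * k));
}
```

In a kernel, neighbouring work-items that share the same motion vector would then load neighbouring aligned words, which is the access pattern the comment above describes. Whether this actually beats byte-by-byte reads on a given GPU would have to be measured.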

0 Answers