I have a Mali GPU which does not support local memory at all. Every time I run code that uses local memory, the device reports errors. So I want to port my code to a version that uses only global memory. Is it possible to run a prefix sum / parallel reduction algorithm on the GPU using global memory only?

EDIT: While debugging, I found that one particular line is causing the error. I have code like this:

`#define LOG_LSIZE 8`
`#define LSIZE_SHIFT_VALUE 4`
`#define LOG_NUM_BANKS 2`
`#define GET_CONFLICT_OFFSET(lid) ((lid) >> LOG_NUM_BANKS)`
`#define LSIZE 32`
`__local int lm_sum[2][LSIZE + LOG_LSIZE];`
**`lm_sum[lid >> LSIZE_SHIFT_VALUE][bi] += lm_sum[lid >> LSIZE_SHIFT_VALUE][ai];`**

`lid` is the local ID, and I used a work-group size of 32. I found that the highlighted line is the cause of the error. I tried using fixed values and found that I cannot use `lm_sum` on the right-hand side of a statement; if I do, I get an error. For example, this line also gives me an error: `int temp = lm_sum[0][0];`

Any idea on what is going on?

Error:

`In initial.cpp***[14100.684249] Mali<ERROR, BASE_MMU>: In file: /home/jbmaster/work/01.LPD_OpenCL_RFS/01.arm_work_3_0_31/SEC_All_EVT0_TX013-BU-00001-r2p0-00rel0/TX013-BU-00001-r2p0-00rel0/driver/product/kernel/drivers/gpu/arm/t6xx/kbase/src/common/mali_kbase_mmu.c line: 1240 function:kbase_mmu_report_fault_and_kill 
[14100.709724] Unhandled Page fault in AS0 at VA 0x00000002000EC1A0
[14100.709728] raw fault status 0x500003C3
[14100.709730] decoded fault status: SLAVE FAULT
[14100.709733] exception type 0xC3: TRANSLATION_FAULT
[14100.709736] access type 0x3: WRITE
[14100.709738] source id 0x5000
[14100.734958] 
[14100.736432] Mali<ERROR, BASE_JD>: In file: /home/jbmaster/work/01.LPD_OpenCL_RFS/01.arm_work_3_0_31/SEC_All_EVT0_TX013-BU-00001-r2p0-00rel0/TX013-BU-00001-r2p0-00rel0/driver/product/kernel/drivers/gpu/arm/t6xx/kbase/src/common/mali_kbase_jm.c line: 899 function:kbase_job_slot_hardstop 
[14100.761458] Issueing GPU soft-reset instead of hard stopping job due to a hardware issue
[14100.769517] ` 
Luniam

1 Answer

Since lm_sum[0][0] doesn't work, the memory for the array is not allocated. You said your GPU doesn't support local memory. Well, you are trying to use lm_sum which is declared to be in local memory (__local int lm_sum[2][LSIZE + LOG_LSIZE]).
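
On the original question of whether a prefix sum can run with global memory only: in principle yes, if each pass reads from one global buffer and writes to another, so no work-item ever reads a value that another work-item is writing in the same pass. The following is a minimal sketch in plain C that simulates this (a Hillis-Steele style scan, which is a swapped-in technique and not the scheme from the question's code). The names `scan_pass` and `inclusive_scan_global` are hypothetical; each iteration of the inner loop stands in for one work-item, and each call to `scan_pass` stands in for one kernel launch. It has not been tested on Mali hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One simulated "kernel launch": every index gid plays the role of one
   work-item. Reads come only from src and writes go only to dst, so no
   barrier and no local memory are needed inside a pass. */
static void scan_pass(const int *src, int *dst, size_t n, size_t offset)
{
    for (size_t gid = 0; gid < n; ++gid) {
        dst[gid] = (gid >= offset) ? src[gid] + src[gid - offset] : src[gid];
    }
}

/* Inclusive prefix sum of data[0..n-1], result left in data.
   tmp is a caller-provided scratch buffer of n ints (a second global buffer). */
void inclusive_scan_global(int *data, int *tmp, size_t n)
{
    assert(data != NULL && tmp != NULL);
    int *src = data;
    int *dst = tmp;
    for (size_t offset = 1; offset < n; offset *= 2) {
        scan_pass(src, dst, n, offset);
        /* Swap the roles of the two buffers for the next pass. */
        int *t = src;
        src = dst;
        dst = t;
    }
    /* After the last swap, src holds the result; copy back if needed. */
    if (src != data) {
        memcpy(data, src, n * sizeof *data);
    }
}
```

This does more additions than a work-efficient scan and needs about log2(n) launches, but it avoids both `__local` storage and any synchronization inside a kernel, which makes it a reasonable baseline to check the device against.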

Kurt
  • That the GPU does not support local memory is my assumption. If I can use lm_sum on the left side, I should be able to use it on the right side too. – Luniam Apr 21 '14 at 18:33
  • So you are saying that the following works: `lm_sum[0][0] = 2` – Kurt Apr 21 '14 at 19:11
  • Yes! That line works perfectly. It may seem weird, but that is what is happening. – Luniam Apr 21 '14 at 19:24
  • It doesn't seem like it should work if local memory is not supported. Also, it looks like the error is about a "Write": `[14100.709736] access type 0x3: WRITE`. Either way, why don't you just declare lm_sum to be in global instead of local memory? – Kurt Apr 21 '14 at 21:04
  • Well, the problem is that the access pattern and calculations are done using lid. If I change lm_sum to global, then I have to change all of these places one by one, and I am not sure the output would be correct. – Luniam Apr 22 '14 at 23:53
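
Regarding the concern in the last comment: one way to keep the `lid`-based indexing unchanged might be to give each work-group its own slice of a global scratch buffer and view that slice as a `[2][LSIZE + LOG_LSIZE]` array. The sketch below shows the index arithmetic in plain C. The names `lm_row_t`, `ROW_LEN`, `SCRATCH_PER_GROUP`, and `group_scratch` are hypothetical; in an actual kernel the pointer would presumably carry the `__global` qualifier, and `group_id` would come from `get_group_id(0)`. This is an assumption about how the port could look, not something verified on the Mali driver.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_LSIZE 8
#define LSIZE 32
#define ROW_LEN (LSIZE + LOG_LSIZE)      /* same row length as lm_sum */
#define SCRATCH_PER_GROUP (2 * ROW_LEN)  /* ints needed by one work-group */

/* One row of the former __local array. */
typedef int lm_row_t[ROW_LEN];

/* Return this group's slice of the global scratch buffer, typed so that
   existing expressions such as lm_sum[i][j] keep working unchanged.
   scratch must hold at least num_groups * SCRATCH_PER_GROUP ints. */
lm_row_t *group_scratch(int *scratch, size_t group_id, size_t num_groups)
{
    assert(scratch != NULL && group_id < num_groups);
    return (lm_row_t *)(scratch + group_id * SCRATCH_PER_GROUP);
}
```

With `lm_row_t *lm_sum = group_scratch(scratch, group_id, num_groups);` at the top of the kernel body, lines such as `lm_sum[lid >> LSIZE_SHIFT_VALUE][bi] += lm_sum[lid >> LSIZE_SHIFT_VALUE][ai];` would not need to change. Any `barrier(CLK_LOCAL_MEM_FENCE)` calls that guarded `lm_sum` would likely need to become `barrier(CLK_GLOBAL_MEM_FENCE)`, since the data now lives in global memory.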