CUDA code produces incorrect results in release mode

Question

My CUDA code produces correct results in debug mode. However, in release mode, the same code produces garbage results. Could synchronization between threads behave differently in debug and release mode?

score 2 · Answer 1 · answered May 25 '12 at 01:37

Compiling with -O0 generates less optimized code with significantly more global and local memory accesses, which may hide a race condition. If you think you may have a race condition in shared memory, you can try the new memory checker in the CUDA 5.0 preview, which supports some forms of race condition detection. Your best bet is to look at every location where memory is shared between two threads and determine whether you are missing a thread fence or a __syncthreads().
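As an illustration of the kind of spot to look for, here is a minimal sketch of a shared-memory hand-off between threads. It is not the asker's code: the kernel `reverse_block` and the host helper `reverse_matches_expected` are hypothetical names invented for this example, and it assumes a single block with `n` no larger than the block-size limit (1024 threads on current hardware). Each thread writes one slot of shared memory and then reads a slot written by a different thread, so the barrier between the two steps is what makes the read safe.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses n integers within a single block.
// Each thread writes tile[t] and later reads tile[n - 1 - t], which
// was written by a different thread.
__global__ void reverse_block(const int *in, int *out, int n)
{
    extern __shared__ int tile[];
    int t = threadIdx.x;

    if (t < n) tile[t] = in[t];

    // Without this barrier a thread may read a slot of tile[] before
    // its owner has written it. Such a bug can stay hidden in a debug
    // build and surface only in an optimized one.
    __syncthreads();

    if (t < n) out[t] = tile[n - 1 - t];
}

// Hypothetical host helper: runs the kernel on 0..n-1 and checks that
// the output is the reversed sequence. Assumes 0 < n <= 1024.
bool reverse_matches_expected(int n)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    std::vector<int> h_in(n), h_out(n, -1);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // One block of n threads, n ints of dynamic shared memory.
        reverse_block<<<1, n, bytes>>>(d_in, d_out, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != n - 1 - i) return false;
    return true;
}
```

Calling `reverse_matches_expected(256)` from a `main` should return true with the barrier in place. If you remove the `__syncthreads()` and run the program under the race detection tool (`cuda-memcheck --tool racecheck`, or `compute-sanitizer --tool racecheck` in recent toolkits), the tool is expected to flag the read-after-write hazard on `tile`, even on runs where the output happens to look correct.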

geek · Answer 2 · 2012-05-26T17:07:51.763

1

I think you have a race condition. You can reorganize your code and add synchronization where it is needed. In debug mode your threads are usually executed in order, so you don't run into this problem.

edited May 26 '12 at 17:07

answered May 23 '12 at 19:59



"In debug mode your threads usually executed in some order and you can't get this problem". Can you point to where that is documented please? – talonmies May 23 '12 at 20:13
@talonmies CUDA debugging is not well documented. This is our team's guess. I think it happens because of the large amount of additional debug code that is generated. Do you have any information to the contrary? – geek May 24 '12 at 12:21
I do, and I believe the assertion that the hardware execution or scheduling model somehow changes depending on compiler arguments is patent nonsense. What debugging builds do is eliminate some optimisations and spill shared memory and registers to local memory so that the host can inspect their state during execution. Depending on architecture, this can considerably change behaviour, both by removing certain hardware memory protection and by using different instructions (and JIT optimizations) to operate on block local memory. – talonmies May 24 '12 at 12:41
@talonmies: I got excited about the findings. I debugged my code and made this guess. I think another possible reason race conditions stay hidden is that in debug code every access to global memory is double-checked for bounds conditions and so on. In effect, this makes every variable volatile and, as you said, shared variables can actually be located in global space. This helps to hide race conditions. – geek May 24 '12 at 12:55
