I have been experimenting with CUDA, and after a lot of debugging I finally noticed that some basic operators behave differently in host code and in CUDA device code.
Casting a negative floating-point value to an unsigned char yields zero on the device. That is not what happens when the same code runs on the host. I wasted hours trying to work out why my CUDA code returned something different from the same code written for the host. (I do not know how to debug CUDA efficiently apart from cuda-memcheck and printf.)
Are there other things or conventions that are similarly easy to get wrong and hard to find unless you know what you are looking for? And what is the reason for the difference above?
Here is the code I used to test the above behavior:
Makefile:
VCC = nvcc

.PHONY: all clean

all: cudaTest

clean:
	rm -f *.o

cudaTest: cudaTest.o
	$(VCC) -o $@ $^

cudaTest.o: cudaTest.cu
	$(VCC) -c $^
cudaTest.cu:
#include <stdlib.h>
#include <stdio.h>

__global__
void cTests(){
    double d = -2;
    float f = -2;
    int i = -2;
    char c = -2;
    printf("%u, %u\n",(unsigned char)d, (unsigned char)(char)d);
    printf("%u, %u\n",(unsigned char)f, (unsigned char)(char)f);
    printf("%u\n",(unsigned char)i);
    printf("%u\n”",(unsigned char)c);
}

int main(int argc, char* argv[]){
    double d = -2;
    float f = -2;
    int i = -2;
    char c = -2;
    printf("CPU:\n");
    printf("%u \n",(unsigned char)d);
    printf("%u \n",(unsigned char)f);
    printf("%u \n",(unsigned char)i);
    printf("%u \n",(unsigned char)c);
    printf("GPU:\n");
    cTests<<<1,1>>>();
    cudaDeviceSynchronize();
}
Result (command: cuda-memcheck ./cudaTest > output.txt):
CPU:
254
254
254
254
GPU:
0, 254
0, 254
254
254
”========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
Also, for some reason the ========= CUDA-MEMCHECK line comes first in the terminal but last in the output.txt file.
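For reference, here is a minimal sketch of a workaround for the float-to-unsigned char casts above. As far as I understand it, converting a floating-point value to an unsigned integer type that cannot represent it is undefined behavior in C and C++, so host and device are both allowed to do what they do. Going through a signed int first keeps the floating-point step in range, and the following int to unsigned char conversion is defined to wrap modulo 256. The helper name wrapToUChar and the HD macro are my own, not part of CUDA, and the sketch assumes the truncated value fits in an int.

```cpp
#include <cstdio>

// Expands to __host__ __device__ under nvcc and to nothing under a plain
// C++ compiler, so the same helper can be compiled for both sides.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper (not a CUDA API).
// Step 1: double -> int truncates toward zero; this is well defined as long
//         as the truncated value fits in an int (assumption).
// Step 2: int -> unsigned char wraps modulo 256 (assuming 8-bit chars),
//         which is well defined for negative values too.
HD inline unsigned char wrapToUChar(double v) {
    return static_cast<unsigned char>(static_cast<int>(v));
}
```

With this, wrapToUChar(-2.0f) should give 254 wherever it is called, which matches what (unsigned char)(char)f already printed on the GPU in the output above.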