
I was trying my hand at CUDA, and after lots of debugging I finally noticed that some basic operations behave differently in host code and in CUDA code.

Casting a negative floating-point value to an unsigned char results in zero. That is not what happens when the same code runs on the host. I wasted hours trying to figure out why my CUDA code returned something different from the same code written on the host. (I do not know how to debug CUDA efficiently apart from cuda-memcheck and printf.)

Are there other things or conventions that are also easy to break and hard to find without knowing what you're looking for, and what is the reason for the difference above?


Here is the code I used to test the above behavior:

Makefile:

    VCC = nvcc

    .PHONY: all clean

    all: cudaTest

    clean: 
        rm -f *o

    cudaTest: cudaTest.o
        $(VCC) -o $@ $^
    cudaTest.o: cudaTest.cu
        $(VCC) -c $^

cudaTest.cu:

    #include <stdlib.h>
    #include <stdio.h>

    __global__
    void cTests(){
        double d = -2;
        float f = -2;
        int i = -2;
        char c = -2;
        printf("%u, %u\n",(unsigned char)d, (unsigned char)(char)d);
        printf("%u, %u\n",(unsigned char)f, (unsigned char)(char)f);
        printf("%u\n",(unsigned char)i);
        printf("%u\n",(unsigned char)c);
    }

    int main(int argc, char* argv[]){
        double d = -2;
        float f = -2;
        int i = -2;
        char c = -2;
        printf("CPU:\n");
        printf("%u \n",(unsigned char)d);
        printf("%u \n",(unsigned char)f);
        printf("%u \n",(unsigned char)i);
        printf("%u \n",(unsigned char)c);
        printf("GPU:\n");
        cTests<<<1,1>>>();
        cudaDeviceSynchronize();
    }

Result (command: `cuda-memcheck ./cudaTest > output.txt`):

    CPU:
    254
    254
    254
    254
    GPU:
    0, 254
    0, 254
    254
    254
    ========= CUDA-MEMCHECK
    ========= ERROR SUMMARY: 0 errors

Also, for some reason, the `========= CUDA-MEMCHECK` line appears first in the terminal but at the end of the output.txt file.

  • `cTests<<<1,1>>>();` it is not C – 0___________ Dec 22 '19 at 15:25
  • @P__J__ : Can you please clarify your answer? The code compiles without warnings or errors. And if my way of doing it is wrong how am I supposed to call the CUDA function then? – The Kraken Dec 22 '19 at 15:41
  • You may wish to read [this](https://stackoverflow.com/questions/50134795/cast-float-to-unsigned-int-in-c-with-gcc). I'm reasonably confident you are exploring undefined behavior. – Robert Crovella Dec 22 '19 at 15:43
  • It is not C language. – 0___________ Dec 22 '19 at 15:47
  • @P__J__ The syntax `cTests<<<1,1>>>();` is not syntax for C source code. However this source code file is intended to be compiled with a special compiler for CUDA programming which has special extensions needed for CUDA programming. See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html for an intro. I think that this demonstrates that CUDA is not C or C++ and the underlying memory model is not the same for the GPU code generated as for the CPU code. Which makes sense as a GPU is an entirely different beast than a CPU. – Richard Chambers Dec 22 '19 at 15:53
  • @RobertCrovella So the problem is that it was undefined behavior all along, and I was lucky that it at least worked in C. Is there a safe way of doing this operation? Like casting to a signed char first and then to an unsigned char, or is this undefined also? Or maybe getting the absolute value, or setting the first bit to 0? – The Kraken Dec 22 '19 at 15:54
  • I think you need to ask a new question that describes what it is that you are actually doing, not a simple test program such as this. Once people see what you are trying to do (the source code, along with a description of the expected behavior and the end goals), they will be able to suggest a way for you to achieve what you want. – Richard Chambers Dec 22 '19 at 15:56

1 Answer


As indicated in the question linked in the comments above, the process of converting a floating-point number to an integer type goes through several steps. This is detailed in section 6.3.1.4 of the C standard, which dictates conversion from floating-point types to integer types:

1 When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.61)

In your particular case, you have a negative floating-point value (e.g. -2.0) and you are attempting to cast it directly to an unsigned type. The first step is to drop the fractional part, which leaves us with -2, and then:

If the value of the integral part cannot be represented by the integer type, the behavior is undefined.61)

-2 cannot be represented by an unsigned integer type, therefore the behavior is undefined. I don't wish to try to provide a recital of undefined behavior (UB) as it is covered in many places elsewhere. But once you have a case of undefined behavior, the implementation (i.e. the compiler) is free to do (approximately) anything, and still be standard-compliant.

Therefore, suggesting that CUDA is somehow not standard-compliant because it converts -2.0 to some particular value upon a direct cast to an unsigned type is not sensible.

It's quite possible that in such a case the behavior of one implementation (the host compiler) does not match the behavior of another implementation (the device compiler). Such a mismatch in the case of UB does not tell you anything about standards compliance.

(FWIW, CUDA does not claim compliance to any C standard, currently. It does claim compliance to a particular C++ standard subject to various limitations, but for the purpose of this question, I think that distinction is not material.)
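Your own output already illustrates where the defined and undefined cases part ways: the int and char lines print 254 on both CPU and GPU, because integer-to-unsigned conversion is fully defined (the value is reduced modulo 2^N, section 6.3.1.3), while only the floating-point lines diverge, because those casts are UB. Here is a minimal host-only sketch of the defined part (nothing CUDA-specific is assumed here):

    #include <stdio.h>

    int main(void){
        int  i = -2;
        char c = -2;
        /* Integer-to-unsigned conversion is fully defined (C standard 6.3.1.3):
           the value is reduced modulo 2^N, so -2 becomes 256 - 2 = 254 when
           converted to an 8-bit unsigned char, on any conforming implementation.
           That is why these two lines agree between host and device. */
        printf("%u\n", (unsigned char)i);   /* prints 254 */
        printf("%u\n", (unsigned char)c);   /* prints 254 */
        return 0;
    }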

With respect to your question, "Is there a safe way of doing this operation?": it's difficult to answer, because I don't find a clear definition of what "this operation" is in your question. Presumably you want signed-to-unsigned integer wrapping behavior. As far as I know, such a conversion is "safe", cannot "overflow", and always has a predictable outcome for any input, independent of implementation. It should never lead to UB.

So I suspect that what you want to do is possible by first casting to a signed integer, then to unsigned. For the final step of that process (conversion from signed to unsigned), the behavior should be quite predictable. I'm not sure the conversion from an arbitrary floating-point number to a signed integer will work according to your definition of "this operation" (please re-read the first quotation in this answer), but you haven't given a clear definition of "this operation".
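As an illustration of that two-step approach, here is a minimal device-side sketch. It assumes that "this operation" means the wrap-around result your host compiler happened to produce, and the kernel name is made up for this example; note that the first cast is only defined as long as the truncated value fits in the signed type:

    #include <stdio.h>

    __global__ void twoStepCast(){
        double d = -2;
        float  f = -2;
        /* Step 1: floating point -> signed char. Defined here because the
           truncated value (-2) fits in a signed char; it would still be UB
           for values outside the signed char range, so this is not a
           universal fix.
           Step 2: signed char -> unsigned char. Always defined (reduction
           modulo 2^8), so the result is 254 on any implementation. */
        printf("%u, %u\n", (unsigned char)(signed char)d,
                           (unsigned char)(signed char)f);
    }

    int main(void){
        twoStepCast<<<1,1>>>();
        cudaDeviceSynchronize();
        return 0;
    }

This is effectively what the second column of the GPU output in your question already shows: with the intermediate (char) cast, the device prints 254 as well.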

– Robert Crovella