
Recently I ran into an unexpected bug when using the 'new' operator in parallel code with CUDA.

I want to initialize a list of objects using the 'new' operator. This means I need a pointer to pointer to hold the list of pointers to the objects.

To speed up initialization, I use a __global__ function to 'new' the objects in parallel.

However, I found that "new Classname(...)" returns nullptr in some threads.

__global__ void cuda_face_to_triangle(
        CUDA_Face* high_acc_face_address,
        CUDA_Triangle** triangle_list_address,
        size_t face_num
)
{
    const unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index >= face_num) return;

    CUDA_Face temp = high_acc_face_address[index];
    auto* temp_t = new CUDA_Triangle(
            temp.v1, temp.v2, temp.v3,
            temp.n1, temp.n2, temp.n3,
            temp.t1, temp.t2, temp.t3);
    triangle_list_address[index] = temp_t;
}

In this code, 'temp_t' is null after executing 'new CUDA_Triangle(...)' in about half of the threads.

I used 'printf' to check whether the code was executed, and found that the constructor is never run in the threads where temp_t is null, which may be why temp_t is null there. But why does this happen?

I confirmed that the variable 'temp' is correct in every thread, and the pointer to pointer is allocated before the kernel launch. The block and thread configuration I use is shown below:
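One thing that might be worth ruling out here: in-kernel 'new' allocates from the device heap, which has a fixed size (8 MB by default according to the CUDA programming guide) and yields a null pointer once it is exhausted. The sketch below is only a rough host-side estimate. The definition of CUDA_Triangle is not shown in the question, so the object size is an assumption (nine 3-float vectors), the helper names are hypothetical, and per-allocation overhead is ignored.

```cpp
#include <cstddef>

// Default device heap size documented for in-kernel malloc/new: 8 MB.
constexpr std::size_t kDefaultDeviceHeapBytes = 8u * 1024u * 1024u;

// ASSUMPTION: CUDA_Triangle stores nine vectors of three floats each
// (v1..v3, n1..n3, t1..t3). The real layout is not shown in the question.
constexpr std::size_t kAssumedTriangleBytes = 9 * 3 * sizeof(float);

// Raw payload needed for `count` objects, ignoring allocator overhead.
constexpr std::size_t payload_bytes(std::size_t count, std::size_t object_bytes) {
    return count * object_bytes;
}

// True if the raw payload alone would fit in a heap of `heap_bytes`.
constexpr bool fits_in_heap(std::size_t count, std::size_t object_bytes,
                            std::size_t heap_bytes) {
    return payload_bytes(count, object_bytes) <= heap_bytes;
}
```

Under that assumed size, 94445 objects would need more than the default heap, so some allocations would be expected to fail. The actual limit can be read with cudaDeviceGetLimit and raised with cudaDeviceSetLimit using cudaLimitMallocHeapSize, before the first kernel that allocates is launched.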

__host__ void init_cuda_environment(
        CUDA_BVH_node** dst_node_address_pointer,
        CUDA_Face* dst_face_list_address,
        size_t face_num
        )
{
    // allocate the device array that will hold one CUDA_Triangle* per face
    CUDA_Triangle** cudaTriangles;
    cudaMalloc((void**)&cudaTriangles, face_num * sizeof(CUDA_Triangle*));

    // face_num is 94445
    const int threads = 512;
    const dim3 blocks_1((face_num - 1) / threads + 1);

    cuda_face_to_triangle<<<blocks_1, threads>>>(
            dst_face_list_address,
            cudaTriangles,
            face_num);
    cudaDeviceSynchronize();
    printf("CUDA triangle init done\n");
}
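To make the launch configuration concrete, here is a small host-side sketch of the arithmetic, using the same ceiling-division formula as blocks_1 above. The helper names are hypothetical and not part of the original code.

```cpp
#include <cstddef>

// Ceiling division, same formula as blocks_1 in the question.
// Requires n > 0, otherwise (n - 1) wraps around for unsigned types.
constexpr std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
    return (n - 1) / threads_per_block + 1;
}

// Total threads launched; this can exceed n, which is why the kernel
// returns early when index >= face_num.
constexpr std::size_t launched_threads(std::size_t n, std::size_t threads_per_block) {
    return blocks_for(n, threads_per_block) * threads_per_block;
}

// Bytes requested for the array holding one object pointer per face.
constexpr std::size_t pointer_array_bytes(std::size_t n) {
    return n * sizeof(void*);
}
```

For face_num = 94445 and 512 threads per block this gives 185 blocks and 94720 launched threads, so 275 threads hit the early return. The pointer array itself is small (under 1 MB with 8-byte pointers).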

My GPU is an RTX 3090 and the OS is Ubuntu 22.04. The CUDA version is 11.8, the CMake version is 3.23.2, the C compiler is gcc-11, and the C++ compiler is g++-11.

This bug has confused me for many days. I would greatly appreciate it if someone could point out my mistake.

Icewired