I can't figure out what is causing the issue. I get an "access violation writing location" error on the last line. Am I allocating the memory incorrectly?

    typedef struct {
        doubleXYZW cen_sum; // struct with 4 doubles
        double STS[6];
        XYZW *Points;       // struct with 4 floats
    } BUNDLE;

    BUNDLE *cpu_data = NULL;
    size_t bundle_size = NUM_POINTS * sizeof(XYZW) + sizeof(doubleXYZW) + 6*sizeof(double);
    HANDLE_ERROR(cudaMallocHost((BUNDLE**)&cpu_data, bundle_size));
    // error in the next line
    cpu_data->Points[0].x = 0; // x is the first element in the XYZW struct
Nenu

1 Answer

Two allocations are needed here, and you are only performing one of them.

You are allocating the storage that cpu_data points to, but you have not allocated any storage for the Points pointer inside it. Therefore when you dereference Points:

cpu_data->Points[0].x = 0;
         ^      ^
         |      this dereferences the Points pointer (NOT allocated!)
         |
        this dereferences the cpu_data pointer (allocated)

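you are dereferencing a pointer that was never set to point at valid storage. Points holds an indeterminate value (whatever happened to be in the newly allocated block), so writing through it is an invalid access, which is what the access violation reports.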

You have (at least) two options to fix it:

  1. After you have allocated space for cpu_data, you can perform another cudaMallocHost allocation for cpu_data->Points.
  2. If you know the size of the Points array at compile time (it seems like you do: NUM_POINTS), then you could make it a fixed-size array inside the struct, so that one allocation covers everything:

    typedef struct {
        doubleXYZW cen_sum;      // struct with 4 doubles
        double STS[6];
        XYZW Points[NUM_POINTS]; // struct with 4 floats
    } BUNDLE;
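
For comparison, here is a minimal sketch of the first option, with two separate allocations. It uses plain malloc/free so that it compiles without the CUDA toolkit; in the original code each malloc would be a cudaMallocHost call and each free a cudaFreeHost call. The member layouts of XYZW and doubleXYZW, the value of NUM_POINTS, and the helper names bundle_create and bundle_destroy are assumptions made for illustration and do not come from the question.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_POINTS 1024 /* assumed value; the question does not state it */

typedef struct { float x, y, z, w; } XYZW;         /* assumed: 4 floats */
typedef struct { double x, y, z, w; } doubleXYZW;  /* assumed: 4 doubles */

typedef struct {
    doubleXYZW cen_sum;
    double STS[6];
    XYZW *Points; /* only a pointer: the array needs its own allocation */
} BUNDLE;

/* Hypothetical helper: allocates the struct, then the array it points to. */
BUNDLE *bundle_create(size_t num_points)
{
    assert(num_points > 0);

    BUNDLE *b = malloc(sizeof *b); /* allocation 1: the struct itself */
    if (b == NULL)
        return NULL;

    b->Points = malloc(num_points * sizeof *b->Points); /* allocation 2 */
    if (b->Points == NULL) {
        free(b);
        return NULL;
    }
    return b;
}

/* Hypothetical helper: releases both allocations, inner one first. */
void bundle_destroy(BUNDLE *b)
{
    if (b == NULL)
        return;
    free(b->Points);
    free(b);
}
```

After bundle_create(NUM_POINTS) succeeds, a write such as b->Points[0].x = 0 is valid because Points now refers to real storage. The cost of this layout is that the struct and the array live in two separate blocks, which matters if the whole bundle is later copied in one transfer.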
    

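Note that your bundle_size calculation already assumes the second layout: it reserves room for NUM_POINTS XYZW elements alongside the other members, which only makes sense if the array lives inside the struct. If you go with the first method, that calculation is incorrect, because the struct then holds a pointer rather than the array, and the array is sized in its own allocation. In any event, with either method, it's easier just to compute the size of the struct allocation as sizeof(BUNDLE).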
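
As a rough illustration of the sizeof(BUNDLE) point, the sketch below uses the second layout and compares the hand-written sum against what the compiler reports. Again, malloc stands in for cudaMallocHost, and the member layouts, the NUM_POINTS value, and the helper names bundle_manual_size and bundle_alloc are assumptions for the example only.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_POINTS 1024 /* assumed value; the question does not state it */

typedef struct { float x, y, z, w; } XYZW;         /* assumed: 4 floats */
typedef struct { double x, y, z, w; } doubleXYZW;  /* assumed: 4 doubles */

typedef struct {
    doubleXYZW cen_sum;
    double STS[6];
    XYZW Points[NUM_POINTS]; /* array stored inside the struct */
} BUNDLE;

/* The sum from the question: member sizes only, no padding. */
size_t bundle_manual_size(void)
{
    return NUM_POINTS * sizeof(XYZW) + sizeof(doubleXYZW) + 6 * sizeof(double);
}

/* Hypothetical helper: one allocation covers every member. */
BUNDLE *bundle_alloc(void)
{
    /* sizeof(BUNDLE) also counts any padding the compiler inserts,
       so it can never be smaller than the hand-written sum. */
    assert(sizeof(BUNDLE) >= bundle_manual_size());
    return malloc(sizeof(BUNDLE));
}
```

With this layout the two numbers may well coincide, but sizeof(BUNDLE) stays correct if members are added or reordered later, whereas the manual sum has to be updated by hand.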

To be clear, there is nothing CUDA-specific here (the error would be present e.g. if you used malloc instead of cudaMallocHost). The problem is rooted in basic C understanding, not CUDA.

Robert Crovella
  • Thank you for the detailed lesson. It makes sense. I, obviously, have not implemented this level of complexity in C so encountered it with CUDA. As you suggested, second option suits well, probably saves some allocation time too. – Nenu Sep 20 '16 at 21:04