You have two allocations that must be done, and you are only performing one of them. You are allocating storage for the cpu_data pointer, but you have not allocated any storage for the Points pointer. Therefore, when you dereference Points:
    cpu_data->Points[0].x = 0;
    ^         ^
    |         this dereferences the Points pointer (NOT allocated!)
    |
    this dereferences the cpu_data pointer (allocated)
you are dereferencing a pointer that was never allocated, so it is invalid. Any attempt to read or write through it is an invalid memory access.
You have (at least) two options to fix it:
- After you have allocated space for cpu_data, you can perform another cudaMallocHost allocation on cpu_data->Points.
- If you know the size of the Points array (it seems like you do: NUM_POINTS), then you could just statically allocate for it:
    typedef struct {
        doubleXYZW cen_sum;       // struct with 4 doubles
        double STS[6];
        XYZW Points[NUM_POINTS];  // struct with 4 floats
    } BUNDLE;
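As a rough sketch of this second option, the following plain C version uses malloc in place of cudaMallocHost. The value of NUM_POINTS, the member names of XYZW and doubleXYZW, and the helper name bundle_create are assumptions made for illustration; they are not taken from your code.

```c
#include <stdlib.h>

#define NUM_POINTS 8   /* assumed value, for illustration only */

/* Assumed layouts: the original only says "4 floats" and "4 doubles". */
typedef struct { float  x, y, z, w; } XYZW;
typedef struct { double x, y, z, w; } doubleXYZW;

typedef struct {
    doubleXYZW cen_sum;
    double     STS[6];
    XYZW       Points[NUM_POINTS];  /* storage lives inside the struct */
} BUNDLE;

/* Hypothetical helper: a single allocation covers every member,
   and its size is simply sizeof(BUNDLE). */
BUNDLE *bundle_create(void)
{
    BUNDLE *b = malloc(sizeof(BUNDLE));
    if (b == NULL)
        return NULL;
    b->Points[0].x = 0;  /* valid: Points is an array inside *b */
    return b;
}
```

With pinned memory, the same single allocation would presumably become cudaMallocHost((void **)&cpu_data, sizeof(BUNDLE)), released with cudaFreeHost instead of free.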
Note that your bundle_size calculation is crafted in such a way that the second method is suggested. If you go with the first method, your bundle_size calculation is incorrect. In any event, with either method, it's easier just to compute bundle_size as sizeof(BUNDLE).
To be clear, there is nothing CUDA-specific here (the error would be present, e.g., if you used malloc instead of cudaMallocHost). The problem is rooted in basic C understanding, not CUDA.
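To illustrate that point, here is a minimal sketch of the first option written with plain malloc. It assumes the struct holds Points as a pointer member, which is what the dereference in your code suggests. The type name BUNDLE_PTR, the helper names, the value of NUM_POINTS, and the member names of XYZW and doubleXYZW are all hypothetical.

```c
#include <stdlib.h>

#define NUM_POINTS 8   /* assumed value, for illustration only */

/* Assumed layouts: the original only says "4 floats" and "4 doubles". */
typedef struct { float  x, y, z, w; } XYZW;
typedef struct { double x, y, z, w; } doubleXYZW;

/* Assumed shape of the struct when Points is a pointer member. */
typedef struct {
    doubleXYZW cen_sum;
    double     STS[6];
    XYZW      *Points;   /* only a pointer: no storage for the points yet */
} BUNDLE_PTR;

/* Hypothetical helper performing BOTH required allocations. */
BUNDLE_PTR *bundle_ptr_create(void)
{
    BUNDLE_PTR *b = malloc(sizeof(BUNDLE_PTR));      /* allocation 1: the struct */
    if (b == NULL)
        return NULL;

    b->Points = malloc(NUM_POINTS * sizeof(XYZW));   /* allocation 2: the array */
    if (b->Points == NULL) {
        free(b);
        return NULL;
    }

    b->Points[0].x = 0;  /* now valid: both pointers refer to allocated storage */
    return b;
}

/* Release in reverse order of allocation. */
void bundle_ptr_destroy(BUNDLE_PTR *b)
{
    if (b == NULL)
        return;
    free(b->Points);
    free(b);
}
```

Skipping the second malloc reproduces the original problem exactly: b->Points would hold an indeterminate value, and writing through it would be undefined behavior. The CUDA version would follow the same pattern, with cudaMallocHost and cudaFreeHost replacing malloc and free.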