I have a point cloud in device memory (mDPointsWS) with a memory layout where first all x-, then all y-, and finally all z-coordinates are stored. I use Thrust to compute a tight axis-aligned bounding box (AABB) of this point cloud. Here is my code:
// use the CUDA Thrust library for the AABB computation
thrust::pair<thrust::device_ptr<Real>, thrust::device_ptr<Real>> thrustAABB[3];

// do a parallel min/max reduction on the GPU for each coordinate axis
thrust::device_ptr<Real> dPointsWS(mDPointsWS);
for (uint32 i = 0, offset = 0; i < 3; ++i, offset += mPointCount)
    thrustAABB[i] = thrust::minmax_element(dPointsWS + offset,
                                           dPointsWS + offset + mPointCount);
cudaDeviceSynchronize();

// get the results from the GPU
for (uint32 i = 0; i < 3; ++i)
{
    mAABBWS[2 * i + 0] = *thrustAABB[i].first;
    mAABBWS[2 * i + 1] = *thrustAABB[i].second;
}
What I am wondering about is where the result of thrust::minmax_element is stored before the last code block. At the end I clearly download the results to host memory, but I would like to avoid that.
I've found the following article: thrust reduction result on device memory. However, my case is different since I use the return type thrust::pair<thrust::device_ptr<Real>, thrust::device_ptr<Real>>.
As the reduction function returns a pair of device_ptr objects, the minimum and maximum results should already be stored on the GPU, or do I misunderstand this? But if the results are stored on the GPU, how can I control their lifetime? For example, I would like to use the results directly for drawing the AABB with OpenGL, without downloading them to host memory first.
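To make this concrete, here is a rough sketch of what I have in mind (hypothetical names: mDAABBWS is a device buffer of 6 Reals, mAABBVbo is the OpenGL buffer object I draw the box from; I am assuming the iterators returned by thrust::minmax_element point into the original point array, so the values can be gathered with device-to-device copies):

#include <cuda_gl_interop.h>

// option 1: gather the six extrema into a compact device buffer,
// never touching host memory
for (uint32 i = 0; i < 3; ++i)
{
    cudaMemcpy(mDAABBWS + 2 * i + 0, thrustAABB[i].first.get(),
               sizeof(Real), cudaMemcpyDeviceToDevice);
    cudaMemcpy(mDAABBWS + 2 * i + 1, thrustAABB[i].second.get(),
               sizeof(Real), cudaMemcpyDeviceToDevice);
}

// option 2: write directly into a mapped OpenGL buffer object
cudaGraphicsResource* resource = nullptr;
cudaGraphicsGLRegisterBuffer(&resource, mAABBVbo, cudaGraphicsRegisterFlagsWriteDiscard);
cudaGraphicsMapResources(1, &resource);
Real* dMapped = nullptr;
size_t mappedSize = 0;
cudaGraphicsResourceGetMappedPointer(reinterpret_cast<void**>(&dMapped),
                                     &mappedSize, resource);
// ... same cudaMemcpy calls as above, but targeting dMapped ...
cudaGraphicsUnmapResources(1, &resource);

Would an approach like this work, or am I misunderstanding where the minmax_element results actually live?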