I am working with a Jetson TX2. I capture images from a camera as an unsigned char *buffer.

Then I need to do some image processing on the GPU. On the Jetson TX2, the host-to-device and device-to-host transfers can be avoided because the CPU and the GPU share the same RAM. To take advantage of that, I allocate the image with managed memory:

int height = 6004;
int width = 7920;
int NumElement = height * width;
unsigned char *img1;
cudaMallocManaged(&img1, NumElement * sizeof(unsigned char));

With this approach, there is no PCIe transfer bottleneck. My problem is how to get the image from the capture buffer into img1. This works, but it is too slow:

for (int i = 0; i < NumElement; i++)
    img1[i] = buffer[i];

I lose the advantage of the GPU by using a naive for loop. And if I just assign the pointer:

img1 = buffer;

then I run into a problem as soon as I enter the kernel.

talonmies

1 Answer

Use cudaMemcpy with cudaMemcpyDefault (destination first, then source), something like:

cudaMemcpy(&img1[0], &buffer[0], NumElement * sizeof(unsigned char), cudaMemcpyDefault);

You could also potentially use a plain memcpy, since managed memory is directly addressable from the host.
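
To put the copy in context, here is a minimal sketch of the whole path: allocate managed memory, copy the captured frame into it with cudaMemcpyDefault, run a kernel, and read the result back on the CPU. The invert kernel and the processFrame helper are hypothetical names made up for this illustration, and the pixel inversion merely stands in for the real image processing. It assumes buffer is ordinary host memory holding width * height bytes.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: inverts every pixel in place.
__global__ void invert(unsigned char *img, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        img[i] = 255 - img[i];
}

// Hypothetical helper: copies a host frame into managed memory, runs the
// kernel on it, and returns true if the first and last pixels came back
// inverted. Assumes buffer holds width * height bytes of host memory.
bool processFrame(const unsigned char *buffer, int width, int height)
{
    const size_t n = (size_t)width * height;
    if (n == 0)
        return false;

    unsigned char *img1 = nullptr;
    if (cudaMallocManaged(&img1, n * sizeof(unsigned char)) != cudaSuccess)
        return false;

    // Destination first, source second. cudaMemcpyDefault lets the runtime
    // infer the copy direction from the two pointers.
    if (cudaMemcpy(img1, buffer, n * sizeof(unsigned char),
                   cudaMemcpyDefault) != cudaSuccess) {
        cudaFree(img1);
        return false;
    }

    const int threads = 256;
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
    invert<<<blocks, threads>>>(img1, n);

    // Wait for the kernel to finish before reading img1 on the CPU.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    ok = ok && img1[0] == (unsigned char)(255 - buffer[0])
            && img1[n - 1] == (unsigned char)(255 - buffer[n - 1]);

    cudaFree(img1);
    return ok;
}
```

Calling processFrame(buffer, 7920, 6004) from main would exercise it on a frame of the size in the question. In a real capture loop you would presumably allocate img1 once and reuse it for every frame rather than calling cudaMallocManaged and cudaFree each time.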

talonmies