I am trying to sum all the pixels in an image and compute their average using the CUDA NPP library. My image is an 8-bit unsigned char grayscale image, 256 pixels wide and 1024 pixels high. I have tried to follow the rules for declaring pointers and passing the corresponding NPP-typed pointers to the NPP functions.

However, I get an "unknown error" when I run GPU error checking on my code. I have tried to debug it, but I cannot figure out where I am going wrong. Could someone help?

I am using OpenCV in addition to this to do my processing, and hence some OpenCV code will be present.

EDIT: Code has been updated

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, char *file, int line, bool abort=true)
{
    if (code != cudaSuccess) 
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) getchar();
    }
}

// process image here 

// device_pointer initializations
unsigned char *device_input;
unsigned char *device_output;    

size_t d_ipimgSize = input.step * input.rows;
size_t d_opimgSize = output.step * output.rows;

gpuErrchk( cudaMalloc( (void**) &device_input, d_ipimgSize) );
gpuErrchk( cudaMalloc( (void**) &device_output, d_opimgSize) );

gpuErrchk( cudaMemcpy(device_input, input.data, d_ipimgSize, cudaMemcpyHostToDevice) );

// Median filter the input image here
// .......

// start summing all pixels 
Npp64s *partialSum = 0; 
partialSum = (Npp64s *) malloc(sizeof(Npp64s));

int bytes = input.cols*input.rows;

Npp8u *scratch = nppsMalloc_8u(bytes);

int ostep = input.step; 
NppiSize imSize; 
imSize.width = input.cols; 
imSize.height = input.rows;

// copy processed image data into a source_pointer
unsigned char *odata; 
odata = (unsigned char*) malloc( sizeof(unsigned char) * input.rows * input.cols);
memcpy(odata, output.data, sizeof(unsigned char) * input.rows * input.cols);

// compute the sum over all the pixels
nppiSum_8u64s_C1R( odata, ostep, imSize, scratch, partialSum );

// print sum 
printf( "\n Total Sum cuda %d \n",  *partialSum) ;

gpuErrchk(cudaFree(device_input));   // <--- Unknown error here
gpuErrchk(cudaFree(device_output)); 
Eagle
  • where are `device_input` and `device_output` variables declared and allocated? Can you show that code? – Robert Crovella Mar 20 '14 at 23:25
  • @RobertCrovella I updated the code to show the declarations and definitions of `device_input` and `device_output`. While debugging, I had tried to change the declaration of `bytes`, `ostep`, `imSize`, and `odata` to use the openCV `output` structure (as in `output.step`, `output.rows`, `output.cols`), to see if I could get rid of the errors. But, that did not seem to work either. – Eagle Mar 20 '14 at 23:48

1 Answer

The `partialSum` argument to `nppiSum_8u64s_C1R` must point to device-allocated memory, not host memory.

Also, you allocate a scratch buffer the size of your image. The function `nppiSumGetBufferHostSize_8u64s_C1R` gives you the exact size required for the scratch buffer, which might be larger than the image itself (not very likely for a simple summation, but possible).

Finally, always check the return values of NPP calls, just as you do for CUDA calls. `nppiSum_8u64s_C1R` probably does not return `NPP_NO_ERROR` in your case.
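
Putting these points together, a corrected flow might look like the minimal sketch below. It is not the asker's full program: the image is a stand-in buffer filled with ones, `step` is assumed to equal the width, and `CUDA_CHECK` / `NPP_CHECK` are hypothetical helper macros, not part of CUDA or NPP. Note that the source pointer handed to `nppiSum_8u64s_C1R` is a device pointer here as well; the question's code passes a host `malloc` buffer (`odata`), which NPP cannot read. The `int *` buffer-size parameter follows the older toolkit headers as far as I recall, so check the NPP headers for your version.

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <npp.h>

// Hypothetical helpers (not part of CUDA or NPP): report the failure and exit.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Treats anything other than NPP_NO_ERROR (including warnings) as a failure.
#define NPP_CHECK(call)                                                \
    do {                                                               \
        NppStatus st_ = (call);                                        \
        if (st_ != NPP_NO_ERROR) {                                     \
            fprintf(stderr, "NPP status %d at %s:%d\n",                \
                    (int)st_, __FILE__, __LINE__);                     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

int main()
{
    const int width = 256, height = 1024;
    const int step = width;  // bytes per row; assumes tightly packed rows
    std::vector<Npp8u> host((size_t)step * height, 1);  // stand-in image

    NppiSize roi = { width, height };

    // Source image on the device.
    Npp8u *d_src = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_src, host.size()));
    CUDA_CHECK(cudaMemcpy(d_src, host.data(), host.size(),
                          cudaMemcpyHostToDevice));

    // Ask NPP how large the scratch buffer must be, then allocate it
    // on the device. (int * as in older toolkits; verify for yours.)
    int scratchBytes = 0;
    NPP_CHECK(nppiSumGetBufferHostSize_8u64s_C1R(roi, &scratchBytes));
    Npp8u *d_scratch = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_scratch, scratchBytes));

    // The result is written to device memory as well.
    Npp64s *d_sum = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_sum, sizeof(Npp64s)));

    NPP_CHECK(nppiSum_8u64s_C1R(d_src, step, roi, d_scratch, d_sum));

    // Copy the single 64-bit result back to the host.
    Npp64s sum = 0;
    CUDA_CHECK(cudaMemcpy(&sum, d_sum, sizeof(sum), cudaMemcpyDeviceToHost));

    printf("sum = %lld, mean = %f\n", (long long)sum,
           (double)sum / ((double)width * height));

    CUDA_CHECK(cudaFree(d_sum));
    CUDA_CHECK(cudaFree(d_scratch));
    CUDA_CHECK(cudaFree(d_src));
    return 0;
}
```

The program has to be linked against the NPP image library (for example `-lnppi` on older toolkits; newer toolkits split NPP into several libraries, so the name may differ). The sum is printed with `%lld` because `Npp64s` is a 64-bit type, which does not match the `%d` used in the question.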

kunzmi
  • How do I check for NPP return errors? My `gpuErrchk` function does not handle a return type of `NppStatus`. – Eagle Mar 21 '14 at 00:34
  • I tried following the method to allocate the scratch buffer memory as per this thread http://stackoverflow.com/questions/6338690/npp-library-function-argument-pdevicebuffer . But, I got an `identifier undefined` error when I used `nppsReductionGetBufferSize_8u`. – Eagle Mar 21 '14 at 00:37
  • Also, should the scratch buffer be allocated on the device? According to the NPP Primitives library, it is. https://developer.nvidia.com/sites/default/files/akamai/cuda/files/CUDADownloads/NPP_Library.pdf – Eagle Mar 21 '14 at 00:50
  • There is documentation for npp [here](http://docs.nvidia.com/cuda/pdf/NPP_Library.pdf). Most of your questions are answered there. – Robert Crovella Mar 21 '14 at 02:02
  • @kunzmi Is `nppiSumGetBufferHostSize_8u64s_C1R` a function that is present in CUDA SDK v5.5? I keep getting `identifier undefined` errors for that function because I have the v4.2 toolkit/SDK. – Eagle Mar 21 '14 at 03:07
  • If the `nppiSumFoo` function is available, a corresponding `nppiSumGetBufferHostSize` should also be there. If I recall correctly, there used to be one GetBufferSize function for everything, not one per type as there is now. Check the NPP documentation for your version to see what it says about that. And yes, the buffer is allocated on the device. – kunzmi Mar 21 '14 at 07:17
  • I just checked: in version 4.2 the buffer size is retrieved with `nppiReductionGetBufferHostSize_8u_C1R`. – kunzmi Mar 21 '14 at 07:37