
I am trying to iterate through a 2D array in PyCUDA but I end up with repeated array values. I first tried a small random integer array, which works as expected, but when I pass in an image I see the same values over and over again.

Here is my code:

img = np.random.randint(20, size = (4,5))
print "Input array"
print img
img_size=img.shape
print img_size

# nbytes gives the number of bytes in the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
#Copies the memory from CPU to GPU
cuda.memcpy_htod(img_gpu, img)


mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(int *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    printf(" %d",a[j + i*col]);
}
}
""")

col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)


Now, when I replace the random integer array with an image converted to a NumPy array, I end up with this:

img = cv2.imread('Chest.jpg',0)
img_size=img.shape
print img_size

# nbytes gives the number of bytes in the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
#Copies the memory from CPU to GPU
cuda.memcpy_htod(img_gpu, img)

mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(int *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    printf(" %d",a[j + i*col]);
}
}
""")
#Gives you the number of columns
col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)


user2808264
  • @talonmies I tried with floats as well as ints but I still end up getting the same results – user2808264 May 06 '17 at 20:00
  • Post an [MCVE] then, otherwise there is no way to say what is going on – talonmies May 06 '17 at 20:05
  • @talonmies I have added a sample input that makes my question more clear and easy to understand – user2808264 May 06 '17 at 20:25
  • The code looks like it is working correctly to me. Contrary to your statement about the same value repeated again and again, I don't see that in your output, and in fact your output shows in the output matrix printout each and every element that appears in the input matrix, grouped by column. If anything I would say it is *less* clear now, since your output suggests the code is working correctly. You should provide a [mcve] for questions asking for debugging assistance. SO expectations use the word **must** [here](http://stackoverflow.com/help/on-topic). – Robert Crovella May 06 '17 at 21:03
  • @talonmies I did a poor job of explaining my issue previously by only posting partial code snippets with images that did not reflect the issue. I have added the complete code with the two different scenarios. I think it should be [MCVE](http://stackoverflow.com/help/mcve) – user2808264 May 07 '17 at 17:49
  • @RobertCrovella Could you please review the question now. – user2808264 May 07 '17 at 17:51
  • OK so now it is obvious. Please read some OpenCV [documentation](http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_core/py_basic_ops/py_basic_ops.html). My first comment was 100% correct. Your source image data isn't stored in 32 bit signed integers per pixel and your kernel is trying to read the data in the wrong format. – talonmies May 07 '17 at 18:38
  • @talonmies Yeah adding img = img1.astype(np.int32) to my code helped get rid of the strange values – user2808264 May 07 '17 at 19:49

1 Answer


The problem here is that the image you are loading stores its pixel values as unsigned 8-bit integers (uint8), not as the 32-bit signed integers your kernel expects. This modification of your example works as expected:

import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import numpy as np
import cv2 

import pycuda.autoinit

img = cv2.imread('Chest.jpg',0)
img_size=img.shape
print img_size
print img.dtype

# nbytes gives the number of bytes in the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
#Copies the memory from CPU to GPU
cuda.memcpy_htod(img_gpu, img)

mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(unsigned char *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    int val = int(a[j + i*col]);
    printf(" %d", val);
}
}
""")
#Gives you the number of columns
col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)

When run, the code emits this:

$ python image.py 
(681, 1024)
uint8
Output array  244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 245 245 245 246 246 246 246 246 246 246 246 246 246 246 244 244 244 244 244 244 244 244 245 245 245 245 245 245 245 245 244 244 245 245 245 246 246 246 

[Output clipped for brevity]

Note the dtype of the image: uint8. Your code treats the stream of unsigned 8-bit values as 32-bit integers. Technically it should produce a runtime error on a full image, because the kernel reads 4 bytes per pixel instead of 1 and so reads beyond the end of the image buffer. However, you don't see this because you only launch a single block, and your input image is presumably at least four times larger than the 32 x 32 block you run.
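The reinterpretation described above can be reproduced on the host with NumPy alone, without a GPU. The following is a minimal sketch, assuming a tiny made-up 2 x 4 uint8 array standing in for the image (the array contents and the names `pixels`, `as_int32` and `converted` are illustrative, not from the original code). It views the same bytes as little-endian 32-bit integers, which is roughly what a kernel declared with `int *a` does to a uint8 buffer:

```python
import numpy as np

# Made-up stand-in for a small grayscale image: 2 rows x 4 columns of uint8.
pixels = np.array([[244, 244, 245, 246],
                   [10, 20, 30, 40]], dtype=np.uint8)

# Reinterpret the same 8 bytes as little-endian 32-bit signed integers.
# No data is converted; four neighbouring pixels are fused into one value.
as_int32 = pixels.view(np.dtype("<i4"))

# Converting instead of reinterpreting keeps one value per pixel,
# at the cost of four times the memory.
converted = pixels.astype(np.int32)

print(pixels.nbytes)     # bytes actually copied to the device
print(as_int32.shape)    # only a quarter as many "integers" exist
print(as_int32)          # fused values, unrelated to the pixel values
print(converted.nbytes)  # bytes an int kernel would need for 8 pixels
```

The buffer holds only a quarter of the integers the kernel believes it has, which is why indexing it as `rows * cols` integers runs past the end of the allocation. Converting with `astype(np.int32)` before the copy, as mentioned in the comments, is the other way to make the data match an `int *` signature.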

Incidentally, PyCUDA is extremely good at managing and enforcing type safety for CUDA calls, but your code neatly defeats every mechanism by which PyCUDA could detect a type mismatch in the kernel call. PyCUDA includes an excellent GPUArray class (pycuda.gpuarray.GPUArray). You should familiarise yourself with it. If you had used a GPUArray instance here, you would have gotten type mismatch runtime errors which would have alerted you to the exact source of the problem the first time you tried to run it.
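As a rough illustration of the kind of early failure meant here, the sketch below checks an array's dtype on the host before any copy or kernel launch. This is not PyCUDA's own mechanism: `require_dtype` is a hypothetical helper written in plain NumPy, and the zero-filled array is only a stand-in for the result of `cv2.imread(..., 0)`.

```python
import numpy as np


def require_dtype(array, expected_dtype):
    """Hypothetical helper: raise if `array` does not hold the element
    type that the kernel signature assumes."""
    expected = np.dtype(expected_dtype)
    if array.dtype != expected:
        raise TypeError(
            "kernel expects %s elements but the array holds %s"
            % (expected, array.dtype)
        )
    return array


# Stand-in for a grayscale image loaded with OpenCV (assumed to be uint8).
img = np.zeros((4, 5), dtype=np.uint8)

# Matches a kernel declared with `unsigned char *a`: passes silently.
require_dtype(img, np.uint8)

# A kernel declared with `int *a` needs int32 data: this fails up front.
try:
    require_dtype(img, np.int32)
except TypeError as err:
    print(err)
```

Calling such a check just before `cuda.memcpy_htod` would turn the silent misread into an immediate error on the host.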

talonmies