pyCuda, issues sending multiple single variable arguments

Question

I have a PyCUDA program that reads an image whose path is given on the command line and saves a copy with the colors inverted:

import pycuda.autoinit
import pycuda.driver as device
from pycuda.compiler import SourceModule as cpp

import numpy as np
import sys
import cv2

modify_image = cpp("""
__global__ void modify_image(int pixelcount, unsigned char* inputimage, unsigned char* outputimage)
{
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  if (id >= pixelcount)
    return;

  outputimage[id] = 255 - inputimage[id];
}
""").get_function("modify_image")

print("Loading image")

image = cv2.imread(sys.argv[1], cv2.IMREAD_UNCHANGED).astype(np.uint8)

print("Processing image")

pixels = image.shape[0] * image.shape[1]
newchannels = []
for channel in cv2.split(image):
  output = np.zeros_like(channel)
  modify_image(
    device.In(np.int32(pixels)),
    device.In(channel),
    device.Out(output),
    block=(1024,1,1), grid=(pixels // 1024 + 1, 1))
  newchannels.append(output)
finalimage = cv2.merge(newchannels)

print("Saving image")

cv2.imwrite("processed.png", finalimage)

print("Done")

It works perfectly fine, even on larger images. However, while trying to expand the functionality of the program, I ran into a really strange issue: adding a second scalar argument to the kernel makes the program fail completely, and it simply saves an all-black image. The following code does not work:

import pycuda.autoinit
import pycuda.driver as device
from pycuda.compiler import SourceModule as cpp

import numpy as np
import sys
import cv2

modify_image = cpp("""
__global__ void modify_image(int pixelcount, int width, unsigned char* inputimage, unsigned char* outputimage)
{
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  if (id >= pixelcount)
    return;

  outputimage[id] = 255 - inputimage[id];
}
""").get_function("modify_image")

print("Loading image")

image = cv2.imread(sys.argv[1], cv2.IMREAD_UNCHANGED).astype(np.uint8)

print("Processing image")

pixels = image.shape[0] * image.shape[1]
newchannels = []
for channel in cv2.split(image):
  output = np.zeros_like(channel)
  modify_image(
    device.In(np.int32(pixels)),
    device.In(np.int32(image.shape[0])),
    device.In(channel),
    device.Out(output),
    block=(1024,1,1), grid=(pixels // 1024 + 1, 1))
  newchannels.append(output)
finalimage = cv2.merge(newchannels)

print("Saving image")

cv2.imwrite("processed.png", finalimage)

print("Done")

The only difference is on two lines: the kernel signature and its call. The body of the kernel is unchanged, yet this small addition completely breaks the program. Neither the compiler nor the interpreter throws any errors. I have no idea how to begin debugging this, and I am thoroughly confused.

talonmies · Accepted Answer · score 2 · 2017-05-23T06:29:18.297

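device.In and its relatives (device.Out, device.InOut) are designed for objects that support the Python buffer protocol, like numpy arrays: they copy the data into a device allocation and pass the kernel a pointer to that allocation. The source of your problem is using them for scalars, so the kernel receives a pointer where it expects an int passed by value.

Just pass your scalars with the correct numpy dtype directly to the kernel call (np.int32(pixels) and np.int32(image.shape[0]) in your case), and keep device.In/device.Out for the arrays only. The fact that this worked in the original case was a complete accident: each wrapped scalar arrives as an 8-byte pointer, and with only one int parameter the two array pointers still happened to land where the kernel expected them.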
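To see why the extra argument produces a black image, here is a small pure-Python model of the kernel's parameter block. It is a sketch under stated assumptions, not PyCUDA's actual code: it assumes a 64-bit host (8-byte pointers), that kernel parameters are laid out like a C struct with natural alignment (modelled with ctypes.Structure), and that every device.In/device.Out argument contributes exactly one device pointer. The helpers offsets and who_is_read and the argument labels are hypothetical.

```python
import ctypes

INT = ctypes.c_int32   # a kernel "int" parameter: 4 bytes
PTR = ctypes.c_void_p  # a device pointer: 8 bytes on a 64-bit host (assumption)


def offsets(types):
    """Byte offset of each parameter, laid out like a C struct with natural alignment."""
    fields = [(f"f{i}", t) for i, t in enumerate(types)]
    params = type("Params", (ctypes.Structure,), {"_fields_": fields})
    return [getattr(params, name).offset for name, _ in fields]


def who_is_read(kernel, passed):
    """Map each kernel parameter to the passed argument that starts at the same offset."""
    passed_at = dict(zip(offsets([t for _, t in passed]), [label for label, _ in passed]))
    kernel_offsets = offsets([t for _, t in kernel])
    return {name: passed_at.get(off, "(middle of another argument)")
            for (name, _), off in zip(kernel, kernel_offsets)}


# Original program: one scalar wrapped in device.In, so three pointers are passed.
original_kernel = [("pixelcount", INT), ("inputimage", PTR), ("outputimage", PTR)]
original_call = [("In(pixels)", PTR), ("In(channel)", PTR), ("Out(output)", PTR)]

# Failing program: two scalars wrapped in device.In, so four pointers are passed.
second_kernel = [("pixelcount", INT), ("width", INT), ("inputimage", PTR), ("outputimage", PTR)]
second_call = [("In(pixels)", PTR), ("In(width)", PTR), ("In(channel)", PTR), ("Out(output)", PTR)]

for title, kernel, call in [("original", original_kernel, original_call),
                            ("two scalars", second_kernel, second_call)]:
    print(title)
    for param, source in who_is_read(kernel, call).items():
        print(f"  {param:<12} reads {source}")
```

Under these assumptions, the original kernel finds both array pointers in the right slots, while pixelcount just reads the low 4 bytes of a device pointer, so the bounds check is meaningless. In the two-scalar version, inputimage points at the one-element device copy of the width and outputimage points at the input channel, so the kernel writes into the input buffer and device.Out copies back the untouched zeros of output: an all-black image.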

answered May 23 '17 at 05:10 by talonmies, edited May 23 '17 at 06:29

Ah, okay thanks. That makes a lot of sense. I thought the In and Out calls were necessary in all cases, but I guess not – Maurdekye May 25 '17 at 18:51

score 0 · Answer 2 · edited May 23 '17 at 12:15

Okay, so changing the scalar arguments to pointers in the kernel fixed the code, though I'm not sure how or why. Here is the modified version of the kernel:

__global__ void modify_image(int* pixelcount, int* width, unsigned char* inputimage, unsigned char* outputimage)
{
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  if (id >= *pixelcount)
    return;

  outputimage[id] = 255 - inputimage[id];
}

The remainder of the code is unchanged. If anybody wants to explain why this is a successful fix, I would greatly appreciate it.
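A simplified model, under the same kind of assumptions (64-bit host, C-struct-style parameter layout, one device pointer per device.In/device.Out argument), suggests why this works: int* parameters are 8 bytes wide, so each one lines up with a pointer supplied by device.In. Passing np.int32 scalars by value, as the accepted answer recommends, lines up with the original int signature as well, and it avoids a device allocation and host-to-device copy per scalar on every launch. The helpers offsets and layouts_match are hypothetical, not part of PyCUDA.

```python
import ctypes

INT = ctypes.c_int32   # int passed by value, e.g. np.int32(pixels): 4 bytes
PTR = ctypes.c_void_p  # any device pointer: 8 bytes on a 64-bit host (assumption)


def offsets(types):
    """Byte offset of each parameter, laid out like a C struct with natural alignment."""
    fields = [(f"f{i}", t) for i, t in enumerate(types)]
    params = type("Params", (ctypes.Structure,), {"_fields_": fields})
    return [getattr(params, name).offset for name, _ in fields]


def layouts_match(kernel_types, passed_types):
    """True if every passed argument starts exactly where the kernel expects a parameter."""
    return offsets(kernel_types) == offsets(passed_types)


# What the call supplies.
all_wrapped = [PTR, PTR, PTR, PTR]       # In(pixels), In(width), In(channel), Out(output)
scalars_by_value = [INT, INT, PTR, PTR]  # np.int32(...), np.int32(...), In(channel), Out(output)

# What the kernel expects.
int_kernel = [INT, INT, PTR, PTR]        # (int, int, unsigned char*, unsigned char*)
pointer_kernel = [PTR, PTR, PTR, PTR]    # (int*, int*, unsigned char*, unsigned char*)

print("int kernel, scalars wrapped in In:    ", layouts_match(int_kernel, all_wrapped))
print("pointer kernel, scalars wrapped in In:", layouts_match(pointer_kernel, all_wrapped))
print("int kernel, scalars passed by value:  ", layouts_match(int_kernel, scalars_by_value))
```

Only the first combination, which is the failing program from the question, misaligns; the pointer kernel and the by-value call both match.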
