
I am taking my first steps with PyCUDA to perform some parallel computation, and I came across a behavior I do not understand. I started from the very basic tutorial on the official PyCUDA website (a simple script that doubles every element of an array: https://documen.tician.de/pycuda/tutorial.html). The code is the following:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
a = numpy.random.randn(4,4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
mod = SourceModule("""
  __global__ void doublify(float *a)
  {
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
  }
  """)
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)

It is quite clear and it works. An example result:

[[-1.9951048  -1.7537887  -1.3228793  -1.1585734 ]
 [-0.96863186 -1.7235669  -0.3331826  -1.1527038 ]
 [ 2.4142797  -0.35531005  1.8844942   3.996446  ]
 [ 1.400629   -2.7957075  -0.78042877  0.13829945]]
[[-0.9975524  -0.87689435 -0.66143966 -0.5792867 ]
 [-0.48431593 -0.86178344 -0.1665913  -0.5763519 ]
 [ 1.2071398  -0.17765503  0.9422471   1.998223  ]
 [ 0.7003145  -1.3978537  -0.39021438  0.06914973]]

But then I tried to modify the code slightly to deal with integer numbers:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy

a = numpy.array([[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]])

a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
mod = SourceModule("""
  __global__ void doublify(int *a)
  {
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
  }
  """)
func = mod.get_function("doublify")
func(a_gpu, block=(4,4,1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)

... and this does not work: only part of the 2D array is multiplied by 2, and the rest is unchanged. An example result:

[[2 4 6 8]
 [2 4 6 8]
 [1 2 3 4]
 [1 2 3 4]]
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]
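One difference between the two snippets worth checking (a diagnostic I am adding here, not part of the original post): the float version explicitly casts with `astype(numpy.float32)`, while the int version keeps numpy's platform-default integer dtype, which is often 8 bytes wide:

```python
import numpy

# The float version casts explicitly, so each element is 4 bytes.
a_float = numpy.random.randn(4, 4).astype(numpy.float32)

# The int version relies on numpy's default integer dtype,
# which on most 64-bit platforms is int64 (8 bytes per element).
a_int = numpy.array([[1, 2, 3, 4]] * 4)

print(a_float.dtype, a_float.itemsize)  # float32 4
print(a_int.dtype, a_int.itemsize)      # e.g. int64 8 on 64-bit Linux
```

The kernel's pointer type determines how many bytes it reads per element, so a mismatch between the host dtype and the kernel's `int`/`float` declaration matters.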

Why is this happening? What is the difference between the tutorial code and the modified code?

Thanks to all!

Dam
  • why did you remove this line: `a = a.astype(numpy.float32)` ? It's pretty important. Types matter when you are working with pycuda. Note that your kernel code is still expecting `float` data: `__global__ void doublify(float *a)` – Robert Crovella Jul 05 '22 at 14:36
  • @RobertCrovella actually the "float" declaration is a typo; in my code it is correctly set to "int", matching the data I am dealing with. I actually have to tackle a problem with int numbers (parallelizing lots of operations on arrays containing ints) — is it prohibited to use them with GPUs? – Dam Jul 06 '22 at 14:07
  • This may be a side-step of your problem rather than an answer, but I've always found [`numba.cuda`](http://numba.pydata.org/numba-doc/latest/cuda/) and [`cupy`](https://cupy.dev/) easier to work with than PyCUDA. – Dominik Stańczak Jul 06 '22 at 14:44
  • so you should use `a = a.astype(numpy.int32)`. I'm only able to work with what I see here. – Robert Crovella Jul 06 '22 at 15:01

1 Answer


OK, so I kind of solved it by staying with the float type, even though I need to work with integers. Apparently there is some behind-the-scenes mechanism when allocating memory for integers, and it does not fit with PyCUDA.
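Following up on the comments above, which point at the dtype: a minimal numpy-only sketch (no GPU needed, and assuming a little-endian 64-bit platform where numpy's default integer dtype is int64) of why exactly the first half of the array comes back doubled:

```python
import numpy

# Make the assumed default dtype explicit for the demonstration.
a = numpy.array([[1, 2, 3, 4]] * 4).astype(numpy.int64)

# The kernel was compiled for `int *`, so its 16 threads each double one
# 4-byte slot. Reinterpret the 128-byte buffer the same way:
as_int32 = a.reshape(-1).view(numpy.int32)
as_int32[:16] *= 2  # threads 0..15 touch only the first 64 bytes

# On a little-endian machine, those 16 int32 slots are the low and high
# words of the first 8 int64 elements, so only the first two rows of the
# 4x4 array appear doubled -- matching the output in the question.
print(a)
```

The fix suggested in the comments is to add `a = a.astype(numpy.int32)` before `cuda.mem_alloc`/`memcpy_htod`, so that the host buffer's element size matches the kernel's `int`.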

Dam