
I'm experiencing a problem with the addition assignment operator in CUDA C. I'm getting the following error:

kernel.cu(5): error: expression must have integral or enum type

My code is:

import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

mod = SourceModule("""
__global__ void addition(float* a, float* b, float* c){
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    c[a[i]] += b[i];
}
""")

addition = mod.get_function("addition")
a = np.array([1, 2, 3, 1, 2, 3, 2, 1]).astype(np.float32)
b = np.array([0.1, 0.2, 0.1, 0.5, 0.1, 0.2, 0.1, 0.5]).astype(np.float32)
c = np.zeros_like(a)
addition(drv.Out(c), drv.In(a), drv.In(b), block=(32, 1, 1))
print c

My desired output is c = [0,1.1,0.4,0.3,0,0,0,0]. Can anyone suggest a solution?
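For clarity, the result I want is what an unbuffered CPU scatter-add produces. Here is a NumPy sketch (separate from the GPU code) of the intended computation:

```python
import numpy as np

# CPU sketch of the desired result: np.add.at performs an
# unbuffered scatter-add, so duplicate indices in a accumulate
# instead of overwriting each other.
a = np.array([1, 2, 3, 1, 2, 3, 2, 1]).astype(np.float32)
b = np.array([0.1, 0.2, 0.1, 0.5, 0.1, 0.2, 0.1, 0.5]).astype(np.float32)
c = np.zeros_like(a)

# indices must be integers, so cast a before indexing
np.add.at(c, a.astype(np.int64), b)
print(c)  # approximately [0, 1.1, 0.4, 0.3, 0, 0, 0, 0]
```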

1 Answer


The problem is in your kernel, where you index into c using a. a is of type float, and an array subscript must have integral or enum type, hence the compiler error.

Also notice that you are launching 32 threads but only indexing 8 positions, which means that without a guard you will index out of bounds.

The last problem you will face is that several threads try to change the same position in c due to duplicated indices in a. One way to fix it is to use atomicAdd.

__global__ void addition(float* a, float* b, float* c, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        atomicAdd(&c[(int)a[i]], b[i]);
}

Launch the kernel the same way, but don't forget to pass n, which is the size of a (or b).
You could also eliminate n by matching the thread-block dimensions to the array size when you launch the kernel.
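To see how the fixes fit together (the int cast, the bounds guard, and accumulation across duplicate indices), here is a serial Python emulation of the corrected kernel, one loop iteration per "thread". This is a CPU sketch only, not the PyCUDA launch:

```python
import numpy as np

a = np.array([1, 2, 3, 1, 2, 3, 2, 1]).astype(np.float32)
b = np.array([0.1, 0.2, 0.1, 0.5, 0.1, 0.2, 0.1, 0.5]).astype(np.float32)
c = np.zeros_like(a)
n = len(a)

for i in range(32):           # 32 "threads", as in block=(32,1,1)
    if i < n:                 # the bounds guard: threads 8..31 do nothing
        c[int(a[i])] += b[i]  # the int cast fixes the "integral or enum type" error

print(c)  # approximately [0, 1.1, 0.4, 0.3, 0, 0, 0, 0]
```

On the GPU the `+=` must become atomicAdd because the iterations run concurrently; in this serial sketch there is no race, so a plain `+=` suffices.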

brano
  • I have edited the code above. Realized that you had another problem to deal with. – brano Nov 12 '12 at 09:45
  • When you launch the kernel, change the drv.Out(c) to drv.InOut(c). This will handle the copy both to and from the GPU. If you only specify it as Out, it will not copy the c that was initialized to 0, and you will work with garbage data. – brano Nov 12 '12 at 09:50
  • Thank you for your suggestion. I'm getting the error "identifier AtomicAdd is undefined". Can you please explain how to overcome this? – Daniel Wonglee Nov 12 '12 at 09:53
  • In order to use AtomicAdd you need to have compute capability 1.1 or higher. In order to use AtomicAdd with floating point values you need to have compute capability 2.x or higher. First check the compute capability of your card. And then you will need to tell the compiler to use that specified compute capability. – brano Nov 12 '12 at 09:56
  • Thanks. I'll try the same. By the way , compute capability of my card is 3.0 – Daniel Wonglee Nov 12 '12 at 11:11
  • Then atomicAdd on floating point should work. I spelled atomicAdd with a capital A before, but I have changed it now. – brano Nov 12 '12 at 12:17
  • compile with -arch=sm_30 to get the compiler to recognize that you have a CC 3.0 chip and compile for it without the error – Robert Crovella Nov 12 '12 at 14:28
  • @brano, Robert Crovella: Thanks for all your suggestions. It works. – Daniel Wonglee Nov 14 '12 at 01:42