The PyCUDA documentation is a bit light on examples for those of us in the 'Non-Guru' class, so I'm wondering which array operations are available on GPUArrays, i.e. how I would port a loop like this to a GPUArray:
import numpy as np
K, N = 4, 8  # example sizes
m = np.random.random((K, N, N))
a = np.zeros_like(m)
b = np.random.random(N)  # example
for k in range(K):
    for x in range(N):
        for y in range(N):
            a[k, x, y] = m[k, x, y] * b[y]
The usual first step in plain Python would be to vectorize the innermost loop with a NumPy slice, something like:
for k in range(K):
    for x in range(N):
        a[k, x, :] = m[k, x, :] * b
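For reference, the slice form can be taken one step further on the CPU: NumPy broadcasting should let the whole triple loop collapse into a single expression. The following is a minimal sketch, assuming small example sizes for `K` and `N` and a seeded random generator (both chosen here only for illustration), that checks the broadcast result against the explicit loop.

```python
import numpy as np

K, N = 3, 4  # small example sizes, chosen only for illustration
rng = np.random.default_rng(0)
m = rng.random((K, N, N))
b = rng.random(N)

# Reference result from the explicit triple loop.
a_loop = np.zeros_like(m)
for k in range(K):
    for x in range(N):
        for y in range(N):
            a_loop[k, x, y] = m[k, x, y] * b[y]

# Broadcasting: b has shape (N,), so it is matched against the last axis
# of m and reused for every (k, x) pair.
a_vec = m * b

print(np.allclose(a_loop, a_vec))
```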
But I can't see any simple way to do this with GPUArray other than writing a custom elementwise kernel. Even then, it seems the kernel would need looping constructs for this problem, and at that level of complexity I'm probably better off just writing my own full-blown SourceModule kernel.
Can anyone clue me in?
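One possible direction, offered as a sketch rather than a definitive answer: an elementwise kernel may not need any loops here. If `m` is stored C-contiguously, the flat index `i` of each element has last-axis position `i % N`, so the per-element rule becomes `a[i] = m[i] * b[i % N]`. The code below first checks that rule with NumPy alone, then wraps the same expression in `pycuda.elementwise.ElementwiseKernel`. The helper name `gpu_scale_last_axis`, the kernel name, the argument name `ncols`, and the example sizes are all assumptions made for illustration; the GPU helper needs a CUDA device and is only defined, not called.

```python
import numpy as np

K, N = 3, 4  # small example sizes, chosen only for illustration
rng = np.random.default_rng(0)
m = rng.random((K, N, N))
b = rng.random(N)

# CPU emulation of the per-element rule a[i] = m[i] * b[i % N]
# applied to the flattened (C-ordered) array.
i = np.arange(m.size)
a_flat = m.ravel() * b[i % N]
a_ref = a_flat.reshape(m.shape)

print(np.allclose(a_ref, m * b))


def gpu_scale_last_axis(m_host, b_host):
    """Hypothetical helper: the same rule as a PyCUDA elementwise kernel.

    Requires PyCUDA and a CUDA device, so it is not called in this sketch.
    """
    import pycuda.autoinit  # noqa: F401  (sets up a CUDA context)
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    # "i" is the element index supplied by ElementwiseKernel. The size
    # argument is called "ncols" because the generated kernel already
    # uses the name "n" for the total element count.
    kernel = ElementwiseKernel(
        "double *a, const double *m, const double *b, int ncols",
        "a[i] = m[i] * b[i % ncols]",
        "scale_last_axis",
    )
    m_gpu = gpuarray.to_gpu(np.ascontiguousarray(m_host, dtype=np.float64))
    b_gpu = gpuarray.to_gpu(np.ascontiguousarray(b_host, dtype=np.float64))
    a_gpu = gpuarray.empty_like(m_gpu)
    # The first array argument determines how many elements are processed.
    kernel(a_gpu, m_gpu, b_gpu, np.int32(b_host.size))
    return a_gpu.get()
```

On a machine with a GPU, `gpu_scale_last_axis(m, b)` would be expected to match `m * b` up to floating-point tolerance. The modulo trick relies on the broadcast axis being the last, contiguous one; a different axis would need a different index expression.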