I'm trying to parallelize the following operation with cupy: I have an array. For each column of that array, I'm generating 2 random vectors. I take that array column, add one of the vectors, subtract the other, and make that new vector the next column of the array. I continue on until I finish with the array.
I already asked the following question - Cupy slower than numpy when iterating through array. But this is different, in that I believe I followed the advice of parallelizing the operation and having one "for loop" instead of two, and iterating only through the array columns instead of both rows and columns.
import cupy as cp
import time
#import numpy as cp
def row_size(array):
return(array.shape[1])
def number_of_rows(array):
return(array.shape[0])
x = (cp.zeros((200,200), 'f'))
#x = cp.zeros((200,200))
x[:,1] = 500000
vector_one = x * 0
vector_two = x * 0
start = time.time()
for i in range(number_of_rows(x) - 1):
if sum(x[ :, i])!=0:
vector_one[ :, i + 1], vector_two[ :, i+ 1] = cp.random.poisson(.01*x[:,i],len(x[:,i])), cp.random.poisson(.01 * x[:,i],len(x[:,i]))
x[ :, i+ 1] = x[ :, i] + vector_one[ :, i+ 1] - vector_two[ :, i+ 1]
time = time.time() - start
print(x)
print(time)
When I run this in cupy, the time comes out to about .62 seconds.
When I switch to numpy, so I 1) uncomment #import numpy as cp and #x = cp.zeros((200,200)) and 2) instead comment import cupy as cp and x = (cp.zeros((200,200), 'f')):
The time comes out to about .11 seconds.
I thought maybe if I increase the array size, for example from (200,200) to (2000,2000), then I'd see a difference in cupy being faster, but it's still slower.
I know this is working properly, in a sense, because if I change the coefficient in cp.random.poisson from .01 to .5, I can only do that in cupy because that lambda is too large for numpy.
But still, how do I make it actually faster with cupy?