I am looking for ways to efficiently utilise my GPU cluster to calculate percentiles on randomly generated arrays for a Monte Carlo simulation. I would assume the GPU would be faster than the equivalent calculation on a CPU. When I compare CuPy and NumPy in a single-threaded process, there is, as expected, a significant performance improvement:
import cupy as cp

sample_size = 9_000_000
# Generate normally distributed random numbers on the GPU
cp_res = cp.random.normal(10, 0.1, size=400 * sample_size, dtype=cp.float32)
print(cp.percentile(cp_res, 0.05))
This runs in about 226 ms on a single GPU.
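(Side note on the timing itself: CuPy launches kernels asynchronously, so a reliable measurement needs an explicit device synchronization. A minimal timing sketch, assuming a single default device and a small warm-up run to exclude compilation overhead:)

import time
import cupy as cp

sample_size = 9_000_000

# Warm up so kernel compilation and allocator setup aren't included
cp.percentile(cp.random.normal(10, 0.1, size=1000, dtype=cp.float32), 0.05)

cp.cuda.Device().synchronize()  # make sure the GPU is idle before timing
start = time.perf_counter()
cp_res = cp.random.normal(10, 0.1, size=400 * sample_size, dtype=cp.float32)
result = cp.percentile(cp_res, 0.05)
cp.cuda.Device().synchronize()  # wait for the kernels to actually finish
print(f"{(time.perf_counter() - start) * 1000:.0f} ms, result={float(result)}")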
What would be the most efficient way to run a percentile calculation on 4 * 400 * sample_size random numbers across 2 servers with 2 GPUs each?
I am running:
import cupy as cp
import dask.array as da
from dask.distributed import Client

client = Client(cluster_ip_address)

# Back the Dask random state with CuPy so chunks are generated on the GPUs
rs = da.random.RandomState(RandomState=cp.random.RandomState)
x = rs.normal(10, 0.1, size=4 * 400 * sample_size, chunks='auto')
print(da.percentile(x, 0.05).compute())
I am getting this error:

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
When I swap CuPy for NumPy, the code works fine. Am I using it incorrectly? Is there an alternative way to use the GPUs to generate a large array of normally distributed numbers and calculate its percentile?
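(A possible workaround, as a sketch only and assuming the TypeError comes from Dask's CPU-side merge of per-chunk percentiles rather than from the GPU generation itself: generate the chunks on the GPUs with CuPy, but copy each chunk back to host memory with cp.asnumpy before the percentile step, so the merge only ever sees NumPy-backed chunks.)

import cupy as cp
import dask.array as da
from dask.distributed import Client

client = Client(cluster_ip_address)

sample_size = 9_000_000
rs = da.random.RandomState(RandomState=cp.random.RandomState)
x = rs.normal(10, 0.1, size=4 * 400 * sample_size, chunks='auto')

# Each chunk is generated on a GPU, then moved to host memory so the
# percentile merge runs on plain NumPy arrays.
x_cpu = x.map_blocks(cp.asnumpy)
print(da.percentile(x_cpu, 0.05).compute())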