
I'm trying to layer dask on top of my cuda functions, but when dask returns I get a NoneType object.

from numba import cuda
import numpy as np
from dask.distributed import Client, LocalCluster


@cuda.jit()
def addingNumbersCUDA (big_array, big_array2, save_array):
    i = cuda.grid(1)
    if i < big_array.shape[0]:
        for j in range (big_array.shape[1]):
            save_array[i][j] = big_array[i][j] * big_array2[i][j]


if __name__ == "__main__":
    cluster = LocalCluster()
    client = Client(cluster)


    big_array = np.random.random_sample((100, 3000))
    big_array2  = np.random.random_sample((100, 3000))
    save_array = np.zeros(shape=(100, 3000))


    arraysize = 100
    threadsperblock = 64
    blockspergrid = (arraysize + (threadsperblock - 1))

    x = client.submit(addingNumbersCUDA[blockspergrid, threadsperblock], big_array, big_array2, save_array)


    y = client.gather(x)
    print(y)

I understand that you don't actually return a value from a CUDA kernel, and that the results are instead written back into the array you passed in. Is this why I'm getting a NoneType, or am I using Dask wrong with CUDA?

Bryce Booze

1 Answer


As Matthew Rocklin points out in this question: How to use Dask to run python code on the GPU?, Dask can't handle in-place operations: the worker runs on its own copy of the arguments, so mutations to `save_array` never make it back to the client, and the kernel launch itself returns None. To account for this, wrap the GPU code in a function that copies the result back to the host and returns it explicitly.

from numba import cuda
import numpy as np
from dask.distributed import Client, LocalCluster


@cuda.jit()
def addingNumbersCUDA (big_array, big_array2, save_array):
    i = cuda.grid(1)
    if i < big_array.shape[0]:
        for j in range (big_array.shape[1]):
            save_array[i][j] = big_array[i][j] * big_array2[i][j]

def toCUDA(big_array, big_array2, save_array):
    arraysize = big_array.shape[0]
    threadsperblock = 64
    # integer division, rounding up, to get the number of blocks
    blockspergrid = (arraysize + (threadsperblock - 1)) // threadsperblock

    # copy the inputs to the GPU explicitly
    d_big_array = cuda.to_device(big_array)
    d_big_array2 = cuda.to_device(big_array2)
    d_save_array = cuda.to_device(save_array)

    addingNumbersCUDA[blockspergrid, threadsperblock](d_big_array, d_big_array2, d_save_array)

    # copy the result back to the host and return it,
    # so Dask has a value to serialize and send to the client
    return d_save_array.copy_to_host()

if __name__ == "__main__":
    cluster = LocalCluster()
    client = Client(cluster)

    big_array = np.random.random_sample((100, 3000))
    big_array2  = np.random.random_sample((100, 3000))
    save_array = np.zeros(shape=(100, 3000))

    x = client.submit(toCUDA, big_array, big_array2, save_array)


    y = client.gather(x)
    print(y)
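To see why the original version gathered None, here is a minimal CPU-only sketch of the same pitfall (no GPU or Dask needed; the function names `multiply_in_place` and `multiply_and_return` are just illustrative). A function that only mutates its arguments returns None, and None is exactly what `client.submit` hands back; under Dask even the mutation is lost, because the worker operates on a serialized copy of the arrays.

```python
import numpy as np

def multiply_in_place(a, b, out):
    # Mutates `out` but returns nothing -- like the raw CUDA kernel launch.
    np.multiply(a, b, out=out)

def multiply_and_return(a, b):
    # Returns the result explicitly -- like the toCUDA wrapper.
    return a * b

a = np.random.random_sample((4, 5))
b = np.random.random_sample((4, 5))
out = np.zeros_like(a)

result = multiply_in_place(a, b, out)
print(result)        # None -- the in-place version "returns" nothing

result = multiply_and_return(a, b)
print(result.shape)  # (4, 5)
```

The rule of thumb for Dask is the same as for any serialized task system: make the task a pure function of its inputs and return its output, rather than relying on side effects.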