
Parallelizing with Dask is slower than sequential code

I have nested for loops that I am trying to parallelize on a local cluster, but I can't find the right way to do it.

I want to parallelize the inner loop.

I have two big NumPy matrices. I iterate over them and perform a mathematical calculation on a subset of each. Dimensions:

data_mat.shape = (38, 243863)
indices.shape = (243863, 27)
idxX.shape = (19,)
idxY.shape = (19,)

sequential code:

import datetime
import numpy as np

start = datetime.datetime.now()
for i in range(num+1):
    if i == 0:
        labels = np.array(true_labels)
    else:
        labels = label_mat[i]

    idxX = list(np.where(labels == 1))
    idxY = list(np.where(labels == 2))

    ansColumn = []

    for j in range(indices.shape[0]):
        indices_slice = indices[j, :]  # row j of indices: the columns to use
        list_of_indices = [[k] for k in indices_slice]
        dataX = (data_mat[idxX, list_of_indices]).T
        dataY = (data_mat[idxY, list_of_indices]).T
        ansColumn.append(calc_func(dataX, dataY))

    if i == 0:
        ansMat = ansColumn
    else:
        ansMat = np.c_[ansMat, ansColumn]


end = datetime.datetime.now()
print(end - start)

parallel code:

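import datetime
import numpy as np
import dask.distributed
from dask.distributed import Client, LocalCluster

start = datetime.datetime.now()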
cluster = LocalCluster(n_workers=4, processes=False)
client = Client(cluster)
for i in range(num+1):
    if i == 0:
        labels = np.array(true_labels)
    else:
        labels = label_mat[i]

    idxX = list(np.where(labels == 1))
    idxY = list(np.where(labels == 2))

    [big_future] = client.scatter([data_mat], broadcast=True)
    [idx_b] = client.scatter([idxX], broadcast=True)
    [idy_b] = client.scatter([idxY], broadcast=True)


    futures = [client.submit(prep_calc_func, idx_b, idy_b, indices[j, :], big_future) for j in range(indices.shape[0])]
    ansColumn = []

    # note: as_completed yields futures in completion order, not submission order
    for fut in dask.distributed.as_completed(futures):
        ansColumn.append(fut.result())

    if i == 0:
        ansMat = ansColumn
    else:
        ansMat = np.c_[ansMat, ansColumn]


end = datetime.datetime.now()
print(end - start)

helper function:

def prep_calc_func(idxX, idxY, subset_of_indices, data_mat):
    # column vector of indices so it broadcasts against idxX / idxY
    list_of_indices = [[k] for k in subset_of_indices]
    dataX = (data_mat[idxX, list_of_indices]).T
    dataY = (data_mat[idxY, list_of_indices]).T
    ret_val = calc_func(dataX, dataY)
    return ret_val

local machine: MacBook Pro (Retina, 13-inch, Mid 2014) Processor: 2.6 GHz Intel Core i5

hw.physicalcpu: 2 hw.logicalcpu: 4

Memory: 8 GB 1600 MHz DDR3

When I run the sequential code, it takes 01:52 min to complete (less than 2 minutes).

When I run the parallel code, it takes well over 15 minutes, no matter which method I use (compute, result and client.submit, or dask delayed).

(I prefer the dask distributed package because the next phase may involve remote clusters too.)

Any idea what I am doing wrong?

netfr

1 Answer


There are many reasons why something can be slow. There might be a lot of communication, your tasks might be too small (recall that Dask's overhead is around 1 ms per task), or it might be something else entirely. For more information on understanding performance in Dask, I recommend the Dask documentation on that topic.
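To make the "tasks too small" point concrete: the question submits one task per row of indices, which is 243,863 tasks per outer iteration. If the roughly 1 ms per task estimate applies here, that is about 244 seconds of scheduling overhead per outer iteration, already more than the 1:52 the whole sequential run takes. One possible way to reduce this, sketched below under assumptions, is to have each task handle a whole chunk of rows and to scatter data_mat once rather than on every outer iteration. The names calc_batch, run_batched and n_chunks are hypothetical, calc_func is a stand-in because the question does not show it, and idxX / idxY are assumed to be 1-D arrays of row indices.

```python
import numpy as np


def calc_func(dataX, dataY):
    # Hypothetical stand-in: the real calc_func is not shown in the question.
    return float(dataX.mean() - dataY.mean())


def calc_batch(data_mat, idxX, idxY, indices_batch):
    """Evaluate calc_func for every row of indices_batch inside a single task."""
    out = []
    for row in indices_batch:
        cols = row[:, None]  # column vector, same role as [[i] for i in row]
        dataX = data_mat[idxX, cols].T
        dataY = data_mat[idxY, cols].T
        out.append(calc_func(dataX, dataY))
    return out


def run_batched(client, data_future, idxX, idxY, indices, n_chunks=16):
    """Submit one task per chunk of rows instead of one task per row.

    data_future is assumed to be data_mat scattered once by the caller.
    """
    chunks = np.array_split(indices, n_chunks)
    futures = [
        client.submit(calc_batch, data_future, idxX, idxY, chunk)
        for chunk in chunks
    ]
    # gather returns results in submission order, so values stay aligned with rows
    return [value for part in client.gather(futures) for value in part]


# Possible usage with a local cluster (not executed here):
#   from dask.distributed import Client, LocalCluster
#   client = Client(LocalCluster(n_workers=4, processes=False))
#   data_future = client.scatter(data_mat, broadcast=True)  # once, before the outer loop
#   ansColumn = run_batched(client, data_future, idxX, idxY, indices)
```

With 16 chunks the scheduler handles 16 tasks per outer iteration instead of 243,863. The chunk count is a guess to tune: a few chunks per worker is a common starting point, but whether this actually beats the sequential loop depends on how expensive calc_func is and should be measured.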

MRocklin