
I'm testing Dask and I can't understand how Dask can be slower than plain Python. I wrote two examples in Jupyter to measure the time for each, and I think I am doing something wrong.

The first, with Dask, takes 28.5 seconds; the second, in plain Python, takes 140 ms.

    %%time
    import dask
    import dask.array as da
    def inc(x):
        return x + 1

    def double(x):
        return 2 * x

    def add(x, y):
        return x + y

    N = 100000

    data = [0 for x in range(N)]
    x = da.from_array(data, chunks=1000)  # note: this dask array is never used; the loop below rebinds x

    output = []
    for x in data:
        a = dask.delayed(inc)(x)
        b = dask.delayed(double)(x)
        c = dask.delayed(add)(a, b)
        output.append(c)

    total = dask.delayed(sum)(output)
    total.compute()
**28.8 seconds**

Now with plain python

    %%time
    def inc(x):
        return x + 1

    def double(x):
        return 2 * x

    def add(x, y):
        return x + y

    N = 100000

    data = [0 for x in range(N)]

    output = []
    for x in data:
        a = inc(x)
        b = double(x)
        c = add(a, b)
        output.append(c)

    total = sum(output)
**140 milliseconds**

1 Answer


Your code, run on my machine, takes 38 s. This code:

    import dask.array as da

    x = da.from_array(data, chunks=1000)   # data as defined in the question
    %time ((x + 1) + (2 * x)).compute()

runs in 24ms.

    import numpy as np

    x = np.array(data)
    %time ((x + 1) + (2 * x))

runs in 350 µs.

Points:

  • if your data fits easily in memory (NumPy or pandas), you probably won't gain anything from Dask, since those libraries are already fast
  • Dask has collection APIs like array, so use them
  • don't iterate over arrays element by element!
  • if an individual function runs in much less than 1 ms, Dask is only adding overhead; that is certainly the case here. You'll notice that in the tutorial the functions include a sleep to simulate CPU work, so that you actually get some parallelism
  • don't call .compute() many times; try to combine everything you want into a single call to dask.compute(), which takes an arbitrary number of arguments (see the sketch after this list)
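
To illustrate the last two points, here is a minimal sketch (the function names and the 0.1 s sleeps are assumptions for illustration, not the questioner's workload): each task is made artificially slow so the default threaded scheduler can actually overlap work, and the whole graph is materialised with a single dask.compute() call.

    import time
    import dask

    def slow_inc(x):
        time.sleep(0.1)      # simulate real work; without this, Dask only adds overhead
        return x + 1

    def slow_double(x):
        time.sleep(0.1)
        return 2 * x

    def slow_add(x, y):
        time.sleep(0.1)
        return x + y

    data = range(20)         # small N: ~60 tasks of ~0.1 s each

    output = []
    for x in data:
        a = dask.delayed(slow_inc)(x)
        b = dask.delayed(slow_double)(x)
        c = dask.delayed(slow_add)(a, b)
        output.append(c)

    total = dask.delayed(sum)(output)

    # one compute call for the whole graph; dask.compute also accepts
    # several collections at once, e.g. dask.compute(total, output)
    (result,) = dask.compute(total)

Run serially these ~60 sleeps would take about 6 s; with the threaded scheduler they overlap, which is where Dask starts to pay off. The original inc/double/add functions are far too cheap for that to matter.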
mdurant