Non-deterministic results with dask

Question

I'm getting non-deterministic results for some matrix computations with dask. I narrowed it down to this simple example:

import numpy as np
import dask.array as da
seed = 1234
np.random.seed(seed)

N = 1000
p = 10
X = np.random.random((N, p + 1))
X = da.from_array(X, chunks=(N // 4, p + 1))  # integer chunk size; N / 4 is a float on Python 3
beta = np.random.random(p+1)
y = X.dot(beta)
test = X.T.dot(y)
for i in range(5):
    print(test.compute()[0])

For N=1000, this is what I get:

1468.52247693
1468.52247693
1468.52247693
1468.52247693
1468.52247693

But if I increase N, for instance to N=100000, the values differ across runs:

132623.076746
107791.947661
108065.532822
108228.788587
108065.532822

Any idea what's going on?

I suspect there is some bad interaction going on with Openblas threads. In particular, if I run it with USE_OMP_THREADS=1, then it works fine. — Thrasibule, Mar 03 '17 at 01:12
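
The threading suspicion in the comment above can be checked in isolation. The following is only a sketch: it assumes the conventional OMP_NUM_THREADS and OPENBLAS_NUM_THREADS environment variables (which may not be the exact variable the commenter used), and it uses plain NumPy rather than dask, so it tests the BLAS layer only. The variables generally have to be set before NumPy is first imported in the process.

```python
import os

# Assumption: these are the conventional thread-count variables for
# OpenMP and OpenBLAS; they only take effect if set before NumPy loads.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np

# Same data as in the question, built with a local generator.
rng = np.random.RandomState(1234)
N, p = 100000, 10
X = rng.random_sample((N, p + 1))
beta = rng.random_sample(p + 1)

# Repeat the same computation and collect the first entry each time.
values = [X.T.dot(X.dot(beta))[0] for _ in range(5)]
for v in values:
    print(v)

# True when every repetition produced the identical float.
print(len(set(values)) == 1)
```

If the repetitions agree here but not under dask, that would point at the interaction between dask's worker threads and the BLAS threads rather than at the arithmetic itself.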

score 0 · Answer 1 · answered Dec 04 '21 at 04:57

As of dask version 2021.11.2, I can no longer reproduce this. For each value of N, repeated runs give the same result:

# N = 1000
1468.522476925747
1468.522476925747
1468.522476925747
1468.522476925747
1468.522476925747

# N = 100000
108065.53282187709
108065.53282187709
108065.53282187709
108065.53282187709
108065.53282187709

# N = 10000000
15246946.969951011
15246946.969951011
15246946.969951011
15246946.969951011
15246946.969951011
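
Stable output is not quite the same as correct output, so it can help to compare against an unchunked reference. The sketch below is an illustration in plain NumPy, not dask itself: the helper names `reference` and `chunked` are hypothetical, and `chunked` only mimics splitting the rows into four blocks as in the question. Summing blocks in a different order changes the low-order bits at most, so results should agree to a tight relative tolerance. Differences as large as those in the question (132623 vs. 108065) cannot be explained by rounding.

```python
import numpy as np


def reference(N, p=10, seed=1234):
    """Build the question's data and the unchunked result X.T @ (X @ beta)."""
    rng = np.random.RandomState(seed)
    X = rng.random_sample((N, p + 1))
    beta = rng.random_sample(p + 1)
    return X, beta, X.T.dot(X.dot(beta))


def chunked(X, beta, n_chunks=4):
    """Hypothetical stand-in for a row-chunked computation: one partial
    result per block of rows, then a sum over the blocks."""
    parts = [b.T.dot(b.dot(beta)) for b in np.array_split(X, n_chunks)]
    return np.sum(parts, axis=0)


X, beta, expected = reference(100000)
result = chunked(X, beta)

# Rounding differences should be many orders of magnitude below this tolerance.
print(np.allclose(result, expected, rtol=1e-10))
print(np.max(np.abs(result - expected) / expected))
```

The same `np.allclose` comparison could be applied to the array returned by `test.compute()` in the question's code.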
