
I'm getting non-deterministic results for some matrix computations with dask. I narrowed it down to this simple example:

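import numpy as np
import dask.array as da

seed = 1234
np.random.seed(seed)

N = 1000
p = 10
X = np.random.random((N, p + 1))
# Split the rows into 4 chunks; integer division keeps the chunk size an int
X = da.from_array(X, chunks=(N // 4, p + 1))
beta = np.random.random(p + 1)
y = X.dot(beta)
test = X.T.dot(y)
# Recompute the same lazy result several times and print its first entry
for i in range(5):
    print(test.compute()[0])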
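The result can be cross-checked without dask. A minimal NumPy-only sketch, reusing the question's seed and shapes with N=100000, computes X.T @ y directly and again as a sum over four row blocks. Assembling the product from per-block pieces X_i.T @ y_i is roughly how a chunked product is built; treat that as an assumption about dask's internals, not something stated in the post. The two versions add terms in different orders, so they are compared with a tolerance rather than `==`.

```python
import numpy as np

# Same setup as the question (seed, shapes), using the "large" N.
seed = 1234
np.random.seed(seed)

N = 100000
p = 10
X = np.random.random((N, p + 1))
beta = np.random.random(p + 1)

# Reference value: plain in-memory NumPy, no dask involved.
y = X.dot(beta)
reference = X.T.dot(y)

# Rebuild the same product from 4 row blocks:
# X.T @ y == sum over blocks i of X_i.T @ y_i
chunk = N // 4
partials = [
    X[i:i + chunk].T.dot(y[i:i + chunk])
    for i in range(0, N, chunk)
]
chunked = np.sum(partials, axis=0)

# Different summation order => compare with a tolerance, not ==.
print(reference[0], chunked[0])
print(np.allclose(reference, chunked))
```

A different summation order only moves the last few digits. A spread like 107791 vs 132623 in the question is far larger than that, which is consistent with the comment's suspicion of a threading problem rather than ordinary floating-point reassociation.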

For N=1000, this is what I get:

1468.52247693
1468.52247693
1468.52247693
1468.52247693
1468.52247693

But if I increase N (to N=100000, for instance), the values are no longer the same across runs!

132623.076746
107791.947661
108065.532822
108228.788587
108065.532822

Any idea what's going on?

Thrasibule
I suspect there is a bad interaction with OpenBLAS threads. In particular, if I run it with USE_OMP_THREADS=1, it works fine. – Thrasibule Mar 03 '17 at 01:12
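A minimal sketch of testing the threading hypothesis outside dask, assuming the BLAS in use honours the standard start-up variables `OPENBLAS_NUM_THREADS` and `OMP_NUM_THREADS` (`MKL_NUM_THREADS` for MKL builds). These must be set before NumPy is first imported. The thread pool below is a stand-in for dask's threaded scheduler, not dask itself: several Python threads call into BLAS at once on independent row blocks, and the run is repeated to see whether the first entry ever changes.

```python
import os

# Pin BLAS to one thread. These are read when OpenBLAS/MKL initialize,
# so they only take effect if set before NumPy is imported.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

from concurrent.futures import ThreadPoolExecutor

import numpy as np

np.random.seed(1234)
N, p = 100000, 10
X = np.random.random((N, p + 1))
beta = np.random.random(p + 1)
y = X.dot(beta)

chunk = N // 4
blocks = [(X[i:i + chunk], y[i:i + chunk]) for i in range(0, N, chunk)]


def block_product(block):
    """One block's contribution X_i.T @ y_i (a single BLAS call)."""
    Xi, yi = block
    return Xi.T.dot(yi)


def run_once():
    # Independent blocks are computed concurrently from several threads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(block_product, blocks))
    # pool.map returns results in input order, so the final sum order is fixed.
    return np.sum(partials, axis=0)


results = [run_once()[0] for _ in range(5)]
print(results)
print("deterministic:", len(set(results)) == 1)
```

Within dask itself, calling `test.compute(scheduler="synchronous")` runs every task one at a time in the main thread; if that output is stable while the default threaded scheduler's is not, it points the same way as the comment above.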

1 Answer


As of dask version 2021.11.2, this problem no longer reproduces: for each value of N, every run gives the same result.

# N = 1000
1468.522476925747
1468.522476925747
1468.522476925747
1468.522476925747
1468.522476925747

# N = 100000
108065.53282187709
108065.53282187709
108065.53282187709
108065.53282187709
108065.53282187709

# N = 10000000
15246946.969951011
15246946.969951011
15246946.969951011
15246946.969951011
15246946.969951011
SultanOrazbayev