I'm getting non-deterministic results for some matrix computations with Dask. I narrowed it down to this simple example:
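import numpy as np
import dask.array as da
seed = 1234
np.random.seed(seed)
N = 1000
p = 10
X = np.random.random((N, p + 1))
# Split X into four row chunks; // keeps the chunk size an integer
X = da.from_array(X, chunks=(N // 4, p + 1))
beta = np.random.random(p + 1)
y = X.dot(beta)
test = X.T.dot(y)
# Compute the same graph five times and print the first entry each time
for i in range(5):
    print(test.compute()[0])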
For N=1000, this is what I get:
1468.52247693
1468.52247693
1468.52247693
1468.52247693
1468.52247693
But if I increase N, to N=100000 for instance, the values differ from one compute() call to the next:
132623.076746
107791.947661
108065.532822
108228.788587
108065.532822
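As a sanity check, here is a sketch of the same product in plain NumPy, with the rows split into four blocks by hand. The blocking is only my illustration of a chunked product, not Dask's actual task graph, and the names `reference` and `blocked` are made up for this example. I would expect the two results to agree up to floating-point rounding, which is far smaller than the differences shown above:

```python
import numpy as np

np.random.seed(1234)
N, p = 100000, 10
X = np.random.random((N, p + 1))
beta = np.random.random(p + 1)

# Reference: the whole computation in one piece.
y = X.dot(beta)
reference = X.T.dot(y)

# Same computation on four row blocks, with the partial results summed.
# (Hand-rolled illustration only; not how Dask schedules the work.)
blocks = np.array_split(X, 4, axis=0)
blocked = sum(b.T.dot(b.dot(beta)) for b in blocks)

print(reference[0], blocked[0])
print(np.allclose(reference, blocked))
```

Summing the blocks in a different order can change the last few digits, but not the leading ones, so rounding alone does not seem to explain the output above.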
Any idea what's going on?