
I'm running some Python statsmodels code in a Docker container. When I run this code on two different computers (using the same Docker image pulled from DockerHub, not built locally twice), I get different results. The differences are tiny - the 10th or 15th digit changes - but they break our reproducible builds. Is this a statsmodels issue? A Docker issue?

I suspect the Python side, because thousands of other lines of code run in containers built from these same Docker images, and they are bit-reproducible.

Here is an MWE, and an example of the differences:

import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
np.random.seed(42)

df = pd.DataFrame(columns=['foo', 'bar'], data=np.random.random((1000, 2)))

y = (df['bar'])
X = np.log10(df['foo'])
X = sm.add_constant(X)
model = sm.OLS(y, X)
fits = model.fit()
predictions = fits.predict(X)

XX = np.linspace(X['foo'].min(), X['foo'].max(), 50)
XX = sm.add_constant(XX)
yy = fits.predict(XX)
sdev, lower, upper = wls_prediction_std(fits, exog=XX, alpha=0.05)

bad = df.loc[df['bar'] < 50,'bar']

df.loc[df['bar'] < 50,'bar'] = fits.predict(sm.add_constant(np.log10(bad)))

fits.summary()

with open("output.txt", "w") as text_file:
    text_file.write(fits.summary().as_csv())

df.to_csv('out.csv', index=False)

And the differences in out.csv are small. For example,

$ sdiff <(cat out.csv) <(ssh remote_server cat out.csv) | tail

shows the following. Note that, where the rows differ, only the last digit has changed.

0.18610141784627732,0.5081884090422659                        | 0.18610141784627732,0.5081884090422658
0.45818688673789265,0.5082792408801786                        | 0.45818688673789265,0.5082792408801785
0.13347997241594378,0.5085994020210153                        | 0.13347997241594378,0.5085994020210152
0.7279393069737652,0.5082743139146337                         | 0.7279393069737652,0.5082743139146336
0.43685070261517955,0.5082054932289445                        | 0.43685070261517955,0.5082054932289444
0.7655128989911097,0.5084780190581778                         | 0.7655128989911097,0.5084780190581777
0.6102251494776413,0.5085067071667805                         | 0.6102251494776413,0.5085067071667804
0.7513750860290457,0.5082242252400639                           0.7513750860290457,0.5082242252400639
0.956614621083458,0.5086273010565618                            0.956614621083458,0.5086273010565618
0.05705472115125432,0.5083753342014574                        | 0.05705472115125432,0.5083753342014573
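
One way to quantify this beyond eyeballing the diff is to read both files back and compare with a tolerance. A minimal sketch; the filenames out_local.csv and out_remote.csv are placeholders for the out.csv copied from each machine:

import numpy as np
import pandas as pd

# Placeholder filenames: out.csv from the local and remote containers.
local = pd.read_csv("out_local.csv")
remote = pd.read_csv("out_remote.csv")

# Largest relative difference between the two runs.
rel = np.abs(local.values - remote.values) / np.abs(remote.values)
print("max relative difference:", rel.max())

# Passes if the two runs agree to within rtol; raises AssertionError otherwise.
np.testing.assert_allclose(local.values, remote.values, rtol=1e-13)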
mankoff
  • Exact floating point reproducibility depends on the numerical operations used. In this case the most likely cause is the linear algebra libraries, BLAS and LAPACK: specifically, the OLS fit uses numpy.linalg.pinv, which uses an SVD underneath. One source is that parallel computation, e.g. in the MKL linalg libraries, can add some non-deterministic floating point noise. The unit tests in statsmodels only check for rtol around 1e-13 across machines and linalg libraries. Here reproducibility is better because of the identical Docker container. – Josef Jan 30 '22 at 04:00
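
Following up on that comment, a sketch of how to check which BLAS/LAPACK backend each container's NumPy actually uses, and how to reproduce the pinv/SVD-based solve outside of statsmodels with the same seed and data as the MWE. If threading non-determinism turns out to be the culprit, pinning OMP_NUM_THREADS / MKL_NUM_THREADS to 1 in both containers is one thing to try:

import numpy as np

# Which BLAS/LAPACK is this NumPy linked against (OpenBLAS, MKL, ...)?
# Run in both containers; any difference here, or in thread counts, is a
# plausible source of last-digit noise.
np.show_config()

# Reproduce the pinv (SVD-based) solve that OLS.fit() uses by default,
# with the same seed and data as the MWE, to isolate the linalg layer.
np.random.seed(42)
data = np.random.random((1000, 2))
X = np.column_stack([np.ones(1000), np.log10(data[:, 0])])  # const + log10(foo)
y = data[:, 1]                                              # bar
beta = np.linalg.pinv(X) @ y   # minimum-norm least squares via SVD
print(beta.tolist())           # full-precision digits to compare across machines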

0 Answers