I'm running some Python statsmodels code in a Docker container. When I run this code on two different computers (using the same Docker image pulled from Docker Hub, not built locally twice), I get different results. The differences are tiny: only the 10th or 15th digit changes. But it is breaking our reproducible builds. Is this a Python statsmodels issue? A Docker issue?
I suspect Python rather than Docker, because thousands of other lines of code run in containers created from these same Docker images, and their output is bit-reproducible.
Here is an MWE, and an example of the differences:
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
np.random.seed(42)
df = pd.DataFrame(columns=['foo', 'bar'], data=np.random.random((1000, 2)))
y = (df['bar'])
X = np.log10(df['foo'])
X = sm.add_constant(X)
model = sm.OLS(y, X)
fits = model.fit()
predictions = fits.predict(X)
XX = np.linspace(X['foo'].min(), X['foo'].max(), 50)
XX = sm.add_constant(XX)
yy = fits.predict(XX)
sdev, lower, upper = wls_prediction_std(fits, exog=XX, alpha=0.05)
bad = df.loc[df['bar'] < 50,'bar']
df.loc[df['bar'] < 50,'bar'] = fits.predict(sm.add_constant(np.log10(bad)))
fits.summary()
with open("output.txt", "w") as text_file:
    text_file.write(fits.summary().as_csv())
df.to_csv('out.csv', index=False)
The differences in out.csv are small. For example,
$ sdiff <(cat out.csv) <(ssh remote_server cat out.csv) | tail
shows the following. Note that only the last digit has changed.
0.18610141784627732,0.5081884090422659 | 0.18610141784627732,0.5081884090422658
0.45818688673789265,0.5082792408801786 | 0.45818688673789265,0.5082792408801785
0.13347997241594378,0.5085994020210153 | 0.13347997241594378,0.5085994020210152
0.7279393069737652,0.5082743139146337 | 0.7279393069737652,0.5082743139146336
0.43685070261517955,0.5082054932289445 | 0.43685070261517955,0.5082054932289444
0.7655128989911097,0.5084780190581778 | 0.7655128989911097,0.5084780190581777
0.6102251494776413,0.5085067071667805 | 0.6102251494776413,0.5085067071667804
0.7513750860290457,0.5082242252400639 0.7513750860290457,0.5082242252400639
0.956614621083458,0.5086273010565618 0.956614621083458,0.5086273010565618
0.05705472115125432,0.5083753342014574 | 0.05705472115125432,0.5083753342014573
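To put a number on "only the last digit has changed", here is a minimal sketch that measures the gap between two doubles in units in the last place (ULP). The helper names `float_to_ordered_int` and `ulp_distance` are hypothetical, not part of any library, and the two values are copied from the first row of the sdiff output above. It only quantifies the discrepancy and does not explain its cause.

```python
import struct


def float_to_ordered_int(x: float) -> int:
    """Map a double to an integer that preserves the ordering of the floats."""
    # Reinterpret the 8 bytes of the IEEE 754 double as a signed 64-bit integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats have the sign bit set; mirror them below zero so that
    # adjacent floats always map to adjacent integers (and -0.0 maps to 0).
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles separating a and b (0 if identical)."""
    return abs(float_to_ordered_int(a) - float_to_ordered_int(b))


# Values from the first differing row of out.csv (local vs. remote).
local_value = float("0.5081884090422659")
remote_value = float("0.5081884090422658")

print("ULP distance:", ulp_distance(local_value, remote_value))
print("absolute difference:", abs(local_value - remote_value))
```

A distance of one or two ULP would mean the two machines agree up to the final rounding step of the computation, while a much larger distance would suggest a genuinely different computation path.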