Same Python code, same data, different results on different machines

Question

I have a very strange problem: I get different results from the same code and the same data on different machines.

I have Python code based on numpy/scipy/sklearn, and I use Anaconda as my base Python distribution. Even when I copy the entire project directory (which includes all the data and code) from my main machine to another machine and run it, the results I get are different. Specifically, I'm doing a classification task and I get a 3 percent difference in accuracy. I am using the same version of Python and Anaconda on both machines. My main machine runs Ubuntu 16.04, and the results on it are lower than on several other machines with various operating systems that I tried (OS X, Ubuntu 14.04 and CentOS). So there must be something wrong with my current system configuration, because all the other machines show consistent results. Since my Anaconda version is the same on all machines, I have no idea what else could be the problem. Any ideas what else I should check, or what could be the source of the problem?

I also removed Anaconda and reinstalled it from scratch, but it didn't help.

Are you using a train/test split? If so, it could be due to pulling different samples. — ZSH, Jul 06 '16 at 16:15
I'm using a fixed random seed for that. Also, I don't get different results on every run; I get different results only on one of the machines. So I guess it must be due to the system configuration, but I'm not sure what to check. — CentAu, Jul 06 '16 at 16:21
Without a look at the code (and preferably some representative sample data) it's hard to say for sure. The same code/data/libraries ***should*** produce the same result, regardless of OS. Might there be a package version difference where something was changed under the hood between versions? — ZSH, Jul 06 '16 at 16:27
@zhespelt The strange thing is that the package versions are also consistent. Just installed anaconda from scratch on my machine and another one and tested. Still get different results. — CentAu, Jul 06 '16 at 16:33
Are all the machines the same bit architecture (e.g. 64bit)? — Aguy, Jul 06 '16 at 18:28
Maybe one version of Numpy is linking with the Math Kernel Library and one isn't? Just trying to add a data point. — Cody Piersall, Jul 06 '16 at 20:35
You could maybe check out numpy [sysinfo](https://github.com/numpy/numpy/blob/master/numpy/distutils/system_info.py) for each of the builds. — Cody Piersall, Jul 06 '16 at 20:37
I think it is a long shot, but can you delete the .pyc files and run again? — Hani, Jul 06 '16 at 20:43
@CodyPiersall I also think that this should somehow relate to the underlying math library. However, I printed out the info on `mkl` with `sysinfo.get_info('mkl')` on both machines and they seem identical. Are there any other kernel libraries that `numpy/scipy` use? — CentAu, Jul 06 '16 at 21:31
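
One way to act on the suggestions in these comments (bit architecture, package versions, the underlying math library) is to print the same environment report on every machine and diff the output. The following is only a minimal sketch: the helper name `environment_report`, the package list and the chosen environment variables are illustrative assumptions, and it reads installed version metadata only, so it does not show which BLAS the binaries actually link against.

```python
import json
import os
import platform
import struct
import sys
from importlib import metadata


# Hypothetical helper: the name and the reported fields are illustrative only.
def environment_report(packages=("numpy", "scipy", "scikit-learn", "mkl")):
    """Collect basic facts about the interpreter, OS and installed packages."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        # Pointer size distinguishes 32-bit from 64-bit interpreters.
        "pointer_bits": struct.calcsize("P") * 8,
        # Variables that can change how the math library behaves, if set.
        "env": {
            name: os.environ.get(name)
            for name in ("MKL_CBWR", "MKL_NUM_THREADS", "OMP_NUM_THREADS")
        },
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report["packages"][name] = None
    return report


if __name__ == "__main__":
    print(json.dumps(environment_report(), indent=2, sort_keys=True))
```

Saving the printed JSON on each machine and comparing the files shows any mismatch in versions or architecture at a glance. For the linked BLAS/LAPACK build itself, `numpy.show_config()` prints the build information.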

score 3 · Accepted Answer · answered Mar 14 '18 at 09:32


I had a similar problem and I found this discussion. The cause may be that floating-point operations in MKL (Intel Math Kernel Library) are not deterministic by default, so the same computation can give slightly different results on different machines. Setting `export MKL_CBWR=AUTO` may solve the problem.


dim


Hi dim, how can I set `MKL_CBWR`? – Luca Monno Sep 03 '21 at 14:01
Can it also be set on Linux? – Luca Monno Sep 03 '21 at 14:12
@LucaMonno, yes, you can set it on Linux like any other environment variable, with `export` in the shell or in `~/.bash_profile` – dim Sep 10 '21 at 15:34
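
If editing the shell profile is inconvenient, the variable can also be set from inside Python. This is a sketch under the assumption that MKL reads `MKL_CBWR` when the library is first initialised, so the assignment has to run before `numpy` (or anything else linked against MKL) is imported. The helper name `set_mkl_cbwr` is hypothetical.

```python
import os
import sys


# Hypothetical helper, not part of numpy or MKL.
def set_mkl_cbwr(mode="AUTO"):
    """Set MKL_CBWR for this process.

    Returns True if numpy had not been imported yet. If it returns False,
    MKL may already be initialised and the new value may be ignored
    (assumption: the variable is read once, at initialisation).
    """
    os.environ["MKL_CBWR"] = mode
    return "numpy" not in sys.modules


if __name__ == "__main__":
    set_in_time = set_mkl_cbwr("AUTO")
    print("MKL_CBWR =", os.environ["MKL_CBWR"])
    print("set before numpy was imported:", set_in_time)
    # Only import numpy/scipy/sklearn after this point.
```

Calling the helper at the very top of the entry script, before any scientific imports, keeps the setting with the project so that it travels along when the directory is copied to another machine.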

ev-br · Answer 2 · 2016-07-06T20:58:15.173

2

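If your code uses linear algebra, check that part first. Roundoff errors are generally not deterministic across machines, because they depend on the order in which the underlying library performs its operations. If your matrices are badly conditioned, those tiny differences can be amplified into visibly different results, and that may be what you are seeing.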
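
To illustrate both points, here is a minimal pure-Python sketch. The 2x2 system and the helper `solve_2x2` are made-up examples, not taken from the question's code. The first part shows that floating-point addition depends on the order of operations; the second shows a nearly singular (badly conditioned) matrix turning a change of 0.0001 in the right-hand side into a change of about 1 in the solution.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve the 2x2 system A x = b with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Floating-point addition is not associative: a library that sums the same
# numbers in a different order can return a slightly different value.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("same result regardless of order?", left == right)

# The rows of this matrix are almost identical, so it is nearly singular.
sol_a = solve_2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0001)
# Same matrix, right-hand side changed by only 0.0001.
sol_b = solve_2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0002)
print("original right-hand side: ", sol_a)
print("perturbed right-hand side:", sol_b)
```

The solution moves from roughly (1, 1) to roughly (0, 2). In a real pipeline the perturbation would come from roundoff rather than from an edited input, but the amplification mechanism is the same.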

edited Jul 06 '16 at 20:58

answered Jul 06 '16 at 20:29



Can you elaborate on badly conditioned matrices? – CentAu Jul 06 '16 at 21:16
I'd add that when debugging these sorts of issues I would try to do a binary search comparing intermediate results to find the place in the algorithms where things start to differ. Good luck! – ev-br Jul 06 '16 at 23:09
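
A rough sketch of that idea: record a fingerprint of each intermediate result on both machines, then look for the first stage where the fingerprints differ. The function names and stage names below are hypothetical, and the comparison is a plain linear scan over the recorded stages rather than a literal binary search, which is enough once every checkpoint has been saved.

```python
import hashlib
import struct


def fingerprint(values):
    """Hash the exact bit patterns of a flat sequence of floats.

    For a numpy array, hashing arr.tobytes() would serve the same purpose.
    """
    digest = hashlib.sha256()
    for value in values:
        digest.update(struct.pack("<d", float(value)))
    return digest.hexdigest()


def first_divergence(run_a, run_b):
    """Return the name of the first stage whose fingerprints differ.

    run_a and run_b are lists of (stage_name, fingerprint) pairs in
    execution order. Returns None if all compared stages match.
    """
    for (name_a, fp_a), (name_b, fp_b) in zip(run_a, run_b):
        if name_a != name_b or fp_a != fp_b:
            return name_a
    return None


if __name__ == "__main__":
    # Simulated checkpoints from two machines; the second stage differs
    # only in the order of a floating-point sum.
    machine_a = [
        ("load", fingerprint([0.1, 0.2, 0.3])),
        ("sum", fingerprint([(0.1 + 0.2) + 0.3])),
    ]
    machine_b = [
        ("load", fingerprint([0.1, 0.2, 0.3])),
        ("sum", fingerprint([0.1 + (0.2 + 0.3)])),
    ]
    print("first differing stage:", first_divergence(machine_a, machine_b))
```

In practice each machine would write its list of pairs to a file (for example as JSON), and the comparison would run on the two files afterwards. Hashing exact bits flags even last-digit differences, which is the point here: it locates where the runs stop being identical.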
