I have a very strange problem: I get different results from the same code and the same data on different machines.

I have Python code based on numpy/scipy/sklearn, and I use Anaconda as my base Python distribution. Even when I copy the entire project directory (which includes all the data and code) from my main machine to another machine and run it, the results I get are different. Specifically, I'm doing a classification task and I get a 3 percent difference in accuracy. I am using the same version of Python and Anaconda on the two machines.

My main machine runs Ubuntu 16.04, and the results on it are lower than on several other machines with various operating systems that I tried (OSX, Ubuntu 14.04 and CentOS). So there must be something wrong with my current system configuration, because all the other machines show consistent results. Since my Anaconda version is the same on all machines, I have no idea what else could be the problem. Any ideas what else I should check, or what could be the source of the problem?

I also removed and reinstalled anaconda from scratch but it didn't help.

CentAu
  • Are you using a train/test split? If so, it could be due to pulling different samples. – ZSH Jul 06 '16 at 16:15
  • I'm using a random seed for that. Also, I don't get different results every time I run; I get different results only on one of the machines. So I guess it must be due to system configuration, but I'm not sure what to check. – CentAu Jul 06 '16 at 16:21
  • Without a look at the code (and preferably some representative sample data) it's hard to say for sure. The same code/data/libraries ***should*** produce the same result, regardless of OS. Might there be a package version difference where something was changed under the hood between versions? – ZSH Jul 06 '16 at 16:27
  • @zhespelt The strange thing is that the package versions are also consistent. Just installed anaconda from scratch on my machine and another one and tested. Still get different results. – CentAu Jul 06 '16 at 16:33
  • Are all the machines the same bit architecture (e.g. 64bit)? – Aguy Jul 06 '16 at 18:28
  • @Theguy They are all 64bit – CentAu Jul 06 '16 at 19:11
  • Maybe one version of Numpy is linking with the Math Kernel Library and one isn't? Just trying to add a data point. – Cody Piersall Jul 06 '16 at 20:35
  • You could maybe check out numpy [sysinfo](https://github.com/numpy/numpy/blob/master/numpy/distutils/system_info.py) for each of the builds. – Cody Piersall Jul 06 '16 at 20:37
  • I think it is a long shot but can you delete .pyc files and run again ? – Hani Jul 06 '16 at 20:43
  • @Hani deleting pyc files did not help. – CentAu Jul 06 '16 at 21:29
  • @CodyPiersall I also think that this should somehow relate to the underlying math library. However, I printed out the info on `mkl` with `sysinfo.get_info('mkl')` on both machines and they seem identical. Are there any other kernel libraries that `numpy/scipy` are using? – CentAu Jul 06 '16 at 21:31
  • Related: https://stackoverflow.com/q/76500465/1271772 – Nike Jun 18 '23 at 14:56
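Several of the comments above come down to "are the two environments really identical?". A minimal sketch for answering that, using only the standard library: run it on each machine and diff the output. The function name `environment_fingerprint` and the default package list are assumptions made for illustration, not something from the original code.

```python
import json
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages=("numpy", "scipy", "scikit-learn")):
    """Collect interpreter, OS, CPU and package details that can differ between machines."""
    info = {
        "python": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            info["packages"][name] = None
    return info


# Print as sorted JSON so the output of two machines can be diffed line by line.
fingerprint = environment_fingerprint()
print(json.dumps(fingerprint, indent=2, sort_keys=True))
```

Identical package versions do not guarantee identical BLAS/LAPACK backends, so it is probably worth comparing the output of `numpy.show_config()` on each machine as well.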

2 Answers

I had a similar problem and found this discussion. The problem may be that floating-point operations in MKL (Intel Math Kernel Library) are non-deterministic by default, so `export MKL_CBWR=AUTO` may solve it.
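If exporting the variable in the shell is inconvenient, it can also be set from Python. This is only a sketch, and it assumes the variable has to be in place before NumPy (and with it MKL) is imported for the first time:

```python
import os

# Assumption: MKL reads this setting when it is first loaded, so this has to run
# before the first `import numpy` anywhere in the process.
# setdefault keeps a value that was already exported in the shell.
os.environ.setdefault("MKL_CBWR", "AUTO")

print("MKL_CBWR =", os.environ["MKL_CBWR"])

# import numpy as np  # import numerical libraries only after the variable is set
```

As far as I understand Intel's documentation, `AUTO` still picks a code path based on the CPU it runs on, so for machines with different processors a fixed value such as `COMPATIBLE` may be the one to try.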

dim

If your code uses linear algebra, check that part first. In general, roundoff errors are not deterministic, and if you have badly conditioned matrices, that may well be the cause.
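To illustrate both points, here is a small pure-Python sketch. The numbers are made up for the demonstration, and `solve_2x2` is a hypothetical helper. The first part shows that the order of floating-point additions changes the last bits of the result. The second part shows a badly conditioned system, where a tiny change in the input moves the solution a lot.

```python
# 1. Floating-point addition is not associative: a library that sums in a
#    different order (e.g. a different vectorized code path) can differ in the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] with Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det


# 2. The two rows are almost parallel, so the matrix is badly conditioned.
#    Changing the right-hand side by 1e-4 moves the solution from about (1, 1)
#    to about (0, 2).
x_original = solve_2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0001)
x_perturbed = solve_2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0002)
print(x_original)
print(x_perturbed)
```

With a matrix like this, last-bit differences of the kind shown in the first part get amplified until they are visible in the final result. For real matrices, `numpy.linalg.cond` gives the condition number.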

ev-br
  • Can you elaborate on badly conditioned matrices? – CentAu Jul 06 '16 at 21:16
  • I'd add that when debugging these sorts of issues I would try to do a binary search comparing intermediate results to find the place in the algorithms where things start to differ. Good luck! – ev-br Jul 06 '16 at 23:09
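One way to do that comparison, as a rough sketch: record a bit-exact digest of each intermediate result on both machines, then look for the first stage where the digests disagree. The helper names, stage names and values below are all hypothetical. Since the digests are cheap to compare, this uses a plain linear scan; the binary search is more useful for deciding where to add checkpoints.

```python
import hashlib
import struct


def digest(values):
    """Hash a flat sequence of floats bit for bit, so even a last-bit difference shows up."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def first_divergence(checkpoints_a, checkpoints_b):
    """Return the name of the first stage whose digests differ, or None if all match."""
    for (stage_a, digest_a), (stage_b, digest_b) in zip(checkpoints_a, checkpoints_b):
        if stage_a != stage_b:
            raise ValueError("both runs must record the same stages in the same order")
        if digest_a != digest_b:
            return stage_a
    return None


# Made-up checkpoints standing in for the logs of two machines.
machine_a = [
    ("features", digest([0.5, 1.25])),
    ("weights", digest([0.1 + 0.2])),
    ("scores", digest([2.0])),
]
machine_b = [
    ("features", digest([0.5, 1.25])),
    ("weights", digest([0.3])),
    ("scores", digest([2.0])),
]
print(first_divergence(machine_a, machine_b))  # weights
```

For NumPy arrays, hashing `arr.tobytes()` would play the same role as `digest` here.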