While tracking down a related problem, I stumbled upon the fact that np.std seems to return different values depending on whether the axis keyword argument is specified or the corresponding column is selected manually. Consider the following snippet:

import numpy as np

np.random.seed(123)

a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = np.random.uniform()
a[:, 1] = np.random.uniform()

print(np.std(a, axis=0)[0] == np.std(a[:, 0]))  # Should be the same.
print(np.std(a, axis=0)[1] == np.std(a[:, 1]))  # Should be the same.

However, the two computations don't return the same result. Further inspection reveals:

>>> print('axis=0: {:e} vs {:e}'.format(np.std(a, axis=0)[0], np.std(a[:, 0])))
axis=0: 7.771561e-16 vs 2.220446e-16
>>> print('axis=1: {:e} vs {:e}'.format(np.std(a, axis=0)[1], np.std(a[:, 1])))
axis=1: 4.440892e-16 vs 0.000000e+00

I don't see why the two ways of computation would return different results since formally they describe the same procedure (masking the axis manually or letting numpy do the job by specifying axis shouldn't make a difference).


I am using Python 3.5.2 and numpy 1.15.0.

a_guest
  • It gets more confusing: when I passed the values to Excel to double-check, the numbers differed even more from what numpy reports. I even tried this: https://stackoverflow.com/questions/34133939/is-there-any-difference-between-numpy-std-and-excel-stdev-function – Adib Aug 14 '18 at 17:52

1 Answer

These numbers, as you may have noticed, are extremely small: they are on the order of machine epsilon for double precision (about 2.2e-16), so neither result is meaningful beyond rounding error. Minor differences in implementation will produce different answers due to the inexactness of floating-point arithmetic. NumPy's std relies on compiled reduction loops, and the reduction along an axis of a 2D array need not accumulate the values in the same order as the reduction over a 1D slice.
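To illustrate why the order of accumulation matters, here is a minimal pure-Python sketch. It is not NumPy's actual code: sequential_sum and pairwise_sum are hypothetical helpers written only for this example. The NumPy documentation for np.sum does note that pairwise summation is used in some cases and not in others depending on the axis being reduced, which is one plausible source of the discrepancy.

```python
import math


def sequential_sum(values):
    """Add the values one after another, left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Recursively split the list in half and add the two partial sums."""
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return values[0]
    mid = n // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The same 100 identical values, summed in two different orders.
values = [0.1] * 100
exact = math.fsum(values)  # correctly rounded reference sum
seq = sequential_sum(values)
pair = pairwise_sum(values)
print('sequential error: {:e}'.format(seq - exact))
print('pairwise error:   {:e}'.format(pair - exact))
```

Both orders are "the same procedure" on paper, yet each may land a few units in the last place away from the reference. Any such difference in the mean propagates into the deviations from the mean and therefore into the standard deviation.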

Of course, the 'real' standard deviation of this data along each column is exactly 0, since every value in a column is identical.
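In practice, results like these are better compared with a tolerance than with ==. The following sketch rebuilds the array from the question and uses np.allclose. The absolute tolerance of 1e-12 is an assumption chosen for this example, not a recommended value; an absolute tolerance is needed here because a relative tolerance is meaningless when the expected value is 0.

```python
import numpy as np

# Same construction as in the question: each column holds one repeated value.
rng = np.random.RandomState(123)
a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = rng.uniform()
a[:, 1] = rng.uniform()

per_axis = np.std(a, axis=0)
per_column = np.array([np.std(a[:, 0]), np.std(a[:, 1])])

# Exact comparison may fail because both sides carry rounding noise.
print(per_axis == per_column)

# Comparison with an absolute tolerance (1e-12 is an assumed value).
print(np.allclose(per_axis, per_column, rtol=0.0, atol=1e-12))

# Both results are indistinguishable from the exact answer, 0.
print(np.allclose(per_axis, 0.0, atol=1e-12))
```

With a tolerance, the two ways of computing the standard deviation agree with each other and with the exact value of 0.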

modesitt
  • *"[...] performs the axis computation differently than done explicitly here."* What would be another way to "compute" the axis? Actually it's not a computation of the axis but it's a *specification* and this specification is completely unambiguous. Again I don't see why the computation of the std. dev. would be different whether I perform it on a 1D array or on a 2D array while specifying the axis. The involved operations should be exactly the same (per axis). If you claim this is due to implementation details then please provide a reference and point to the corresponding source code. – a_guest Aug 14 '18 at 18:20