Numpy sum function returns 1.67772e+07

Question

I have two big (432*136*136*46) 'numpy.ndarray' H1 and H2 which encompass altitude values corresponding to two simulations. I want to generate an array with 1 when H1 and H2 have the same altitude and 0 when they don't. Then, I want to know how many elements I selected, so I want to calculate the sum of the elements of this matrix. Here is my code :

H1=np.concatenate([np.around(files1[i].hrtm()[:,0:45,:,:]/h) for i in range(0,files1.__len__())])
H2=np.concatenate([np.around(files2[i].hrtm()[:,0:45,:,:]/h) for i in range(0,files2.__len__())])

diff=np.absolute(H1-H2)
diff[diff==0.]=np.float64(-1.)
diff[diff!=-1]=np.float64(0.)
diff=diff*diff

print np.sum(diff)

And here is my output, which is always the same and does not depend on the data:

1.67772e+07

After some research, I read that it's related to the maximum size of a float. I tried several formats, replacing np.float64 by int, float, np.float32, or nothing, and they all give the same results.

Do you have an idea of how I could do to have the real number ?

It's related to the **precision** of a single-precision float: above the value 2^24, adding 1 does not have any effect because it is absorbed. — Pascal Cuoq, Jun 19 '15 at 16:48
@rth: That's a strange assertion. `diff == 0.` will give an array containing `True`s in the positions corresponding to zeros, and `False`s elsewhere. It'll work just as the OP intends. — Mark Dickinson, Jun 19 '15 at 16:55
@MarkDickinson Note, "if `diff` is a float array" in my comment. I missed the fact that it was converted to integer with `np.around`. Otherwise you can end up with rounding issues like checking for `2.2 * 3.0 == 6.6` which evaluates to `False` in python. — rth, Jun 19 '15 at 17:23
@rth: Yes, of course you have to be careful with rounding issues. But to say that `diff == 0.` won't match zeros is simply false, even if `diff` has a floating-point type. (Which, incidentally, it does here, since `np.around` returns something of floating-point dtype.) `0.0 == 0.0` is `True`! — Mark Dickinson, Jun 19 '15 at 17:29
True. I was talking on the general principle: testing if `float1==float2` is just looking for trouble, and even if it might work in 95% of cases. — rth, Jun 19 '15 at 17:35

Daniel · Accepted Answer · 2015-06-19T16:59:17.843

2

The type of your diff-array is the type of H1 and H2. Since you are only adding many 1s you can convert diff to bool:

print diff.astype(bool).sum()

or much more simple

print (H1 == H2).sum()

But since floating point values are not exact, one might test for very small differences:

print (abs(H1 - H2) < 1e-30).sum()

edited Jun 19 '15 at 16:59

answered Jun 19 '15 at 16:48

Daniel

42,087
4
55
81

That works perfectly fine, thank you very much, and those solutions are much more elegant. – Arnaud BUBBLE Jun 19 '15 at 16:55
1

Note that the values of `H1` and `H2` have already been rounded to the nearest integers, so using a tolerance doesn't achieve anything here. (And if these are altitudes in feet, say, then `1e-30` is going to be way too small to be useful: the comparison will be essentially comparing for equality.) – Mark Dickinson Jun 19 '15 at 16:58
@MarkDickinson: So I overlooked the rounding, still numerical tolerances has nothing to do with physical tolerances. – Daniel Jun 19 '15 at 17:02
1

The problem with using a tolerance is selecting a useful value for that tolerance. `1e-30` is going to be hopelessly small (from a numerical point of view, not a physical one): single-precision floats have around 7 decimal digits of precision, so any two unequal floats larger than `1e-22` are going to differ by more than `1e-30`. That makes the tolerance test equivalent to an equality test for values larger than `1e-22`. And given that they're altitudes, I think we can safely assume that those numbers *are* larger than `1e-22`. – Mark Dickinson Jun 19 '15 at 17:05

Numpy sum function returns 1.67772e+07

1 Answers1