
I have an input array, which is a masked array.
When I check the mean, I get a nonsensical number: less than the reported minimum value!

So, with the raw array: numpy.mean(A) < numpy.min(A). Note that A.dtype is float32.
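Here is a minimal sketch of the underlying mechanism (not necessarily the exact code path a masked array's mean takes, and newer numpy versions mitigate this for mean/sum with pairwise summation; cumsum is used below because it accumulates strictly left to right):

    import numpy as np

    # Sequential float32 accumulation stalls: once the running sum reaches
    # 2**24 = 16777216, the gap between adjacent float32 values exceeds 1.0,
    # so adding 1.0 no longer changes the sum.
    a = np.ones(20_000_000, dtype=np.float32)

    print(a.cumsum()[-1])                  # 16777216.0 -- stuck short of 20000000.0
    print(a.cumsum(dtype=np.float64)[-1])  # 20000000.0 -- correct

Divide that stuck sum by the element count and you get a "mean" well below the minimum value, which is exactly the symptom above.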

FIX: A3 = A.astype(float). A3 is still a masked array, but now the mean lies between the minimum and the maximum, so I have some faith it's correct! But for some reason A3.dtype is float64. Why? Why did that change it, and why is the mean correct at 64-bit and wildly incorrect at 32-bit?
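A quick check of what the cast actually does (the sample values here are made up; this also shows the mask surviving astype):

    import numpy as np
    import numpy.ma as ma

    print(np.dtype(float))   # float64: Python's float is a C double on any platform

    A = ma.masked_values([300.0, 301.0, -999.0], -999.0).astype(np.float32)
    A3 = A.astype(float)     # equivalent to astype(np.float64)
    print(A3.dtype)          # float64
    print(A3.mask)           # [False False  True] -- the mask survives the cast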

Can anyone shed any light on why I needed to recast the array to calculate the mean accurately? (It turns out the problem occurs with or without numpy.)

EDIT: It turns out astype(float) gives float64 because Python's built-in float is a C double; that's why the dtype changed, regardless of my OS being 64-bit. I also found I didn't have this problem if I subsetted the data (extracted from the netCDF input using a netCDF4 Dataset): smaller arrays gave a sensible mean. So the cause is the float32 running sum losing precision. It isn't an overflow in the strict sense, since the true sum is nowhere near float32's maximum, but once the accumulator grows large enough, adding individual values of ~300 no longer changes it, and switching to 64-bit floats prevents that.
As for why the data initially loaded as float32: netCDF4 returns arrays in the variable's on-disk dtype, and float32 storage is common, presumably to conserve space. The array itself is 1872x128x256 (about 61 million elements) with non-masked values around 300, so the true sum is roughly 1.8e10, well beyond float32's ~7 significant digits; it turns out that is enough to trigger the problem :)
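For reference, a quick way to check this when reading the file (the file name "input.nc" and variable name "tas" are just hypothetical stand-ins):

    import numpy as np
    from netCDF4 import Dataset

    ds = Dataset("input.nc")       # hypothetical file name
    var = ds.variables["tas"]      # hypothetical variable name
    print(var.dtype)               # float32: the on-disk storage type is preserved

    A = var[:]                     # netCDF4 hands back a masked array in that dtype
    A3 = A.astype(np.float64)      # promote before computing statistics
    print(A3.min() <= A3.mean() <= A3.max())  # sanity check
    ds.close()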

1 Answer


If you're working with large arrays, be aware of accumulated floating-point precision problems!
Changing from 32-bit to 64-bit floats in this instance avoids the (unflagged, as far as I can tell) loss of precision in the float32 running sum that led to the anomalous mean calculation.
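If recasting the whole array to float64 costs too much memory, numpy's mean (including the masked-array version) accepts a dtype argument that widens only the accumulator; a minimal sketch with made-up data of the same shape as the question's:

    import numpy as np
    import numpy.ma as ma

    # ~61 million float32 values around 300, as in the question (data made up)
    A = ma.MaskedArray(np.full((1872, 128, 256), 300.0, dtype=np.float32))

    print(A.mean())                  # may be anomalous, depending on numpy version
    print(A.mean(dtype=np.float64))  # 300.0: only the running sum is widened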