
I have an input array, which is a masked array.
When I check the mean, I get a nonsensical number: less than the reported minimum value!

So, with the raw array: numpy.mean(A) < numpy.min(A). Note that A.dtype is float32.
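Here is a minimal sketch of the underlying mechanism (not necessarily the exact code path a masked array's mean takes, and newer numpy versions mitigate this for mean/sum with pairwise summation; cumsum is used below because it accumulates strictly left to right):

    import numpy as np

    # Sequential float32 accumulation stalls: once the running sum reaches
    # 2**24 = 16777216, the gap between adjacent float32 values exceeds 1.0,
    # so adding 1.0 no longer changes the sum.
    a = np.ones(20_000_000, dtype=np.float32)

    print(a.cumsum()[-1])                  # 16777216.0 -- stuck short of 20000000.0
    print(a.cumsum(dtype=np.float64)[-1])  # 20000000.0 -- correct

Divide that stuck sum by the element count and you get a "mean" well below the minimum value, which is exactly the symptom above.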

FIX: A3 = A.astype(float). A3 is still a masked array, but now the mean lies between the minimum and the maximum, so I have some faith it's correct! But for some reason A3.dtype is float64. Why? Why did that change it, and why is the mean correct at 64-bit and wildly incorrect at 32-bit?
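A quick check of what the cast actually does (the sample values here are made up; this also shows the mask surviving astype):

    import numpy as np
    import numpy.ma as ma

    print(np.dtype(float))   # float64: Python's float is a C double on any platform

    A = ma.masked_values([300.0, 301.0, -999.0], -999.0).astype(np.float32)
    A3 = A.astype(float)     # equivalent to astype(np.float64)
    print(A3.dtype)          # float64
    print(A3.mask)           # [False False  True] -- the mask survives the cast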

Can anyone shed any light on why I needed to recast the array to calculate the mean accurately? (It turns out the problem occurs with or without numpy.)

EDIT: It turns out astype(float) gives float64 because Python's built-in float is a C double; that's why the dtype changed, regardless of my OS being 64-bit. I also found I didn't have this problem if I subsetted the data (extracted from the netCDF input using a netCDF4 Dataset): smaller arrays gave a sensible mean. So the cause is the float32 running sum losing precision. It isn't an overflow in the strict sense, since the true sum is nowhere near float32's maximum, but once the accumulator grows large enough, adding individual values of ~300 no longer changes it, and switching to 64-bit floats prevents that.
As for why the data initially loaded as float32: netCDF4 returns arrays in the variable's on-disk dtype, and float32 storage is common, presumably to conserve space. The array itself is 1872x128x256 (about 61 million elements) with non-masked values around 300, so the true sum is roughly 1.8e10, well beyond float32's ~7 significant digits; it turns out that is enough to trigger the problem :)
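For reference, a quick way to check this when reading the file (the file name "input.nc" and variable name "tas" are just hypothetical stand-ins):

    import numpy as np
    from netCDF4 import Dataset

    ds = Dataset("input.nc")       # hypothetical file name
    var = ds.variables["tas"]      # hypothetical variable name
    print(var.dtype)               # float32: the on-disk storage type is preserved

    A = var[:]                     # netCDF4 hands back a masked array in that dtype
    A3 = A.astype(np.float64)      # promote before computing statistics
    print(A3.min() <= A3.mean() <= A3.max())  # sanity check
    ds.close()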

1 Answer


If you're working with large arrays, be aware of accumulated floating-point precision problems!
Changing from 32-bit to 64-bit floats in this instance avoids the (unflagged, as far as I can tell) loss of precision in the float32 running sum that led to the anomalous mean calculation.
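If recasting the whole array to float64 costs too much memory, numpy's mean (including the masked-array version) accepts a dtype argument that widens only the accumulator; a minimal sketch with made-up data of the same shape as the question's:

    import numpy as np
    import numpy.ma as ma

    # ~61 million float32 values around 300, as in the question (data made up)
    A = ma.MaskedArray(np.full((1872, 128, 256), 300.0, dtype=np.float32))

    print(A.mean())                  # may be anomalous, depending on numpy version
    print(A.mean(dtype=np.float64))  # 300.0: only the running sum is widened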