
I'm encountering a problem with incorrect numpy calculations when the inputs to a calculation are a numpy array with a 32-bit integer data type, but the outputs include larger numbers that require 64-bit representation.

Here's a minimal working example:

import numpy as np

arr = np.ones(5, dtype=int) * (2**24 + 300)  # arr.dtype defaults to 'int32' here (platform-dependent)

# Following comment from @hpaulj I changed the first line, which was originally:
# arr = np.zeros(5, dtype=int) 
# arr[:] = 2**24 + 300

single_value_calc = 2**8 * (2**24 + 300)
numpy_calc = 2**8 * arr

print(single_value_calc)
print(numpy_calc[0])

# RESULTS
4295044096
76800

The desired output is for the numpy array to contain the correct value of 4295044096, which requires 64 bits to represent. That is, I would have expected numpy arrays to automatically upcast from int32 to int64 when the output requires it, rather than keeping a 32-bit output and wrapping around once 2^32 is exceeded.
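The wrapped value is just the exact product reduced modulo 2^32. A minimal sketch of that arithmetic, assuming the dtype is pinned to np.int32 explicitly (rather than relying on the platform default of dtype=int) so that it behaves the same everywhere:

```python
import numpy as np

# Pin the dtype so the example does not depend on the platform default.
arr = np.full(5, 2**24 + 300, dtype=np.int32)

exact = 2**8 * (2**24 + 300)   # Python int, arbitrary precision
wrapped = 2**8 * arr           # stays int32, overflows silently

print(exact)                   # 4295044096
print(wrapped.dtype)           # int32
print(int(wrapped[0]))         # 76800
print(exact % 2**32)           # 76800, i.e. exact - 2**32
```

Since 2**8 * 2**24 is exactly 2**32, only the 2**8 * 300 = 76800 part survives in the 32-bit result.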

Of course, I can fix the problem manually by forcing int64 representation:

numpy_calc2 = 2**8 * arr.astype('int64')

but this is undesirable for general code, since the output will only need 64-bit representation (i.e. to hold large numbers) in some cases and not all. In my use case, performance is critical so forcing upcasting every time would be costly.

Is this the intended behaviour of numpy arrays? And if so, is there a clean, performant solution please?

SLhark
  • It's the `arr[:]=` assignment that is forcing the conversion. You can't change the dtype of an array that way. Look at what happens if you assign a float to that array. – hpaulj Oct 24 '19 at 18:54
  • Thanks for the comment, although that line isn't where the conversion should take place. It's the "numpy_calc = ..." line where I want a 64-bit output. For example, I can remove the [:] operation and the problem is still there (see revised post). – SLhark Oct 24 '19 at 19:07

1 Answer


Type casting and promotion in numpy are fairly complicated and occasionally surprising. This recent unofficial write-up by Sebastian Berg explains some of the nuances of the subject (mostly concentrating on scalars and 0d arrays).

Quoting from this document:

Python Integers and Floats

Note that python integers are handled exactly like numpy ones. They are, however, special in that they do not have a dtype associated with them explicitly. Value based logic, as described here, seems useful for python integers and floats to allow:

arr = np.arange(10, dtype=np.int8)
arr += 1
# or:
res = arr + 1
res.dtype == np.int8

which ensures that no upcast (for example with higher memory usage) occurs.

(emphasis mine.)

See also Allan Haldane's gist suggesting C-style type coercion, linked from the previous document:

Currently, when two dtypes are involved in a binary operation numpy's principle is that "the output dtype's range covers the range of both input dtypes", and when a single dtype is involved there is never any cast.

(emphasis again mine.)
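The quoted principle can be checked directly. This is only an illustrative sketch: np.result_type reports the dtype that a binary operation between two dtypes would produce, and the arrays a and b are made-up examples.

```python
import numpy as np

# Two different integer dtypes: the result dtype covers both input ranges.
print(np.result_type(np.int8, np.int16))   # int16
print(np.result_type(np.int8, np.uint8))   # int16 (smallest signed type holding both)
print(np.result_type(np.int32, np.int64))  # int64

# A single dtype: never any cast, even when the values overflow.
a = np.array([2**30], dtype=np.int32)
b = np.array([4], dtype=np.int32)
product = a * b

print(product.dtype)    # int32
print(int(product[0]))  # 0, because 2**32 wraps around in 32 bits
```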

So my understanding is that the promotion rules for numpy scalars and arrays differ, primarily because it's not feasible to check every element inside an array to determine whether casting can be done safely. Again from the former document:

Scalar based rules

Unlike arrays, where inspection of all values is not feasable, for scalars (and 0-D arrays) the value is inspected.

This would mean that you can either use np.int64 from the start to be safe (on Linux, dtype=int will actually do this on its own), or check the maximum value of your arrays before suspect operations and decide, case by case, whether you have to promote the dtype yourself. I understand that this might not be feasible if you are doing a lot of calculations, but I don't believe there is a way around it given numpy's current type promotion rules.
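A rough sketch of the "check the maximum first" approach, for the specific case of multiplying an integer array by a Python int. The helper name scale_with_promotion is hypothetical, and it assumes a non-empty integer array whose result fits in int64. Using the dtype argument of np.multiply instead of astype is my own substitution; it should avoid building a temporary int64 copy of the input.

```python
import numpy as np

def scale_with_promotion(arr, factor):
    """Multiply an integer array by a Python int, widening to int64 only if needed.

    Hypothetical helper: assumes arr is a non-empty integer array and that
    the exact result fits in int64.
    """
    info = np.iinfo(arr.dtype)
    # Compute the extreme results exactly, using Python's unbounded ints.
    candidates = (int(arr.min()) * factor, int(arr.max()) * factor)
    fits = (
        info.min <= min(candidates)
        and max(candidates) <= info.max
        and info.min <= factor <= info.max
    )
    if fits:
        return arr * factor  # keep the narrow dtype
    # Let the ufunc compute in int64 directly.
    return np.multiply(arr, factor, dtype=np.int64)

big = np.full(5, 2**24 + 300, dtype=np.int32)
small = np.arange(5, dtype=np.int32)

widened = scale_with_promotion(big, 2**8)
kept = scale_with_promotion(small, 2**8)

print(widened.dtype, int(widened[0]))  # int64 4295044096
print(kept.dtype, int(kept[-1]))       # int32 1024
```

The check costs two extra passes over the array (min and max), so whether it beats always upcasting depends on how often promotion is actually needed.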

Community
  • Thanks for the detailed explanation - this does make sense. It would of course be a nice feature of a numpy array to offer an "enable casting / checking flag" that does check if such casting is required even for scalar multiplication, but I accept this behaviour is 'not wanted' more often than it is 'wanted'. – SLhark Oct 25 '19 at 18:45
  • @SLhark indeed. For what it's worth the linked documents came up exactly because changes to promotion behaviour were suggested, although I suspect that the philosophy of not checking array elements is fundamental enough to stay even if some changes are made eventually. – Andras Deak -- Слава Україні Oct 25 '19 at 18:48