
So I have always created numpy arrays like this:

>>> u = np.zeros( 10, int )
>>> v = np.zeros( 10, float )

Until now I have been oblivious to the maximum permitted values; I just assumed it would simply work, and that if it didn't, I would get an OverflowError and could find some workaround, like taking the logarithm.

But recently I started to use the other dtypes:

>>> v8 = np.zeros( 10, np.uint8 )
>>> v8[0] = 2 ** 8 - 1
>>> v8[1] = 2 ** 8
>>> v8
array([255,   0,   0,   0,   0,   0,   0,   0,   0,   0], dtype=uint8)

Ok so I don't get any warning when I assign a value bigger than 255. That's a bit scary.

So my questions are:

  • when I used arrays of type int and float, is it possible that I set a value that was too big (resulting in completely wrong calculations) without knowing it?
  • if I want to use uint8, do I have to manually check that all assigned values are in [0, 255]?
usual me

4 Answers


numpy works very close to the machine level. Range checks are time consuming, so checking is left to the developer. Python is much more high-level: many checks are done automatically and, in the case of ints, values can be arbitrarily large. Everywhere you have to decide between speed and safety; numpy sits further towards the speed side.

In situations where it is necessary to check the range of values, you have to do it yourself.
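
For instance (a small illustration, assuming the default integer dtype is 64-bit on your platform), a plain Python int just keeps growing, while the same arithmetic on a numpy array silently wraps around:

>>> 2 ** 62 * 4
18446744073709551616
>>> np.array([2 ** 62]) * 4
array([0])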

The clip method may help you:

>>> u = np.array([124,-130, 213])
>>> u.astype('b')
array([124, 126, -43], dtype=int8)
>>> u.clip(-128,127).astype('b')
array([ 124, -128,  127], dtype=int8)
Daniel

As explained in the other answers, values that are too large get 'wrapped around', so you need to clip them by hand to the minimum and maximum allowed values before converting. For integers, these limits can be obtained using np.iinfo. You could write your own utility function to do this conversion in a safe way for a given dtype:

import numpy as np

def safe_convert(x, new_dtype):
    # clip to the representable range of new_dtype before casting
    info = np.iinfo(new_dtype)
    return x.clip(info.min, info.max).astype(new_dtype)

Quick test:

In [31]: safe_convert(np.array([-1,0,1,254,255,256]), np.uint8)
Out[31]: array([  0,   0,   1, 254, 255, 255], dtype=uint8)

In [32]: safe_convert(np.array([-129,-128,-127,126,127,128]), np.int8)
Out[32]: array([-128, -128, -127,  126,  127,  127], dtype=int8)
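
If you also need to convert to float dtypes, a similar helper (just a sketch, not from the original answer; np.finfo plays the role of np.iinfo for floating-point types) could look like this:

def safe_convert_any(x, new_dtype):
    # use iinfo for integer targets and finfo for floating-point targets
    if np.issubdtype(new_dtype, np.integer):
        info = np.iinfo(new_dtype)
    else:
        info = np.finfo(new_dtype)
    return x.clip(info.min, info.max).astype(new_dtype)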
Bas Swinckels

Yes, uint8 will mask your values (keep only the 8 least significant bits), so you need to check them manually:

>>> a = numpy.uint8(256)
>>> a
0
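
One way to do that manual check (a rough sketch, not from the original answer) is to compare against the limits that numpy.iinfo reports before assigning:

>>> info = numpy.iinfo(numpy.uint8)
>>> info.min, info.max
(0, 255)
>>> value = 256
>>> info.min <= value <= info.max
False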

And yes, overflow can occur without you realizing it. It's a common source of error in many programming languages. However, Python's long integers behave in an uncommon way: they have no explicitly defined limit.

I've written about it in this answer.

keyser

As already explained, numpy wraps around to avoid doing checks.

If clipping is not acceptable, then before you cast you can use numpy.min_scalar_type to get the minimum dtype that will hold your data without losing any of it.
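
For example (a quick sketch; combining it with promote_types over the array's extremes is my addition, not from the answer):

>>> import numpy as np
>>> np.min_scalar_type(255)
dtype('uint8')
>>> np.min_scalar_type(256)
dtype('uint16')
>>> a = np.array([0, 300, 70000])
>>> np.promote_types(np.min_scalar_type(int(a.min())), np.min_scalar_type(int(a.max())))
dtype('uint32')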

Also note that practically the only reason to use uint8 is to save memory in very big arrays, as the computation speed is usually roughly the same (in some operations the values are even cast upwards internally). If your arrays are not so big that memory is a concern, you are safer using uint16 or even uint32 for intermediate computations. If memory is your problem, you should consider moving to out-of-core storage, like PyTables; and if you are already about to fill the memory, then with a bigger dataset maybe not even uint8 will be enough.

Davidmh