What is the difference between dtype= and .astype() in numpy?

Question

Context: I would like to use numpy ndarrays with float32 instead of float64.

Edit: Additional context - I'm concerned about how numpy is executing these calls because they will be happening repeatedly as part of a backpropagation routine in a neural net. I'd like the net to carry out all addition/subtraction/multiplication/division in float32 for validation purposes, as I want to compare results with another group's work. It seems like initialization for methods like randn will always go from float64 -> float32 with .astype() casting. Once my ndarray is of type float32, if I use np.dot for example, will those multiplications happen in float32? How can I verify?

The documentation is not clear to me - http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html

I figured out I can just add .astype('float32') to the end of a numpy call, for example, np.random.randn(y, 1).astype('float32').

I also see that dtype=np.float32 is an option, for example, np.zeros(5, dtype=np.float32). However, trying np.random.randn((y, 1), dtype=np.float32) returns the following error:

    b = np.random.randn((3,1), dtype=np.float32)
    TypeError: randn() got an unexpected keyword argument 'dtype'

What is the difference between declaring the type as float32 using dtype and using .astype()?

Both b = np.zeros(5, dtype=np.float32) and b = np.zeros(5).astype('float32') when evaluated with:

print(b)
print(type(b))
print(b[0])
print(type(b[0]))

prints:

[ 0.  0.  0.  0.  0.]
<class 'numpy.ndarray'>
0.0
<class 'numpy.float32'>
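A more direct check than inspecting `type(b[0])` is to compare the arrays' `dtype` attributes and values. A minimal sketch; the names `b_direct` and `b_cast` are just for illustration:

```python
import numpy as np

# Allocate float32 directly via the dtype parameter
b_direct = np.zeros(5, dtype=np.float32)
# Allocate the default float64 array, then cast a copy to float32
b_cast = np.zeros(5).astype(np.float32)

print(b_direct.dtype, b_cast.dtype)      # float32 float32
print(np.array_equal(b_direct, b_cast))  # True
```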

If the function accepts the `dtype` parameter then use it. If it doesn't accept that parameter you'll have to use the `astype`. The effect should be the same (in most cases). The function that accepts `dtype` might be using `astype` (or equivalent) in its return expression. — hpaulj, Sep 21 '16 at 18:18
Thanks @hpaulj, is `astype()` more widely accepted than `dtype`? Second, do you know if `astype()` forces a temporary `float64` array first, before copying and casting to whatever type the user provides? Is there a way to get `float32` `ndarray`s without creating a temp array first? — phoenixdown, Sep 21 '16 at 18:51

score 13 · Accepted Answer · edited Jun 27 '20 at 01:25

Let's see if I can address some of the confusion I'm seeing in the comments.

Make an array:

In [609]: x=np.arange(5)
In [610]: x
Out[610]: array([0, 1, 2, 3, 4])
In [611]: x.dtype
Out[611]: dtype('int32')

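Here the default integer dtype for arange is int32. This is platform-dependent: most 64-bit Linux and macOS builds give int64 (and so does Windows since NumPy 2.0).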

astype is an array method; it can be used on any array:

In [612]: x.astype(np.float32)
Out[612]: array([ 0.,  1.,  2.,  3.,  4.], dtype=float32)

arange also takes a dtype parameter:

In [614]: np.arange(5, dtype=np.float32)
Out[614]: array([ 0.,  1.,  2.,  3.,  4.], dtype=float32)

Whether it created the int array first and converted it, or made the float32 directly, isn't any concern to me. This is a basic operation, done in compiled code.

I can also give it a float stop value, in which case it gives me an array of the default float type, float64.

In [615]: np.arange(5.0)
Out[615]: array([ 0.,  1.,  2.,  3.,  4.])
In [616]: _.dtype
Out[616]: dtype('float64')
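The dtype that arange picks follows its arguments unless dtype is given explicitly. A small check (the integer dtype is platform-dependent, so no expected output is shown for that line):

```python
import numpy as np

ints = np.arange(5)                   # integer stop -> platform default integer dtype
floats = np.arange(5.0)               # float stop -> default float dtype (float64)
f32 = np.arange(5, dtype=np.float32)  # an explicit dtype overrides the inference

print(ints.dtype, floats.dtype, f32.dtype)
```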

zeros is similar; the default dtype is float64, but with a parameter I can change that. Since its primary task is to allocate memory, and it doesn't have to do any calculation, I'm sure it creates the desired dtype right away, without further conversion. But again, this is compiled code, and I shouldn't have to worry about what it is doing under the covers.

In [618]: np.zeros(5)
Out[618]: array([ 0.,  0.,  0.,  0.,  0.])
In [619]: _.dtype
Out[619]: dtype('float64')
In [620]: np.zeros(5,dtype=np.float32)
Out[620]: array([ 0.,  0.,  0.,  0.,  0.], dtype=float32)
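One practical consequence of the dtype choice is memory use, which `itemsize` (bytes per element) and `nbytes` (total bytes) make visible:

```python
import numpy as np

z64 = np.zeros(5)                    # default float64: 8 bytes per element
z32 = np.zeros(5, dtype=np.float32)  # float32: 4 bytes per element

print(z64.itemsize, z64.nbytes)  # 8 40
print(z32.itemsize, z32.nbytes)  # 4 20
```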

randn involves a lot of calculation, and evidently it is compiled to work with the default float type. It does not take a dtype. But since the result is an array, it can be cast with astype.

In [623]: np.random.randn(3)
Out[623]: array([-0.64520949,  0.21554705,  2.16722514])
In [624]: _.dtype
Out[624]: dtype('float64')
In [625]: __.astype(np.float32)
Out[625]: array([-0.64520949,  0.21554704,  2.16722512], dtype=float32)
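Since NumPy 1.17 there is also the Generator API, whose `standard_normal` method does accept a `dtype` argument (only float32 and float64 are supported), plus an `out` argument for refilling a preallocated array. A minimal sketch, assuming NumPy >= 1.17; the seed is arbitrary, and the float32 draws will generally not equal `randn(...).astype(np.float32)` values, since the generators and sampling routines differ:

```python
import numpy as np

# Generator API (NumPy >= 1.17); the seed value is arbitrary
rng = np.random.default_rng(12345)

# Legacy path: float64 from randn, then a float32 copy via astype
w_cast = np.random.randn(3, 1).astype(np.float32)

# Generator path: standard_normal takes a dtype (float32 or float64 only)
w_direct = rng.standard_normal((3, 1), dtype=np.float32)

# For repeated calls, refill a preallocated float32 buffer in place;
# the dtype argument must match the buffer's dtype
buf = np.empty((3, 1), dtype=np.float32)
rng.standard_normal(dtype=np.float32, out=buf)

print(w_cast.dtype, w_direct.dtype, buf.dtype)  # float32 float32 float32
```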

Let me stress that astype is a method of an array. It takes the values of the array and produces a new array with the desired dtype. It does not act retroactively (or in-place) on the array itself, or on the function that created that array.
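A quick way to see that astype leaves the original array alone, and that "converting" an array really means rebinding a name to the returned copy:

```python
import numpy as np

x = np.arange(5)
y = x.astype(np.float32)   # new array with the requested dtype
print(x.dtype)             # unchanged: still the platform's integer dtype
print(y.dtype)             # float32
print(y is x)              # False: astype returned a different object

x = x.astype(np.float32)   # to "convert" x, rebind the name to the result
print(x.dtype)             # float32
```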

The effect of astype is often (always?) the same as a dtype parameter, but the sequence of actions is different.

In https://stackoverflow.com/a/39625960/901925 I describe a sparse matrix creator that takes a dtype parameter, and implements it with an astype method call at the end.
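The pattern described there (accept a dtype parameter, compute in the default type, then call astype at the end) can be sketched with a hypothetical helper; `scaled_ramp` is made up for illustration and is not a numpy or scipy function:

```python
import numpy as np

def scaled_ramp(n, scale=0.5, dtype=None):
    """Hypothetical helper: computes in the default float64, then
    honours the dtype argument with an astype call on the way out."""
    result = np.arange(n) * scale             # float64 intermediate
    if dtype is None:
        return result
    return result.astype(dtype, copy=False)  # no extra copy if dtype already matches

r = scaled_ramp(4, dtype=np.float32)
print(r, r.dtype)
```

From the caller's side the dtype parameter looks like a direct request, but a float64 temporary is still created inside.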

When you do calculations such as dot or *, numpy tries to match the output dtype to the inputs. With mixed types, it uses the higher-precision one.

In [642]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float64)
Out[642]: array([  0.,   1.,   4.,   9.,  16.])
In [643]: _.dtype
Out[643]: dtype('float64')
In [644]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float32)
Out[644]: array([  0.,   1.,   4.,   9.,  16.], dtype=float32)
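The same promotion can be checked for np.dot, or without computing anything via np.result_type. A Python float scalar (like the 2.0 below) does not upcast a float32 array:

```python
import numpy as np

a32 = np.arange(5, dtype=np.float32)
a64 = np.arange(5, dtype=np.float64)

print((a32 * a32).dtype)         # float32
print((a32 * a64).dtype)         # float64: mixed inputs promote
print(np.dot(a32, a32).dtype)    # float32
print(np.result_type(a32, a64))  # float64
print((a32 * 2.0).dtype)         # float32: a Python float scalar does not upcast
```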

There are casting rules. One way to look those up is with the can_cast function:

In [649]: np.can_cast(np.float64,np.float32)
Out[649]: False
In [650]: np.can_cast(np.float32,np.float64)
Out[650]: True
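can_cast checks the 'safe' rule by default; other rules can be passed via its casting argument. astype itself defaults to casting='unsafe', which is why it performs the float64 to float32 downcast without complaint, but it can be made strict:

```python
import numpy as np

# can_cast defaults to casting='safe': only casts that cannot lose information
print(np.can_cast(np.float64, np.float32))                       # False
# 'same_kind' also allows downcasts within a kind, e.g. float64 -> float32
print(np.can_cast(np.float64, np.float32, casting='same_kind'))  # True

# astype defaults to casting='unsafe', so this downcast is allowed
y = np.arange(3.0).astype(np.float32)

# With casting='safe', the same downcast raises TypeError
refused = False
try:
    np.arange(3.0).astype(np.float32, casting='safe')
except TypeError:
    refused = True
print(refused)                                                   # True
```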

In some calculations it is possible that numpy casts float32 to float64, does the calculation, and then casts back to float32, to reduce rounding error. But I don't know how you would find that out from the documentation or from tests.
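One partial test is sketched below: check the output dtype, and compare the float32 product with a float64 reference computed from the same values. This only shows that the result is float32 and how far it sits from the float64 answer; it cannot by itself prove how the underlying BLAS routine accumulates internally. Shapes and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
a = rng.standard_normal((200, 300)).astype(np.float32)
b = rng.standard_normal((300, 100)).astype(np.float32)

c32 = np.dot(a, b)                                        # float32 inputs
c64 = np.dot(a.astype(np.float64), b.astype(np.float64))  # same values, float64 math

print(c32.dtype)  # float32
# Largest gap from the float64 reference, next to float32 machine epsilon for scale
print(np.max(np.abs(c32 - c64)), np.finfo(np.float32).eps)
```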

Thanks - I'm concerned about how `numpy` is executing these calls because they will be happening repeatedly as part of a backpropagation routine in a neural net. I'd like the net to carry out all addition/subtraction/multiplication/division in `float32` for validation purposes, as I want to compare results with another group's work. It seems like initialization for methods like `randn` will always go from `float64` -> `float32` with `.astype()` casting. Once my `ndarray` is of type `float32`, if I use `np.dot` for example, will those multiplications happen in `float32`? How can I verify? — phoenixdown, Sep 21 '16 at 21:09
The documentation is not clear to me - http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html — phoenixdown, Sep 21 '16 at 21:09
`np.dot` has compiled `numpy` code that calls routines in some external libraries (BLAS, ATLAS etc). I think most of those have both float and double versions, and it chooses the version that is most compatible with the inputs. But you'd have to dig into the `numpy` source code to be sure. — hpaulj, Sep 21 '16 at 21:28
Okay thanks, that's super helpful. Looks like an email on the dev list-serv or examining some source code may be necessary to figure out if it's implicitly casting to 64 for higher-precision math before casting back to 32 to match output and input dtypes. — phoenixdown, Sep 21 '16 at 21:48
@bobo Old question, but if you're concerned about computation time, the correct answer is to just run `timeit` on both and see which is faster. — endolith, Jun 27 '20 at 01:26

score 2 · Answer 2 · answered Oct 17 '18 at 04:11

import numpy as np

arr1 = np.array([25, 56, 12, 85, 34, 75])
arr2 = np.array([42, 3, 86, 32, 856, 46])

arr1.astype(np.complex128)  # returns a new array, which is discarded here
print(arr1)
print(type(arr1[0]))
print(arr1.astype(np.complex128))
arr2 = np.array(arr2, dtype='complex')  # rebinds arr2 to a new complex array
print(arr2)
print(type(arr2[0]))

OUTPUT for above

[25 56 12 85 34 75]
<class 'numpy.int64'>
[25.+0.j 56.+0.j 12.+0.j 85.+0.j 34.+0.j 75.+0.j]
[ 42.+0.j   3.+0.j  86.+0.j  32.+0.j 856.+0.j  46.+0.j]
<class 'numpy.complex128'>

It can be seen that astype changes the type temporarily, as in normal type casting, whereas the generic method changes the type permanently.

This answer is incorrect. `.astype()` does not mutate the original array, it returns a new array. So, if you use `arr1.astype(np.complex)` and then try to `print(arr1)`, you are printing the original array, not the new array with a new dtype. — Johiasburg Frowell, May 02 '22 at 12:50
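To make the comment's point concrete: both `astype` and `np.array(..., dtype=...)` return new arrays. What made `arr2` look permanently converted in the answer is the reassignment `arr2 = ...`, not the constructor. The names `c1` and `c2` are just for illustration:

```python
import numpy as np

arr1 = np.array([25, 56, 12, 85, 34, 75])
arr2 = np.array([42, 3, 86, 32, 856, 46])

c1 = arr1.astype(np.complex128)           # new complex array
c2 = np.array(arr2, dtype=np.complex128)  # also a new complex array

# Neither call modified the integer arrays they were built from
print(arr1.dtype.kind, arr2.dtype.kind)   # i i
print(c1.dtype, c2.dtype)                 # complex128 complex128
```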

score 0 · Answer 3 · answered Sep 21 '16 at 17:58


By default, .astype() copies the data.

>>> a = np.ones(3, dtype=float)
>>> a
array([ 1.,  1.,  1.])
>>> b = a.astype(int)
>>> b
array([1, 1, 1])
>>> np.may_share_memory(a, b)
False

Note that astype() copies the data even if the dtype is actually the same:

>>> c = a.astype(float)
>>> np.may_share_memory(a, c)
False


ev-br


What does this mean for the inline case where an `ndarray` is declared using `.astype()` ? Is something extra being copied here? – phoenixdown Sep 21 '16 at 18:03
You mean `np.ones(3, dtype=float).astype(int)`? Yes, it first creates a temporary array of floats, then copies it and casts values to ints, and then discards the temporary. – ev-br Sep 21 '16 at 18:07
No, I mean this: `np.ones(3).astype('float32')` - does this first create a temporary array of `float64` entries, then copy a casted `float32` version of the array to a different location in memory and destroy the original? Or does it just create a `float32` array to begin with? – phoenixdown Sep 21 '16 at 18:12
Yes, it copies. `.astype` is being called on a temporary. – ev-br Sep 21 '16 at 18:13
That's weird... how can I do `np.random.randn(3,1).astype('float32')` without creating a temporary array first? Per the question, trying `np.random.randn((3,1), dtype=np.float32)` does not work. – phoenixdown Sep 21 '16 at 18:49
You can't. `random.randn` does not have a dtype argument in any released version of numpy. – ev-br Sep 21 '16 at 18:57
Thanks - to clarify, are you saying that any method of creating an `ndarray` in `float32` using `.astype()` results in the following three steps: (1) create the array in `float64`, (2) copy a version of the array cast to `float32` to a new location in memory, (3) discard the `float64` array? My question is whether we can directly create a `float32` array using `.astype()`. Thanks – phoenixdown Sep 21 '16 at 20:00
`astype` accepts a `copy=False` parameter. See the docs. – hpaulj Mar 02 '21 at 18:03
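Note that `copy=False` only skips the copy when nothing needs converting (dtype, order and subok requirements already satisfied); a float64 to float32 cast still needs new memory, so it does not remove the temporary in `np.random.randn(3, 1).astype('float32')`. A small illustration:

```python
import numpy as np

a = np.ones(3)                           # float64

same = a.astype(np.float64, copy=False)  # dtype already matches: the input is returned
print(same is a)                         # True

cast = a.astype(np.float32, copy=False)  # dtype differs: a copy is unavoidable
print(np.may_share_memory(a, cast))      # False
```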
