I wanted to create an array to hold mixed types - string and int.

The following code did not work as desired - all elements got typed as String.

>>> a=numpy.array(["Str",1,2,3,4])
>>> print a
['Str' '1' '2' '3' '4']
>>> print type(a[0]),type(a[1])
<type 'numpy.string_'> <type 'numpy.string_'>

All elements of the array were typed as 'numpy.string_'

But, oddly enough, if I pass one of the elements as "None", the types turn out as desired:

>>> a=numpy.array(["Str",None,2,3,4])
>>> print a
['Str' None 2 3 4]
>>> print type(a[0]),type(a[1]),type(a[2])
<type 'str'> <type 'NoneType'> <type 'int'>

Thus, including a "None" element provides me with a workaround, but I am wondering why this should be the case. Even if I don't pass one of the elements as None, shouldn't the elements be typed as they are passed?

Mazdak
  • both not really duplicates, second one is better, but a more explicit explanation with regards to `None` would be better for OP – timgeb Jul 03 '18 at 08:29
  • The proposed duplicate explains just the string dtype: https://stackoverflow.com/questions/49751000/how-does-numpy-determin-the-object-arrays-dtype-and-what-it-means. – hpaulj Jul 03 '18 at 14:49

2 Answers

Mixing types in a NumPy array is strongly discouraged, because you lose the benefits of vectorised computation. In this instance:

  • For your first array, NumPy converts every element to a string, giving a uniform array of strings of 3 or fewer characters.
  • For your second array, NumPy does not treat None as convertible to a string when it infers the dtype, so it falls back to the generic object dtype. An object array holds pointers to arbitrary Python objects.

You can see this when you print the dtype attributes of your arrays:

import numpy as np

print(np.array(["Str",1,2,3,4]).dtype)     # <U3
print(np.array(["Str",None,2,3,4]).dtype)  # object
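To confirm what each array actually stores, you can look at the element types as well as the dtype. The following is a minimal sketch, assuming Python 3 and a reasonably recent NumPy. The variable names are illustrative only. The string width NumPy reports may differ between versions, so the sketch checks only the dtype kind.

```python
import numpy as np

# Illustrative names; not from the original question.
as_strings = np.array(["Str", 1, 2, 3, 4])
as_objects = np.array(["Str", None, 2, 3, 4])

# 'U' means a fixed-width Unicode string dtype; the width may vary by version.
string_kind = as_strings.dtype.kind
# Every element, including the former ints, has been converted to a string.
string_values = as_strings.tolist()

# An object array keeps references to the original Python objects.
object_types = [type(x).__name__ for x in as_objects]

print(string_kind, string_values)
print(as_objects.dtype, object_types)
```

In the first array the integers are no longer integers at all, whereas the object array hands back the original `str`, `NoneType` and `int` objects unchanged.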

This should be entirely expected. NumPy has a strong preference for homogeneous types, as indeed you should have for any meaningful computation. Otherwise, a Python list may be a more appropriate data structure.
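As a rough illustration of why homogeneous data is preferable, the sketch below keeps the label separate from the numbers and compares that with an object array. The names `label`, `values` and `mixed` are hypothetical. Splitting the data this way is just one possible layout, not something the question requires.

```python
import numpy as np

# Keep the string apart from the numeric data (hypothetical layout).
label = "Str"
values = np.array([1, 2, 3, 4])

# Homogeneous integer array: arithmetic is vectorised and unambiguous.
doubled = values * 2
total = int(values.sum())

# Object array: NumPy applies Python's * to each element in turn,
# so the string is repeated while the ints are multiplied.
mixed = np.array([label, 1, 2, 3, 4], dtype=object)
mixed_doubled = mixed * 2

print(doubled, total)
print(mixed_doubled)
```

The object array does not raise an error here, but the operation means something different for each element, which is rarely what a numeric computation intends.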

For a more detailed description of how NumPy prioritises dtype choice, see:

jpp

An alternative to adding the None is to make the dtype explicit:

In [80]: np.array(["str",1,2,3,4])
Out[80]: array(['str', '1', '2', '3', '4'], dtype='<U3')
In [81]: np.array(["str",1,2,3,4], dtype=object)
Out[81]: array(['str', 1, 2, 3, 4], dtype=object)

Creating an object dtype array and filling it from a list is another option:

In [85]: res = np.empty(5, object)
In [86]: res
Out[86]: array([None, None, None, None, None], dtype=object)
In [87]: res[:] = ['str', 1, 2, 3, 4]
In [88]: res
Out[88]: array(['str', 1, 2, 3, 4], dtype=object)

Here it isn't needed, but it matters when you want an array of lists, because passing nested lists straight to np.array produces a multidimensional array instead.
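The array-of-lists case can be sketched as follows, assuming a recent NumPy. The names `rows`, `as_2d` and `as_1d` are illustrative. With equal-length inner lists, `np.array(..., dtype=object)` still builds a 2-D array, while filling a preallocated 1-D object array keeps each list as a single element.

```python
import numpy as np

rows = [[1, 2], [3, 4], [5, 6]]

# dtype=object alone does not stop NumPy from unpacking the nested lists.
as_2d = np.array(rows, dtype=object)

# Preallocate a 1-D object array, then fill it: each slot holds one list.
as_1d = np.empty(len(rows), dtype=object)
as_1d[:] = rows

print(as_2d.shape)
print(as_1d.shape, type(as_1d[0]).__name__)
```

If you prefer not to rely on slice assignment, filling the slots one at a time in a loop (`as_1d[i] = row`) gives the same array.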

hpaulj