
I have a numpy matrix X, and I tried to change the datatype of column 1 using the code below:

X[:, 1].astype('str')
print(type(X[0, 1]))

but I got the following result:

<type 'numpy.float64'>

Does anyone know why the type was not changed to str? And what is the correct way to change the type of a column of X? Thanks!

Edamame
    ndarray.astype does not operate in place; it returns a new array. You also cannot change the type of a single column of an array. If you want an array with mixed types, you should use a [structured type](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.rec.html) – Syrtis Major Apr 09 '16 at 19:17

4 Answers


Providing a simple example will explain it better.

>>> import numpy as np
>>> a = np.array([[1,2,3],[4,5,6]])
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a[:,1]
array([2, 5])
>>> a[:,1].astype('str') # astype returns a new, converted copy.
array(['2', '5'], dtype='<U21')
>>> a                    # So the original array did not change.
array([[1, 2, 3],
       [4, 5, 6]])
Hun
    This explains why it doesn't work but it doesn't explain what to do instead. In my case I have a column of strings (object type from numpy's point of view) and a function to map those strings to integers that I would like to use to transform the string column into an integer column. – Joseph Garvin Aug 10 '19 at 21:52
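For the case described in this comment, one possible approach is to build a separate integer array from the string column. This is only a sketch: the array contents and the `label_to_int` function are made up for illustration and stand in for whatever mapping you actually have.

```python
import numpy as np

# Hypothetical object array: column 0 holds string labels, column 1 holds numbers.
X = np.array([["low", 1.5], ["high", 2.5], ["low", 3.5]], dtype=object)

# Hypothetical mapping function from string labels to integer codes.
def label_to_int(label):
    return {"low": 0, "high": 1}[label]

# Apply the function element by element and collect the result in a new
# array that has a real integer dtype.
codes = np.array([label_to_int(s) for s in X[:, 0]], dtype=np.int64)
print(codes.dtype)

# Writing the codes back into X is allowed, but X itself keeps dtype object.
X[:, 0] = codes
print(X.dtype)
```

Keeping `codes` as its own array gives you a true integer dtype; the object array can only hold the integers as generic Python objects.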

Let me answer the second question since I have met the same problem.

As dinarkino mentioned, just assigning the converted column back won't work.

>>> X = np.array([[1.1,2.2],[3.3,4.4]])
>>> print(X[:,1].dtype)
float64

>>> X[:,1] = X[:,1].astype('str')  # the strings are cast back to float on assignment
>>> print(X[:,1].dtype)
float64

So my approach is to convert the whole matrix to dtype 'object' first, then assign the str column back.

>>> X = X.astype('object')
>>> print(type(X[0,1]))
<class 'float'>

>>> X[:,1] = X[:,1].astype('str')
>>> print(type(X[0,1]))
<class 'str'>
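The two steps above can be wrapped in a small helper. This is a minimal sketch, not a NumPy API: the name `column_to_str` is invented here, and it assumes a 2-D input array. Note that an object array stores generic Python objects, so vectorized arithmetic on it is slower than on a float array.

```python
import numpy as np

def column_to_str(arr, col):
    """Return an object-dtype copy of `arr` with column `col` converted to strings.

    Hypothetical helper for illustration; `arr` is assumed to be 2-D.
    """
    out = arr.astype(object)               # new array; the input is left untouched
    out[:, col] = arr[:, col].astype(str)  # object columns can hold strings
    return out

X = np.array([[1.1, 2.2], [3.3, 4.4]])
Y = column_to_str(X, 1)
print(type(Y[0, 1]))
print(X.dtype)  # the original array keeps its float dtype
```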
ChrisQIU

A clearer and more direct answer: the type was not changed to str because a NumPy array has a single dtype shared by all of its elements. The correct way to change the column type of X would be to use structured arrays or one of the solutions from this question.
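As a rough illustration of the structured-array route, the sketch below copies a float matrix into an array whose fields have different dtypes. The field names `a` and `b` and the string width `U32` are arbitrary choices made for this example.

```python
import numpy as np

# Example data: two float columns, the second of which should become text.
X = np.array([[1.1, 2.2], [3.3, 4.4]])

# One record per row; each field has its own dtype.
# The field names "a" and "b" are made up for this example.
mixed = np.empty(X.shape[0], dtype=[("a", "f8"), ("b", "U32")])
mixed["a"] = X[:, 0]
mixed["b"] = X[:, 1].astype(str)

print(mixed.dtype)
print(mixed["b"])
```

Fields are then accessed by name (`mixed["b"]`) rather than by column index.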

I had the same problem and didn't want to use structured arrays. One option is to use pandas if it suits your task: if you only want to change one column, your data is probably tabular, and in a DataFrame you can easily change the data type of a single column. Another compromise is to make a copy of the column and use it separately from the original array.

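>>> import numpy as np
>>> import pandas as pd
>>> x = np.ones((3, 3), dtype=np.float64)  # np.float and np.int were removed in NumPy 1.24
>>> x
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
>>> x[:, 1] = x[:, 1].astype(np.int64)  # cast back to float64 on assignment
>>> type(x[:, 1][0])
<class 'numpy.float64'>
>>> x_pd = pd.DataFrame(x)
>>> x_pd[1] = x_pd[1].astype(np.int16)
>>> type(x_pd[1][0])
<class 'numpy.int16'>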
dinarkino

When I was facing the same issue, I used this quick one-line workaround:

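>>> X = np.array([[1,2],[3,4],[5,6]])
>>> print(X)
[[1 2]
 [3 4]
 [5 6]]

>>> # int(x) keeps column 0 as plain Python ints; str(y) converts column 1
>>> X_1 = np.array([[int(x), str(y)] for x, y in X], dtype='O')
>>> print(X_1)
[[1 '2']
 [3 '4']
 [5 '6']]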

Probably a bit overcomplicated but works. :)

Lie Wanx