This is different from other questions such as Store different datatypes in one NumPy array?. I could not see a matrix in that 'possible duplicate' question; its data did not look like a matrix.

I am interested in changing the data types of specific columns within a matrix.

I would like to have a matrix with a mix of data types. That is, I would like to change the data types of individual columns. I will illustrate with the original matrix and what I would like to do with it. The original matrix is of type float64.

mymatrix
array([[17.        , 27.        , 19.62120627, 21.        ,  0.        ],
       [10.        ,  1.        , 18.94042755,  0.        ,  0.        ],
       [11.        , 53.        , 13.96885424, 29.        ,  0.        ],
       [ 8.        ,  1.        , 19.36688898,  0.        ,  1.        ],
       [ 8.        , 44.        , 19.26500703, 29.        ,  1.        ],
       [16.        ,  2.        , 27.31823044,  0.        ,  1.        ]])

But I would like the individual columns to be of different types: i2, i2, f8, i2, ?. That is, the columns of the matrix should be of type int16, int16, float64, int16 and boolean.

See the reference halfway down https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html. Also, the suggested answer in Store different datatypes in one NumPy array? does not relate to my need.

For example: I have the first row in the original matrix

array([17.        , 27.        , 19.62120627, 21.        ,  0.        ])

but I would like

array([17, 27, 19.62120627, 21, False])

That is, I would like the columns of the whole matrix shown above to be of type i2, i2, f8, i2 and ? respectively (int16, int16, float64, int16 and boolean).

Recall that the reference is halfway down https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html

In summary: how do I change individual columns to specific data types?

eyllanesc
  • Possible duplicate of [Store different datatypes in one NumPy array?](https://stackoverflow.com/questions/11309739/store-different-datatypes-in-one-numpy-array) – RishiG Apr 23 '18 at 22:48
  • Sorry RishiG, that question did not look like a matrix. I could not follow it. The array was 1D, not 2D as in my example. Even with the 1D example it involved tuples. Tuples are not the same as arrays especially when slicing. The above question bears no relation to the structure described in the 'possible duplicate'. Thank you, Anthony of Sydney – Anthony from Sydney Apr 23 '18 at 23:13

1 Answer

Python changing individual data types within a matrix/array - eg int, float, binary

This title is better than the question I asked. It addresses whether one can have different kinds of data in a matrix.

You can, but not directly as a matrix.

Originally the data was a list with integer, floating-point and boolean types, as in the following list:

mylist 
[[17, 27, 19.6212062712054, 21, False], 
[10, 1, 18.940427553737198, 0, False], 
[10, 17, 19.123083111577685, 6, False], 
[7, 5, 22.943202316685845, 0, False], 
[15, 40, 14.983843150392211, 29, False],
[11, 53, 13.968854243956049, 29, False], 
[8, 1, 19.366888983233444, 0, True],
[8, 44, 19.265007030047215, 29, True], 
[7, 7, 23.485663475594826, 0, True],
[16, 16, 25.42215007204769, 0, True], 
[3, 21, 10.787963908414609, 22, False]]

Suppose I wanted to select the first column or last column such as these two examples:

[17, 10, 10, 7, 15, 11, 8, 8, 7, 16, 3]

AND

[False, False, False, False, False, False, True, True, True, True, False]

One cannot do this with a list by indexing it directly, because the following operations return the first and fifth rows rather than columns:

mylist[0]

OR

mylist[4]
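To make the row-versus-column point concrete, here is a minimal sketch. It assumes a shortened, hypothetical three-row list named `rows` (not the full `mylist`), and the list comprehensions are just one common way to pull a column out of a list of rows.

```python
# Hypothetical shortened data: three rows in the same layout as mylist.
rows = [
    [17, 27, 19.62, 21, False],
    [10, 1, 18.94, 0, False],
    [8, 1, 19.37, 0, True],
]

# Indexing the outer list selects a whole row, not a column.
first_row = rows[0]

# A column has to be collected by taking one element from every row.
first_column = [row[0] for row in rows]
last_column = [row[4] for row in rows]

print(first_row)
print(first_column)
print(last_column)
```

Each column keeps the original Python types of its elements, since no conversion to a NumPy array has taken place.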

If you converted mylist to an array, myarray, you would get a uniform conversion:

import numpy as np
myarray = np.array(mylist)
>>> myarray
array([[17.        , 27.        , 19.62120627, 21.        ,  0.        ],
       [10.        ,  1.        , 18.94042755,  0.        ,  0.        ],
       [10.        , 17.        , 19.12308311,  6.        ,  0.        ],
       [ 7.        ,  5.        , 22.94320232,  0.        ,  0.        ],
       [15.        , 40.        , 14.98384315, 29.        ,  0.        ],
       [11.        , 53.        , 13.96885424, 29.        ,  0.        ],
       [ 8.        ,  1.        , 19.36688898,  0.        ,  1.        ],
       [ 8.        , 44.        , 19.26500703, 29.        ,  1.        ],
       [ 7.        ,  7.        , 23.48566348,  0.        ,  1.        ],
       [16.        , 16.        , 25.42215007,  0.        ,  1.        ],
       [ 3.        , 21.        , 10.78796391, 22.        ,  0.        ]])

Observe that all integers are converted to floats, and all booleans are converted to floating ones and zeros.
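The promotion to a single type can be checked directly. The sketch below assumes a small, hypothetical two-row list named `mixed`; `np.result_type` is used only to show which common type NumPy picks for integers, floats and booleans together.

```python
import numpy as np

# Hypothetical small sample mixing int, float and bool values.
mixed = [[17, 19.62, False], [8, 19.37, True]]

arr = np.array(mixed)

# A regular array has exactly one dtype for all of its elements.
print(arr.dtype)

# The common type NumPy chooses for int, float and bool together.
common = np.result_type(np.int64, np.float64, np.bool_)
print(common)

# The boolean True is now stored as the float 1.0.
print(arr[1, 2])
```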

One could extract the columns

>>> myarray[:,0]
array([17., 10., 10.,  7., 15., 11.,  8.,  8.,  7., 16.,  3.])
>>> myarray[:,4]
array([0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0.])

Furthermore, you would get quirky output if you tried to convert the list to an array while specifying a dtype for each column:

myarray = np.array(mylist, dtype='i2,i2,f8,i2,?')

Reference on dtype: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html. To illustrate: there is no error message, but I do get this inexplicable output. Here is an extract:

myarray
array([[(17, 17, 17.        , 17,  True),
        (27, 27, 27.        , 27,  True),
        (19, 19, 19.62120627, 19,  True),
        (21, 21, 21.        , 21,  True),
        ( 0,  0,  0.        ,  0, False)],
       [(10, 10, 10.        , 10,  True),
        ( 1,  1,  1.        ,  1,  True),
        (18, 18, 18.94042755, 18,  True),
        ( 0,  0,  0.        ,  0, False),
        ( 0,  0,  0.        ,  0, False)],
       [(10, 10, 10.        , 10,  True),  etc
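The output above suggests that each number in the nested lists was treated as a separate element and copied into all five fields. A possible alternative, not part of the approach described in this answer, is to pass each row as a tuple so that NumPy reads one row as one record of a structured array. The sketch below assumes a shortened, hypothetical three-row list named `rows`; the field names `f0` to `f4` are the defaults NumPy assigns when the dtype string gives no names.

```python
import numpy as np

# Hypothetical shortened data: three rows in the same layout as mylist.
rows = [
    [17, 27, 19.6212062712054, 21, False],
    [10, 1, 18.940427553737198, 0, False],
    [8, 1, 19.366888983233444, 0, True],
]

# One field per column: int16, int16, float64, int16, bool.
record_dtype = np.dtype('i2,i2,f8,i2,?')

# Each row must be a tuple so it is read as a single record.
records = np.array([tuple(row) for row in rows], dtype=record_dtype)

print(records.shape)        # one-dimensional: one record per row
print(records.dtype.names)  # default field names
print(records['f0'])        # first column, stored as int16
print(records['f4'])        # fifth column, stored as bool
```

Note that the result is a one-dimensional array of records, so a column is selected by field name (`records['f0']`) rather than by `[:, 0]`.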

However, I still wanted variables of different dtypes while handling the data like an array.

Inspired by an idea at TypeError: 'zip' object is not subscriptable, I was able to extract 'column-like' sequences from a list. It is not exactly like obtaining columns from a matrix, but it does give me the columns:

>>> mylist2 = list(zip(*mylist))
>>> mylist2[0]  # Get the first column
(17, 10, 10, 7, 15, 11, 8, 8, 7, 16, 3)
>>> mylist2[4]  # Get the fifth column
(False, False, False, False, False, False, True, True, True, True, False)

Of course, if the data were in an array, the specific columns would be accessed as myarray[:,0] and myarray[:,4] instead of the respective mylist2[0] and mylist2[4].

If any function or procedure required an array instead of a list, then the selected columns can be turned into arrays by:

firstcolarray = np.array(mylist2[0])
fifthcolarray = np.array(mylist2[4])

>>> firstcolarray
array([17, 10, 10,  7, 15, 11,  8,  8,  7, 16,  3])
>>> fifthcolarray
array([False, False, False, False, False, False,  True,  True,  True,
        True, False])
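If the exact dtypes from the question (i2, i2, f8, i2, ?) matter, they can be requested when each column is converted. This is a minimal sketch under the assumption of a shortened, hypothetical three-row list named `rows`; the names `column_dtypes` and `columns` are introduced here for illustration only.

```python
import numpy as np

# Hypothetical shortened data: three rows in the same layout as mylist.
rows = [
    [17, 27, 19.6212062712054, 21, False],
    [10, 1, 18.940427553737198, 0, False],
    [8, 1, 19.366888983233444, 0, True],
]

# Target dtype for each column, in column order.
column_dtypes = ['i2', 'i2', 'f8', 'i2', '?']

# Transpose the rows with zip, then build one typed array per column.
columns = [
    np.array(col, dtype=dt)
    for col, dt in zip(zip(*rows), column_dtypes)
]

for col in columns:
    print(col.dtype, col)
```

The result is a plain Python list of five one-dimensional arrays, each with its own dtype, so `columns[0]` plays the role that `myarray[:,0]` would play for a single array.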

Conclusion: while a regular matrix/array holds a single data type, you can keep different data types in a list and obtain its columns with list(zip(*mylist)). A particular column is then accessed with [col no - 1] instead of [:, col no - 1] as it would be if the data structure were a matrix/array.

The extracted 'columns' are tuples, so if a function requires an array rather than a list/tuple, it is a matter of converting the tuple to an array.

Thank you,

Anthony from sunny Sydney