
Python 2.7:

On an attempt to:

add a column (arr_date) of dtype datetime64[D], taken from a one-dimensional NumPy array, to an existing multidimensional NumPy array (data)

The following errors are raised:

  1. 'TypeError: invalid type promotion'
  2. 'numpy.AxisError: axis 1 is out of bounds for array of dimension 1'

The column I created and want to append:

>> arr_date
<<     
[['2019-04-21']
 ['2019-04-21']
 ['2019-04-21']]

I tried to build a date from the first three columns of the source (data), store the result in a new NumPy array (arr_date), and add that array to the existing one (data) using each of the methods below:

  1. np.c_
  2. np.append
  3. np.hstack
  4. np.column_stack
  5. np.concatenate

data = [(2019, 4, 21, 4.9, -16.5447, -177.1961,  22.4, 'US')
(2019, 4, 21, 4.8,  -9.5526,  109.6003,  10. , 'UK')
(2019, 4, 21, 4.6,  -7.2737,  124.0192, 554.9, 'FR')]

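import datetime as dt
import numpy as np

# One datetime64[D] value per row of data, stored as a (n, 1) column
arr_date = np.zeros((len(data), 1), dtype='datetime64[D]')

i = 0
while i < len(data):
    # The first three values of each row are year, month, day
    date = dt.date(data[i][0], data[i][1], data[i][2])
    arr_date[i][0] = date
    i += 1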


test1 = np.column_stack((data,arr_date))

np.c_[data, np.zeros(len(data))]

test2 = np.concatenate(data.reshape(-1,1), arr_date.reshape(-1,1), axis=1)

np.append(data, arr_date, axis = 1)

np.stack((data, arr_date), axis=-1)

np.hstack((data, arr_date))

test3 = np.column_stack((data, arr_date))
Trenton McKinney
wounky
  • Those functions all use `np.concatenate`, which means the inputs have to have compatible dtypes, and compatible shapes. If one fails, it's likely the others will too, especially if it's a dtype problem. – hpaulj Apr 30 '19 at 23:19
  • What's `data`. It looks like a list of tuples, except it's missing commas between tuples. Is it a structured array? What's the shape, (3,)? What's the `dtype`. – hpaulj Apr 30 '19 at 23:23
  • Was `data` produced by loading a `csv` with something like `genfromtxt`? – hpaulj Apr 30 '19 at 23:56
  • yes, data was produced by loading a csv and using module genfromtxt. – wounky May 01 '19 at 11:57

1 Answer


Until you answer my question about data.dtype, I'm going to add commas and make data a list of tuples:

In [117]: data = [(2019, 4, 21, 4.9, -16.5447, -177.1961,  22.4, 'US'), 
     ...: (2019, 4, 21, 4.8,  -9.5526,  109.6003,  10. , 'UK'), 
     ...: (2019, 4, 21, 4.6,  -7.2737,  124.0192, 554.9, 'FR')]                      

In [118]: arr_date = np.zeros((len(data),1), dtype='datetime64[D]') 
     ...:  
     ...: i = 0 
     ...:  
     ...: while i < len(data):     
     ...:         date = dt.date(data [i][0], data[i][1], data[i][2])      
     ...:         arr_date[i][0] = date     
     ...:         i += 1     
     ...:                                                                            

In [119]: arr_date                                                                   
Out[119]: 
array([['2019-04-21'],
       ['2019-04-21'],
       ['2019-04-21']], dtype='datetime64[D]')

So arr_date is a (3,1) array with datetime64[D] dtype.

===

I'm guessing that your data is actually a structured array, with a compound dtype. For example:

In [121]: data1 = np.array(data, dtype='i,i,i,f,f,f,f,U2')                           

In [122]: data1                                                                      
Out[122]: 
array([(2019, 4, 21, 4.9, -16.5447, -177.1961,  22.4, 'US'),
       (2019, 4, 21, 4.8,  -9.5526,  109.6003,  10. , 'UK'),
       (2019, 4, 21, 4.6,  -7.2737,  124.0192, 554.9, 'FR')],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<U2')])

In [123]: data1.shape                                                                
Out[123]: (3,)

In [124]: data1.dtype                                                                
Out[124]: dtype([('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<U2')])

Your date iteration works with this. But the fields (not columns) of data1 can be accessed by name:

In [127]: data1['f0']                                                                
Out[127]: array([2019, 2019, 2019], dtype=int32)

column_stack can join a (3,) array with a (3,1) to produce a (3,2), but:

In [130]: np.column_stack((data, arr_date))                                          
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-130-5c8e6a103474> in <module>
----> 1 np.column_stack((data, arr_date))

/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py in column_stack(tup)
    638             arr = array(arr, copy=False, subok=True, ndmin=2).T
    639         arrays.append(arr)
--> 640     return _nx.concatenate(arrays, 1)

TypeError: invalid type promotion

First note that the error occurs inside concatenate. I bet all the other tries produced a similar error (if they got past the axis error). The error is telling us that concatenate can't combine a compound dtype, as in Out[124], with the datetime64 dtype of arr_date. The dtypes don't match, and can't be made to match.

Basically this isn't a concatenation problem. You are not trying to add a 'column' to a 2d array, or even trying to create a 2d array. data is not 2d. It is 1d. What you need to do is add a field to a structured array.
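A quick way to check this for yourself is to look at ndim, shape and dtype.names. The sketch below assumes the file was read with genfromtxt and dtype=None (as the comments suggest); the in-memory CSV text is made up from the rows in the question, and the code is written for Python 3:

```python
import io

import numpy as np

# Stand-in for the real CSV file (assumption: comma-delimited, no header).
csv_text = (
    "2019,4,21,4.9,-16.5447,-177.1961,22.4,US\n"
    "2019,4,21,4.8,-9.5526,109.6003,10.0,UK\n"
    "2019,4,21,4.6,-7.2737,124.0192,554.9,FR\n"
)

# dtype=None lets genfromtxt pick a type per column; with mixed column
# types the result is a 1d structured array, not a 2d array.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     dtype=None, encoding='utf-8')

print(data.ndim, data.shape)   # one dimension, one record per row
print(data.dtype.names)        # auto-generated field names
print(data['f0'])              # a field is selected by name, not by column index
```

If dtype.names is not None, the array is structured and the column-stacking functions are the wrong tool.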

There is a module of functions that makes it easier to work with structured arrays.

In [131]: import numpy.lib.recfunctions as rf 

append_fields should do the trick, but it can be a bit tricky to use:

In [137]: rf.append_fields(data1, 'date', arr_date.ravel(), usemask=False)           
Out[137]: 
array([(2019, 4, 21, 4.9, -16.5447, -177.1961,  22.4, 'US', '2019-04-21'),
       (2019, 4, 21, 4.8,  -9.5526,  109.6003,  10. , 'UK', '2019-04-21'),
       (2019, 4, 21, 4.6,  -7.2737,  124.0192, 554.9, 'FR', '2019-04-21')],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<U2'), ('date', '<M8[D]')])

This is still a 1d array, but with one more field, which I called date.
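For reference, here is the same idea as a standalone script rather than an IPython session. It is only a sketch: the dtype string is my guess at what your array looks like, and the dates are built with a list comprehension instead of the while loop, so arr_date is already 1d and needs no ravel():

```python
import datetime as dt

import numpy as np
import numpy.lib.recfunctions as rf

# Structured array standing in for the real data (assumed dtype).
data1 = np.array(
    [(2019, 4, 21, 4.9, -16.5447, -177.1961, 22.4, 'US'),
     (2019, 4, 21, 4.8, -9.5526, 109.6003, 10.0, 'UK'),
     (2019, 4, 21, 4.6, -7.2737, 124.0192, 554.9, 'FR')],
    dtype='i,i,i,f,f,f,f,U2')

# Build one date per record from the year, month and day fields.
dates = [dt.date(int(y), int(m), int(d))
         for y, m, d in zip(data1['f0'], data1['f1'], data1['f2'])]
arr_date = np.array(dates, dtype='datetime64[D]')   # shape (3,)

# usemask=False asks for a plain ndarray instead of a masked array.
data2 = rf.append_fields(data1, 'date', arr_date, usemask=False)

print(data2.shape)
print(data2.dtype.names)
print(data2['date'])
```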

===

In my answer to:

Add and access object-type field of a numpy structured array

I show how to construct a new structured array with fields from two arrays, which gives an idea of what append_fields is doing.
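Roughly, that manual approach looks like the sketch below. The field names and the shortened set of fields are made up for illustration, and this is not the actual append_fields source, just the same idea done by hand: define a dtype with the extra field, allocate an empty array of that dtype, and copy the data over field by field.

```python
import numpy as np

# Small structured array standing in for the real data (hypothetical names).
data1 = np.array(
    [(2019, 4, 21, 4.9, 'US'),
     (2019, 4, 21, 4.8, 'UK'),
     (2019, 4, 21, 4.6, 'FR')],
    dtype=[('year', 'i4'), ('month', 'i4'), ('day', 'i4'),
           ('mag', 'f4'), ('country', 'U2')])

arr_date = np.array(['2019-04-21'] * 3, dtype='datetime64[D]')

# New dtype = the old fields plus one extra field for the dates.
new_dtype = np.dtype(data1.dtype.descr + [('date', 'M8[D]')])

# Allocate the result, then copy the values one field at a time.
result = np.zeros(data1.shape, dtype=new_dtype)
for name in data1.dtype.names:
    result[name] = data1[name]
result['date'] = arr_date

print(result.dtype.names)
print(result['date'])
```

Copying by field name is usually the most reliable way to move data between structured arrays with different dtypes.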

hpaulj
  • Thanks a lot for the explanation, it helped greatly. Technical page on append_fields was very useful as well! – wounky May 01 '19 at 12:00