
I'm migrating from MATLAB and using JupyterLab on Windows 10.

Let's say I have a time series, with an array of datetime64 values and some other data:

import numpy as np

t = np.arange('2010-09-01','2010-09-02', dtype='datetime64[6h]')

d = np.linspace(0,1,len(t))

I want to merge both to save and continue working in another notebook (I know there are other ways to do this!). First I transform them into column arrays

t_col = t.reshape(-1,1)
d_col = d.reshape(-1,1)

and merge

m = np.c_[t_col, d_col]

and get

TypeError                                 Traceback (most recent call last)
<ipython-input-28-5bbb5e23249f> in <module>
----> 1 m = np.c_[t_col, d_col]

c:\Python37_32\lib\site-packages\numpy\lib\index_tricks.py in __getitem__(self, key)
    333                 objs[k] = objs[k].astype(final_dtype)
    334 
--> 335         res = self.concatenate(tuple(objs), axis=axis)
    336 
    337         if matrix:

TypeError: invalid type promotion

but if I first convert the datetime64 to datetime

t_col2 = t_col.astype('datetime64[h]').tolist()

m = np.c_[t_col2, d_col]

it works.

Question: Why can't I merge the arrays when the date and time data is datetime64? Why do I need to convert it to datetime?

  • sidenote: for what it seems to me you want to do, why not use pandas straight away? Here, it would just be e.g. `df = pd.DataFrame({'t':t.astype('datetime64[ns]'), 'd':d})` – FObersteiner Oct 03 '19 at 18:01
  • `np.c_`, a version of `np.concatenate`, makes a new array. In `numpy` arrays have to have a consistent `dtype`. That's the same as the MATLAB matrix. A MATLAB cell can contain diverse items (strings, floats etc). Python lists can have diverse objects, so can `object` dtype arrays. But the fast numeric calculations don't work with mixed elements. – hpaulj Oct 03 '19 at 18:03

1 Answer

In [266]: t = np.arange('2010-09-01','2010-09-02', dtype='datetime64[6h]') 
     ...:  
     ...: d = np.linspace(0,1,len(t))                                           
In [267]: t                                                                     
Out[267]: 
array(['2010-09-01T00', '2010-09-01T06', '2010-09-01T12', '2010-09-01T18'],
      dtype='datetime64[6h]')
In [268]: d                                                                     
Out[268]: array([0.        , 0.33333333, 0.66666667, 1.        ])

Using vstack instead of c_ (just for convenience):

In [269]: np.vstack((t,d))                                                      
...
<__array_function__ internals> in concatenate(*args, **kwargs)

TypeError: invalid type promotion

The error arises because the result needs to be one dtype, either float or datetime64, and numpy has no rule for promoting one to the other. numpy arrays require a uniform dtype (like a MATLAB matrix).
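The promotion failure can be checked directly, without building any merged array. This is only a small sketch: it asks numpy for a common dtype with `np.promote_types`, and the variable names `promotable` and `common` are made up for the illustration.

```python
import numpy as np

t = np.arange('2010-09-01', '2010-09-02', dtype='datetime64[6h]')
d = np.linspace(0, 1, len(t))

# Ask numpy which single dtype could hold both arrays.
try:
    np.promote_types(t.dtype, d.dtype)
    promotable = True
except TypeError as err:
    # datetime64 and float64 have no common dtype, so this branch runs.
    promotable = False
    print("no common dtype:", err)

# For contrast, an integer and a float dtype promote without trouble.
common = np.promote_types(np.int32, np.float64)
print(common)
```

The concatenate call inside `np.c_` and `np.vstack` needs the same kind of common dtype, which is why it fails for this pair.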

With tolist or astype(object) the datetime64 array is turned into datetime objects. These can be concatenated with floats, which are also converted to objects:

In [270]: np.vstack((t.tolist(),d))                                             
Out[270]: 
array([[datetime.datetime(2010, 9, 1, 0, 0),
        datetime.datetime(2010, 9, 1, 6, 0),
        datetime.datetime(2010, 9, 1, 12, 0),
        datetime.datetime(2010, 9, 1, 18, 0)],
       [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]], dtype=object)

In [271]: np.vstack((t.astype(object),d))                                       
Out[271]: 
array([[datetime.datetime(2010, 9, 1, 0, 0),
        datetime.datetime(2010, 9, 1, 6, 0),
        datetime.datetime(2010, 9, 1, 12, 0),
        datetime.datetime(2010, 9, 1, 18, 0)],
       [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]], dtype=object)

This object dtype array is like a MATLAB cell, containing diverse elements.
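The price of the object array is that each row has to be converted back before doing fast datetime or float arithmetic. A rough sketch of that round trip, assuming the same `vstack` layout as above (row 0 holds the times, row 1 the data); the names `t_back` and `d_back` are only illustrative:

```python
import numpy as np

t = np.arange('2010-09-01', '2010-09-02', dtype='datetime64[6h]')
d = np.linspace(0, 1, len(t))

# Object-dtype merge: row 0 holds datetime objects, row 1 Python floats.
m = np.vstack((t.astype(object), d))
print(m.dtype)

# Recover typed arrays from the rows. datetime.datetime has microsecond
# resolution, so convert to datetime64[us] first, then to the original unit.
t_back = m[0].astype('datetime64[us]').astype(t.dtype)
d_back = m[1].astype(float)

print(t_back)
print(d_back)
```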

Another option is to make a structured array (sort of like a MATLAB struct):

In [274]: arr  = np.zeros(4, dtype=[('t',t.dtype), ('d',d.dtype)])              
In [275]: arr                                                                   
Out[275]: 
array([('1970-01-01T00', 0.), ('1970-01-01T00', 0.),
       ('1970-01-01T00', 0.), ('1970-01-01T00', 0.)],
      dtype=[('t', '<M8[6h]'), ('d', '<f8')])
In [276]: arr['t']=t; arr['d']=d                                                
In [277]: arr                                                                   
Out[277]: 
array([('2010-09-01T00', 0.        ), ('2010-09-01T06', 0.33333333),
       ('2010-09-01T12', 0.66666667), ('2010-09-01T18', 1.        )],
      dtype=[('t', '<M8[6h]'), ('d', '<f8')])
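Since the goal in the question is to save the data and carry on in another notebook, here is a hedged sketch of a save and load round trip for the structured array. It writes to an in-memory buffer so it runs anywhere; in practice a file name would go in its place (something like 'series.npy', which is just a made-up name).

```python
import io
import numpy as np

t = np.arange('2010-09-01', '2010-09-02', dtype='datetime64[6h]')
d = np.linspace(0, 1, len(t))

# Build the structured array with one field per original array.
arr = np.zeros(len(t), dtype=[('t', t.dtype), ('d', d.dtype)])
arr['t'] = t
arr['d'] = d

# Save and reload; the buffer stands in for a file on disk.
buf = io.BytesIO()
np.save(buf, arr, allow_pickle=False)
buf.seek(0)
loaded = np.load(buf, allow_pickle=False)

# Each field keeps its own dtype, so comparisons on 't' work directly.
afternoon = loaded[loaded['t'] >= np.datetime64('2010-09-01T12')]
print(afternoon['d'])
```

No pickling is needed here because neither field is an object dtype, unlike the object array shown earlier.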

Edit:

Another way to construct the structured array from these 2 arrays:

In [286]: import numpy.lib.recfunctions as rf     
In [293]: rf.merge_arrays((t,d))                                                
Out[293]: 
array([('2010-09-01T00', 0.        ), ('2010-09-01T06', 0.33333333),
       ('2010-09-01T12', 0.66666667), ('2010-09-01T18', 1.        )],
      dtype=[('f0', '<M8[6h]'), ('f1', '<f8')])

Internally it's similar to what I first demonstrated.
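`merge_arrays` assigns the default field names `f0` and `f1`. If named fields are wanted, one possibility (a sketch, not the only way) is `rename_fields` from the same `recfunctions` module; the names 't' and 'd' are chosen here to match the earlier structured array.

```python
import numpy as np
import numpy.lib.recfunctions as rf

t = np.arange('2010-09-01', '2010-09-02', dtype='datetime64[6h]')
d = np.linspace(0, 1, len(t))

# merge_arrays builds the structured array with default names f0, f1.
merged = rf.merge_arrays((t, d))

# Map the default names onto more descriptive ones.
named = rf.rename_fields(merged, {'f0': 't', 'f1': 'd'})
print(named.dtype.names)
print(named['t'])
```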

hpaulj