How can I separate np.datetime64 information into different columns?

Question

I'm trying to create a date array whose colums are: day, month and year between two specific dates (time delta = 1 day). I know this must be very simple to do, but I couldn't find a similar example.

I've found that I can create date arrays using numpy datetime64 as follows:

import numpy as np
dates = np.arange(np.datetime64('2010-01-01'),np.datetime64('2014-12-31'))

Although this produces the list of dates I need, I can't find in the documentation/google how to split them into columns in a numpy array.

Any ideas?

Thanks in advance!

hpaulj · Accepted Answer · 2017-07-13T21:59:10.750

In [25]: dates = np.arange(np.datetime64('2010-01-01'),np.datetime64('2014-12-31'))
In [26]: dates
Out[26]: 
array(['2010-01-01', '2010-01-02', '2010-01-03', ..., '2014-12-28',
       '2014-12-29', '2014-12-30'], dtype='datetime64[D]')
In [27]: dates.shape
Out[27]: (1825,)

Like a regular np.arange (e.g. np.arange(1825)), this creates a 1d array spanning a range of values. To be clear the array contains numeric values (floats) that represent the date value. The ndarray code uses the specificed dtype to interpret those numbers. When displaying the array it renders the data as these data strings (as dictated by the [D] part of the dtype).

What sort of columns do you need?

As with any 1d array, you can create a volume vector with:

In [28]: dates[:,None]
Out[28]: 
array([['2010-01-01'],
       ['2010-01-02'],
       ['2010-01-03'],
       ..., 
       ['2014-12-28'],
       ['2014-12-29'],
       ['2014-12-30']], dtype='datetime64[D]')

alist = dates.tolist()
In [59]: alist[:10]
Out[59]: 
[datetime.date(2010, 1, 1),
 datetime.date(2010, 1, 2),
 datetime.date(2010, 1, 3),
 datetime.date(2010, 1, 4),
 datetime.date(2010, 1, 5),
 datetime.date(2010, 1, 6),
 datetime.date(2010, 1, 7),
 datetime.date(2010, 1, 8),
 datetime.date(2010, 1, 9),
 datetime.date(2010, 1, 10)]

produces a list of datetime objects. We extract year, month, day from those objects iteratively.

In [66]: np.array([[x.year, x.month, x.day] for x in alist])
Out[66]: 
array([[2010,    1,    1],
       [2010,    1,    2],
       [2010,    1,    3],
       ..., 
       [2014,   12,   28],
       [2014,   12,   29],
       [2014,   12,   30]])

np.datetime64 dtype doesn't have the exact equivalent. We can convert them into various units, and take differences.

In [68]: yrs = dates.astype('datetime64[Y]')
In [69]: yrs
Out[69]: array(['2010', '2010', '2010', ..., '2014', '2014', '2014'], dtype='datetime64[Y]')
In [70]: mths = dates.astype('datetime64[M]')-yrs
In [71]: mths
Out[71]: array([ 0,  0,  0, ..., 11, 11, 11], dtype='timedelta64[M]')
In [72]: days = dates - dates.astype('datetime64[M]')
In [73]: days
Out[73]: array([ 0,  1,  2, ..., 27, 28, 29], dtype='timedelta64[D]')

With different dtypes we can't directly concatenate these. But convert them to ints we can get the same 2d array:

In [76]: np.stack((yrs.astype(int), mths.astype(int), days.astype(int)),axis=1)
Out[76]: 
array([[40,  0,  0],
       [40,  0,  1],
       [40,  0,  2],
       ..., 
       [44, 11, 27],
       [44, 11, 28],
       [44, 11, 29]])

(year needs an offset).

In [77]: np.stack((yrs.astype(int)+1970, mths.astype(int), days.astype(int)),axis=1)

Or a structured array 'stacking':

In [78]: np.rec.fromarrays([yrs, mths, days])
Out[78]: 
rec.array([('2010',  0,  0), ('2010',  0,  1), ('2010',  0,  2), ...,
 ('2014', 11, 27), ('2014', 11, 28), ('2014', 11, 29)], 
          dtype=[('f0', '<M8[Y]'), ('f1', '<m8[M]'), ('f2', '<m8[D]')])

I think the OP is looking for " day, month and year " columns IIUC. — DSM, Jul 13 '17 at 20:00
That's my guess too. But if so, it should be clearly stated in the question. — hpaulj, Jul 13 '17 at 20:11
Splitting these dates isn't as trivial as it might seem. I can't find an explicit `get_month` method or function for such a dtype. We either have to work with a list of `datetime` objects, or play games with the `datetime64` units. — hpaulj, Jul 13 '17 at 20:26

score 2 · Answer 2 · answered Jul 13 '17 at 21:53

If it doesn't have to be in a Numpy array you could use Pandas --

import numpy as np
from pandas import DatetimeIndex
dates = DatetimeIndex(np.arange(np.datetime64('2010-01-01'),np.datetime64('2014-12-31')))

date_cols = [[d.year, d.month, d.day] for d in dates]

Otherwise, I would avoid numpy and do the entire thing with datetime --

from datetime import datetime, timedelta

st = datetime(2010, 1, 1) # or datetime.strptime("2010-01-01", "%Y-%m-%d")
ed = datetime(2014, 12, 31) # or datetime.strptime("2014-12-31", "%Y-%m-%d")

count =  ed - st
date_cols = []
for d in range(count.days+1):
    date = st + timedelta(days=d)
    date_cols.append([date.year, date.month, date.day])
print(date_cols)

How can I separate np.datetime64 information into different columns?

2 Answers2