85

I have an array of datetime64 type:

dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64')

Is there a better way than looping through each element to get an np.array of years?

years = f(dates)
#output:
array([2010, 2011, 2012], dtype=int8) #or dtype = string

I'm using stable numpy version 1.6.2.

James King
enedene

13 Answers

86

I find that the following tricks give a 2x to 4x speedup over the pandas method described in this answer (i.e. pd.DatetimeIndex(dates).year etc.). The list comprehension [dt.year for dt in dates.astype(object)] is, in my tests, about as fast as the pandas method. These tricks can also be applied directly to ndarrays of any shape (2D, 3D, etc.).

dates = np.arange(np.datetime64('2000-01-01'), np.datetime64('2010-01-01'))
years = dates.astype('datetime64[Y]').astype(int) + 1970
months = dates.astype('datetime64[M]').astype(int) % 12 + 1
days = dates - dates.astype('datetime64[M]') + 1
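For reference, a minimal sketch of the same idea that returns plain integers for all three fields. It assumes the input may have sub-day resolution (here `datetime64[ns]`), so the day is computed from a `datetime64[D]` floor, as suggested in the comments. The sample data and variable names are illustrative only.

```python
import numpy as np

# Sample input with nanosecond resolution (an assumption for illustration).
dates = np.array(['2010-10-17T12:00', '2011-05-13T00:00', '2012-01-15T23:59'],
                 dtype='datetime64[ns]')

# Truncate to year/month/day resolution; astype (not view) converts the values.
as_year = dates.astype('datetime64[Y]')
as_month = dates.astype('datetime64[M]')
as_day = dates.astype('datetime64[D]')

years = as_year.astype(int) + 1970           # years since the 1970 epoch
months = as_month.astype(int) % 12 + 1       # months since epoch, wrapped to 1..12
days = (as_day - as_month).astype(int) + 1   # whole days into the month, 1-based

print(years, months, days)
```

Subtracting the month floor from the day floor keeps the result in whole days regardless of the input unit, which is why the integer cast is safe here.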
Anon
  • 4
    This is a good solution. It'd be really nice if there was something simple like this in numpy. – naught101 Feb 16 '17 at 23:51
  • 12
    Thanks for actually giving an answer, instead of saying "you shouldn't be using , use instead". – Luke Davis Jun 09 '17 at 23:04
  • This works, and actually works for dates that `datetime` can't handle: `d = np.datetime64('-2000003-10-01').astype('datetime64[Y]').astype(int) + 1970` yields `-2000003` – Jody Klymak Aug 14 '19 at 00:44
  • 3
    To get integers instead of `timedelta64[D]` in the example for `days` above, use: `(dates - dates.astype('datetime64[M]')).astype(int) + 1` – flexatone May 09 '20 at 14:02
  • A short update, because of the version updates, use the following line instead `years = dates.values.astype('datetime64[Y]')` or `years = dates.values.astype('datetime64[Y]').astype(int) + 1970` – Dmitry Borisoglebsky Dec 14 '21 at 08:19
  • Probably the easiest way: Given `date = np.datetime64("2000-01-01")`, simply `date.astype(str).split('-')`. Now you have `[year_string, month_string, date_string]`. If you want int, just `int(month_string)`. – user8491363 May 11 '22 at 05:51
  • 2
    In response to @flexatone, your suggestion didn't work for me, but this did: `(dates.astype('datetime64[D]') - dates.astype('datetime64[M]')).astype(int) + 1` – Rens Nov 21 '22 at 13:26
  • Just going to mention that I had trouble getting this to work because I was mistakenly using `dates.view('datetime64[Y]').astype(int) + 1970` instead of `astype`, which returned nonsense numbers. – Mandias Mar 03 '23 at 00:17
51

As datetime is not stable in numpy I would use pandas for this:

In [52]: import pandas as pd

In [53]: dates = pd.DatetimeIndex(['2010-10-17', '2011-05-13', "2012-01-15"])

In [54]: dates.year
Out[54]: array([2010, 2011, 2012], dtype=int32)

Pandas uses numpy datetime internally, but seems to avoid the shortcomings that numpy has had up to now.

bmu
  • 7
    This is giving me wrong results for month with numpy 1.7.1 and pandas 0.12.0. However, `Series(dates).apply(lambda x: x.month)` seems to work. – dmvianna Oct 29 '13 at 02:58
  • No problem here with the same versions. If you really get wrong results you should open a pandas issue. – bmu Oct 29 '13 at 18:54
  • 1
    Oh, I actually used `pd.DatetimeIndex(np.datetime64(['2010-10-17', '2011-05-13', "2012-01-15"]))` – dmvianna Oct 29 '13 at 22:06
  • Use `np.datetime_as_string` to convert Datetime64 objects to strings which Pandas can parse. – sebix Sep 11 '14 at 08:59
  • @sebix: why? Pandas understands Datetime64. – naught101 Jun 20 '16 at 00:45
  • @naught101 I wrote this comment two years ago, possibly something has changed in the meantime? – sebix Jun 22 '16 at 18:15
  • When the input is a datetime64 scalar, rather than an array, `pd.Timestamp(str(np.datetime_as_string(myDatetime64)))` seems to be needed, though converting to Pandas may be overkill in that case. – AstroFloyd Apr 08 '21 at 15:46
21

There should be an easier way to do this, but, depending on what you're trying to do, the best route might be to convert to a regular Python datetime object:

datetime64Obj = np.datetime64('2002-07-04T02:55:41-0700')
print(datetime64Obj.astype(object).year)
# 2002
print(datetime64Obj.astype(object).day)
# 4

Based on the comments below, this seems to work only in Python 2.7.x and Python 3.6+.
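The comments also report that the result depends on the unit of the datetime64 value, which may explain some of the failures. The sketch below illustrates that behaviour as I understand it: day units give a `datetime.date`, microsecond units give a `datetime.datetime`, and nanosecond units fall back to a plain integer. The variable names are illustrative.

```python
import datetime
import numpy as np

stamp = np.datetime64('2002-07-04T02:55:41')

# The Python type returned by astype(object) depends on the datetime64 unit.
as_day = stamp.astype('datetime64[D]').astype(object)   # datetime.date
as_us = stamp.astype('datetime64[us]').astype(object)   # datetime.datetime
as_ns = stamp.astype('datetime64[ns]').astype(object)   # integer nanoseconds since epoch

print(type(as_day), type(as_us), type(as_ns))
print(as_us.year, as_us.day)
```

If `.year` raises an AttributeError, converting to a coarser unit such as `datetime64[D]` or `datetime64[us]` before `astype(object)` is worth trying.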

Nick
  • 1
    And you can do this on a whole array using list operations [as outlined by @acjay](http://stackoverflow.com/a/13654502/1304462): `[dt.year for dt in dtime64Array.astype(object)]` – Nick Feb 09 '16 at 00:26
  • This code works, but if I give it a different np.datetime64 (a date from my DataFrame) it evaluates to long instead to datetime... even if I explicitly use astype(datetime.datetime) it evaluates to long... weird... – Mr.WorshipMe Nov 10 '16 at 14:40
  • @Mr.WorshipMe unsure about that. Might be worth writing up a more detailed version showing dual behavior example. Then submitting that as a new question with a link back here. – Nick Nov 10 '16 at 16:09
  • 11
    This does not work in python 3.5 - `AttributeError: 'int' object has no attribute 'year'`. I'm also not sure why it should have worked in 2.7; why does `.astype(object)` convert to a `datetime.datetime`? – naught101 Feb 17 '17 at 00:04
  • 1
    I have just tested it in python-3.6.3 and it works: `import numpy as np; print(np.datetime64('2002-07-04T02:55:41-0700').astype(object).year)` – S.V Apr 04 '19 at 19:10
  • Excellent, though the `.tolist()` syntax is simpler than `.astype(object)` to convert any numpy array or even scalar into a native python object. See separate answer below. – Mahé Feb 07 '20 at 11:47
  • I find this method to be faster than the high-voted answer. – L. Francis Cong Jun 20 '22 at 15:31
  • 3
    It seems that the datetime must be of type `datetime64[D]`, not `datetime64[ns]`. For example, in Python 3.8.12, `np.datetime64('2001-01-01').astype(object)` gives a `datetime` object, but `np.datetime64('2001-01-01').astype('datetime64[ns]').astype(object)` gives a `long`, and replacing `object` with `datetime` gives the same result. – L. Francis Cong Jun 20 '22 at 18:02
14

This is how I do it.

import numpy as np

def dt2cal(dt):
    """
    Convert array of datetime64 to a calendar array of year, month, day, hour,
    minute, second, microsecond with these quantities indexed on the last axis.

    Parameters
    ----------
    dt : datetime64 array (...)
        numpy.ndarray of datetimes of arbitrary shape

    Returns
    -------
    cal : uint32 array (..., 7)
        calendar array with last axis representing year, month, day, hour,
        minute, second, microsecond
    """

    # allocate output 
    out = np.empty(dt.shape + (7,), dtype="u4")
    # decompose calendar floors
    Y, M, D, h, m, s = [dt.astype(f"M8[{x}]") for x in "YMDhms"]
    out[..., 0] = Y + 1970 # Gregorian Year
    out[..., 1] = (M - Y) + 1 # month
    out[..., 2] = (D - M) + 1 # day
    out[..., 3] = (dt - D).astype("m8[h]") # hour
    out[..., 4] = (dt - h).astype("m8[m]") # minute
    out[..., 5] = (dt - m).astype("m8[s]") # second
    out[..., 6] = (dt - s).astype("m8[us]") # microsecond
    return out

It's vectorized across arbitrary input dimensions, it's fast, it's intuitive, it works on numpy v1.15.4, and it doesn't use pandas.

I really wish numpy supported this functionality; it's required all the time in application development. I always get super nervous when I have to roll my own stuff like this, because I always feel like I'm missing an edge case.
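As a cross-check on the floor-and-subtract pattern, here is a minimal sketch with explicit integer casts that returns separate arrays instead of a packed calendar array. `split_calendar` is a hypothetical name, and the sketch assumes second-resolution input and only covers fields down to seconds.

```python
import numpy as np

def split_calendar(dt):
    """Hypothetical helper: return year, month, day, hour, minute, second as int arrays."""
    dt = np.asarray(dt, dtype="datetime64[s]")
    # Floor the input to each calendar unit.
    Y, M, D, h, m = [dt.astype(f"datetime64[{u}]") for u in "YMDhm"]
    year = Y.astype(np.int64) + 1970        # years since the 1970 epoch
    month = (M - Y).astype(np.int64) + 1    # months into the year, 1-based
    day = (D - M).astype(np.int64) + 1      # days into the month, 1-based
    hour = (h - D).astype(np.int64)         # hours into the day
    minute = (m - h).astype(np.int64)       # minutes into the hour
    second = (dt - m).astype(np.int64)      # seconds into the minute
    return year, month, day, hour, minute, second

# Works on arrays of any shape; here a 1x2 array.
stamps = np.array([["2010-10-17T08:30:15", "2011-05-13T23:59:59"]],
                  dtype="datetime64[s]")
year, month, day, hour, minute, second = split_calendar(stamps)
print(year, month, day, hour, minute, second)
```

Each field is the difference between two adjacent floors, so no modulo arithmetic is needed.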

RBF06
  • 2
    Excellent function. Might honestly be worth submitting a PR on the numpy github page. – Luke Davis Jun 05 '20 at 18:38
  • 1
    I like the low-level solution here, clean and simple. – Chang Jun 09 '20 at 23:14
  • Why don't you simply continue the same pattern: `out[..., 3] = h - D; out[..., 4] = m - h; out[..., 5] = s - m...`, why doing an extra `.astype` when you already have it appropriately converted? – panda-34 Jun 19 '23 at 04:20
10

Using numpy version 1.10.4 and pandas version 0.17.1,

dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype=np.datetime64)
pd.to_datetime(dates).year

I get what you're looking for:

array([2010, 2011, 2012], dtype=int32)
Steve Schulist
5

Use dates.tolist() to convert to native datetime objects, then simply access year. Example:

>>> dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64')
>>> [x.year for x in dates.tolist()]
[2010, 2011, 2012]

This is basically the same idea exposed in https://stackoverflow.com/a/35281829/2192272, but using simpler syntax.

Tested with python 3.6 / numpy 1.18.

EDIT: sometimes it is necessary to convert to "datetime64[D]" first, e.g. when the array is of type "datetime64[ns]". In that case, replace dates.tolist() above with dates.astype("datetime64[D]").tolist().

Mahé
2

Another possibility is:

np.datetime64(dates,'Y') - returns - numpy.datetime64('2010')

or

np.datetime64(dates,'Y').astype(int)+1970 - returns - 2010

but this works only on scalar values; it won't take an array.

Mark
1

If you upgrade to numpy 1.7 (where datetime is still labeled as experimental) the following should work.

dates/np.timedelta64(1,'Y')
MarianD
Daniel
  • 2
    Note that as of 1.9 this method does not work. The divide is meant to convert the time span to the floating point number of years. It does not extract the year attribute of a date. – jdr5ca Dec 28 '16 at 01:51
1

Anon's answer works great for me, but I needed to modify the statement for days

from:

days = dates - dates.astype('datetime64[M]') + 1

to:

days = dates.astype('datetime64[D]') - dates.astype('datetime64[M]') + 1
Donald Duck
1

This is obviously quite late, but I benefited from one of the answers, so I'm sharing my bit here.

The answer by Anon is quite right: the numpy method is much faster than first casting to a pandas datetime series and then extracting the fields. The offsetting and conversion of results after the numpy transformations are a bit untidy, though, so a cleaner helper can be written like so:

def from_numpy_datetime_extract(date: np.datetime64, extract_attribute: str = None):
    _YEAR_OFFSET = 1970   # datetime64 counts years from the 1970 epoch
    _MONTH_OFFSET = 1     # months are 1-based
    _MONTH_FACTOR = 12
    _DAY_OFFSET = 1       # days are 1-based

    if extract_attribute == 'year':
        return date.astype('datetime64[Y]').astype(int) + _YEAR_OFFSET
    elif extract_attribute == 'month':
        return date.astype('datetime64[M]').astype(int) % _MONTH_FACTOR + _MONTH_OFFSET
    elif extract_attribute == 'day':
        # Subtract the month floor from the day floor so the result is in whole days
        # for any input unit (dividing by a nanosecond factor only worked for [ns] input)
        return (date.astype('datetime64[D]') - date.astype('datetime64[M]')).astype(int) + _DAY_OFFSET
    else:
        raise ValueError("extract_attribute should be one of 'year', 'month' or 'day'")

Timings for the question's input, dates = np.array(['2010-10-17', '2011-05-13', "2012-01-15"], dtype = 'datetime64'):

  • Numpy method (using the helper above)
%timeit -r10 -n1000 [from_numpy_datetime_extract(x, "year") for x in dates]
# 14.3 µs ± 4.03 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)
  • Pandas method
%timeit -r10 -n1000 pd.to_datetime(dates).year.tolist()
# 304 µs ± 32.2 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)
Dharman
verivid
0

There's no direct way to do it yet, unfortunately, but there are a couple indirect ways:

[dt.year for dt in dates.astype(object)]

or

[datetime.datetime.strptime(repr(d), "%Y-%m-%d %H:%M:%S").year for d in dates]

both inspired by the examples here.

Both of these work for me on Numpy 1.6.1. You may need to be a bit more careful with the second one, since the repr() for the datetime64 might have a fraction part after a decimal point.

acjay
  • This doesn't work in python 3.5, numpy 1.11, for the same reason as my comment on [Nick's answer](http://stackoverflow.com/a/35281829/210945) – naught101 Feb 17 '17 at 00:08
0

convert np.datetime64 to float-year

In this solution, you can see, step-by-step, how to process np.datetime64 datatypes.

In the following, dt64 is of type np.datetime64 (or a numpy.ndarray of that type):

  • year = dt64.astype('M8[Y]') contains just the year. If you want a float array of that, use 1970 + year.astype(float).
  • The days elapsed since January 1st are given by days = (dt64 - year).astype('timedelta64[D]').
  • You can also deduce whether a year is a leap year (compare days_of_year).
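The leap-year check mentioned in the last point can be sketched as follows: count the days between the start of each date's year and the start of the next one. The sample dates and variable names are illustrative assumptions.

```python
import numpy as np

dates = np.array(['2019-06-01', '2020-06-01', '2100-06-01'], dtype='datetime64[D]')

# Start of each date's year and of the following year.
year_start = dates.astype('datetime64[Y]')
next_year_start = year_start + np.timedelta64(1, 'Y')

# Convert both to day resolution before subtracting to get the year length in days.
days_in_year = (next_year_start.astype('datetime64[D]')
                - year_start.astype('datetime64[D]')).astype(int)
is_leap = days_in_year == 366

print(days_in_year, is_leap)
```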

See also the numpy tutorial

import numpy as np

def dt64_to_float(dt64):
    """Converts numpy.datetime64 to year as float.

    Rounded to days

    Parameters
    ----------
    dt64 : np.datetime64 or np.ndarray(dtype='datetime64[X]')
        date data

    Returns
    -------
    float or np.ndarray(dtype=float)
        Year in floating point representation
    """

    year = dt64.astype('M8[Y]')
    # print('year:', year)
    days = (dt64 - year).astype('timedelta64[D]')
    # print('days:', days)
    year_next = year + np.timedelta64(1, 'Y')
    # print('year_next:', year_next)
    days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')
                    ).astype('timedelta64[D]')
    # print('days_of_year:', days_of_year)
    dt_float = 1970 + year.astype(float) + days / (days_of_year)
    # print('dt_float:', dt_float)
    return dt_float

if __name__ == "__main__":

    dt_str = '2011-11-11'
    dt64 = np.datetime64(dt_str)
    print(dt_str, 'as float:', dt64_to_float(dt64))
    print()

    dates = np.array([
        '1970-01-01', '2014-01-01', '2020-12-31', '2019-12-31', '2010-04-28'],
        dtype='datetime64[D]')
    float_dates = dt64_to_float(dates)


    print('dates:      ', dates)
    print('float_dates:', float_dates)

output

2011-11-11 as float: 2011.8602739726027

dates:       ['1970-01-01' '2014-01-01' '2020-12-31' '2019-12-31' '2010-04-28']
float_dates: [1970.         2014.         2020.99726776 2019.99726027 2010.32054795]
Markus Dutschke
0

How about simply converting to string?

Probably the easiest way:

import numpy as np

date = np.datetime64("2000-01-01")
date_strings = date.astype(str).split('-')
# >> ['2000', '01', '01']

year_int = int(date_strings[0])
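For a whole array rather than a scalar, the same string-splitting idea might look like the sketch below. It uses `np.datetime_as_string` and assumes non-negative years, since a leading minus sign would break the split on '-'. The variable names are illustrative.

```python
import numpy as np

dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64[D]')

# Format every element as 'YYYY-MM-DD', then split each string into its parts.
strings = np.datetime_as_string(dates, unit='D')
parts = [s.split('-') for s in strings]

years = np.array([int(p[0]) for p in parts])
months = np.array([int(p[1]) for p in parts])

print(years, months)
```

This loops in Python, so it is likely slower than the pure `astype` arithmetic shown in other answers.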
user8491363