85

I import a dataframe via read_csv, but for some reason can't extract the year or month from the series df['date'], trying that gives AttributeError: 'Series' object has no attribute 'year':

date    Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469

df = pd.read_csv('sample_data.csv', parse_dates=True)

df['date'] = pd.to_datetime(df['date'])

df['year'] = df['date'].year
df['month'] = df['date'].month

UPDATE: and when I try solutions with df['date'].dt on my pandas version 0.14.1, I get "AttributeError: 'Series' object has no attribute 'dt' ":

df = pd.read_csv('sample_data.csv',parse_dates=True)

df['date'] = pd.to_datetime(df['date'])

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO.


FOLLOWUP: I can't seem to update my pandas 0.14.1 to a newer release in my Anaconda environment, each of the attempts below generates an invalid syntax error. I'm using Python 3.4.1 64bit.

conda update pandas

conda install pandas==0.15.2

conda install -f pandas

Any ideas?

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
MJS
  • 1,573
  • 3
  • 17
  • 26
  • where is year coming from? Are you trying to access a row `(df["date"][0].year`? – Padraic Cunningham May 22 '15 at 20:38
  • I have a csv file with dates and other columns which all came out of SQL as string data. Trying to create new year and month columns to use for grouping. the csv has ~5000 rows. – MJS May 22 '15 at 20:41
  • Yes but the `df["date"]` is a `'pandas.core.series.Series'` object. what should `df['date'].year` be? – Padraic Cunningham May 22 '15 at 20:42
  • I just want to create 2 more columns... one for year and one for month as integers. – MJS May 22 '15 at 20:45
  • tried df['date'].dt.year as well, EdChum. – MJS May 22 '15 at 20:48
  • see my update, it should work in your version of pandas, can you post which version you're using – EdChum May 22 '15 at 20:58
  • The [**`.dt` accessor for datetime functionality was new in pandas 0.15.0** (Oct 2014)](https://pandas.pydata.org/pandas-docs/version/0.15.0/whatsnew.html#whatsnew-0150-dt). So it ain't gonna work in 0.14.1, you must update. The footnote about Anaconda update is a separate question, recommend deleting it, conda is a different beast and well covered by other Q&A. – smci Jan 19 '21 at 20:18

5 Answers5

128

If you're running a recent-ish version of pandas then you can use the datetime accessor dt to access the datetime components:

In [6]:

df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
df
Out[6]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

EDIT

It looks like you're running an older version of pandas in which case the following would work:

In [18]:

df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
df
Out[18]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

Regarding why it didn't parse this into a datetime in read_csv you need to pass the ordinal position of your column ([0]) because when True it tries to parse columns [1,2,3] see the docs

In [20]:

t="""date   Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469"""
df = pd.read_csv(io.StringIO(t), sep='\s+', parse_dates=[0])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
date     5 non-null datetime64[ns]
Count    5 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 120.0 bytes

So if you pass param parse_dates=[0] to read_csv there shouldn't be any need to call to_datetime on the 'date' column after loading.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
EdChum
  • 376,765
  • 198
  • 813
  • 562
16

This works:

df['date'].dt.year

Now:

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

gives this data frame:

        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10
Mike Müller
  • 82,630
  • 20
  • 166
  • 161
7

When to use dt accessor

A common source of confusion revolves around when to use .year and when to use .dt.year.

The former is an attribute for pd.DatetimeIndex objects; the latter for pd.Series objects. Consider this dataframe:

df = pd.DataFrame({'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
                  index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']))

The definition of the series and index look similar, but the pd.DataFrame constructor converts them to different types:

type(df.index)     # pandas.tseries.index.DatetimeIndex
type(df['Dates'])  # pandas.core.series.Series

The DatetimeIndex object has a direct year attribute, while the Series object must use the dt accessor. Similarly for month:

df.index.month               # array([1, 1, 1])
df['Dates'].dt.month.values  # array([ 1, 10, 12], dtype=int64)

A subtle but important difference worth noting is that df.index.month gives a NumPy array, while df['Dates'].dt.month gives a Pandas series. Above, we use pd.Series.values to extract the NumPy array representation.

jpp
  • 159,742
  • 34
  • 281
  • 339
5

Probably already too late to answer but since you have already parse the dates while loading the data, you can just do this to get the day

df['date'] = pd.DatetimeIndex(df['date']).year
Amit Gupta
  • 2,698
  • 4
  • 24
  • 37
1

What worked for me was upgrading pandas to latest version:

From Command Line do:

conda update pandas
Kjartan
  • 18,591
  • 15
  • 71
  • 96
Jimmy
  • 2,165
  • 1
  • 17
  • 13