I have date data in a pandas DataFrame, as follows:

0     06/09/1937 
1     22/11/1972

and I would like to extract only the year:

0     1937 
1     1972

My code:

features["year"] = df["birth_date"].str.split('/',2)
features["year"] = features["year"][:2]

I got this error:

ValueError: Can only tuple-index with a MultiIndex

Then I tried the following, which raised a different error:

features["year"] = [x[2] for x in features["year"]]

TypeError: 'float' object is not subscriptable

I use Python 3. Could you tell me the reasons for these two errors and how to correct them? Thanks in advance.

Cheng
  • Convert the dtype to `datetime` using `pd.to_datetime` then you can just use the linked answer, as simple as that – EdChum Apr 07 '17 at 08:54
  • Hi, the linked answer works fine for me. By the way, can you explain in a few words how the two errors in my question were produced? @EdChum – Cheng Apr 07 '17 at 09:28
  • this `features["year"][:2]` is invalid syntax, it thinks you're trying to index a multi-index which is why it raised the error, the other issue is probably because you have missing value `NaN` so you can't subscript this – EdChum Apr 07 '17 at 09:36
  • Thank you EdChum, can you give me an example of tuple index and multi index please @EdChum? – Cheng Apr 07 '17 at 09:53
  • Please read the [docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html) SO is not a user forum – EdChum Apr 07 '17 at 09:54
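
To illustrate the two points raised in the comments, here is a minimal sketch on made-up data (the sample values, including the missing one, are assumptions and not the asker's real data). It shows that `[:2]` on a Series slices rows rather than the lists inside each row, that `.str[2]` indexes inside each row, and that a missing value makes a plain list comprehension fail with a `TypeError`. It does not try to reproduce the exact `ValueError` from the question.

```python
import pandas as pd

# Hypothetical sample data with one missing value
s = pd.Series(["06/09/1937", "22/11/1972", None])

# Each non-missing row becomes a list such as ['06', '09', '1937']
parts = s.str.split("/")

# Plain [:2] on a Series selects the first two ROWS, not list elements
first_rows = parts[:2]

# The .str accessor indexes inside each row; missing rows stay missing
years = parts.str[2]

# A list comprehension subscripts every value, including the missing one,
# which is not a list and therefore cannot be indexed
try:
    [x[2] for x in parts]
    raised = False
except TypeError:
    raised = True

print(first_rows)
print(years)
print("TypeError raised:", raised)
```

If the column has no missing values, the list comprehension would work, so checking `df["birth_date"].isna().sum()` is a reasonable first step.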

1 Answer

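You need the `.str` accessor to index inside each split list. Use `.str[2]` to take the third element (the year); `.str[:2]` would return the day and month instead:

features["year"] = df["birth_date"].str.split('/', n=2)
features["year"] = features["year"].str[2]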
jezrael
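
As an alternative along the lines of the `pd.to_datetime` suggestion in the comments, the sketch below parses the column as dates and reads the year from the result. The sample frame, the day/month/year format string, and the choice of the nullable `Int64` dtype are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the real df and features come from the question
df = pd.DataFrame({"birth_date": ["06/09/1937", "22/11/1972", None]})
features = pd.DataFrame(index=df.index)

# Assumes day/month/year order; unparseable or missing values become NaT
dates = pd.to_datetime(df["birth_date"], format="%d/%m/%Y", errors="coerce")

# .dt.year yields floats when NaT is present, so cast to nullable integers
features["year"] = dates.dt.year.astype("Int64")

print(features)
```

This gives integer years rather than strings, and missing dates stay missing instead of raising an error.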