
I have a DataFrame with a column named 'fecha_dato' that stores dates like '2016-05-28'. I want to extract 2016, 05 and 28 as ints into new columns named year, month and day. I currently loop over the rows, but that is too slow. Is there a more efficient way to do this?

yanachen

2 Answers


Use the vectorized .dt accessor attributes dt.year, dt.month and dt.day (the column must have a datetime dtype):

df['year'] = df.fecha_dato.dt.year
df['month'] = df.fecha_dato.dt.month
df['day'] = df.fecha_dato.dt.day

Sample:

import pandas as pd

df = pd.DataFrame({'fecha_dato':['2016-05-28','2016-06-28','2016-07-28']})

# if the dtype is not datetime (e.g. plain strings), convert it first
df.fecha_dato = pd.to_datetime(df.fecha_dato)

df['year'] = df.fecha_dato.dt.year
df['month'] = df.fecha_dato.dt.month
df['day'] = df.fecha_dato.dt.day
print(df)
  fecha_dato  year  month  day
0 2016-05-28  2016      5   28
1 2016-06-28  2016      6   28
2 2016-07-28  2016      7   28
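A minimal sketch of the same approach, assuming the strings are always in ISO YYYY-MM-DD form: passing an explicit format to pd.to_datetime spares pandas from inferring it, and DataFrame.assign adds all three columns in one call. The .dt attributes return integer Series (int32 on pandas 2.x, int64 on older versions), so no extra cast is needed.

```python
import pandas as pd

df = pd.DataFrame({'fecha_dato': ['2016-05-28', '2016-06-28', '2016-07-28']})

# parse with an explicit format (assumes every value is YYYY-MM-DD)
df['fecha_dato'] = pd.to_datetime(df['fecha_dato'], format='%Y-%m-%d')

# add the three integer columns in a single assign call
df = df.assign(
    year=df['fecha_dato'].dt.year,
    month=df['fecha_dato'].dt.month,
    day=df['fecha_dato'].dt.day,
)

print(df.dtypes)
```

If a value does not match the format, pd.to_datetime raises; errors='coerce' turns such values into NaT instead.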
jezrael

An alternative that parses the string into all three columns in one step, using str.extract with named groups (no datetime conversion needed):

import pandas as pd

df = pd.DataFrame({'fecha_dato':['2016-05-28','2016-06-28','2016-07-28']})

regex = '(?P<Year>[^-]+)-(?P<Month>[^-]+)-(?P<Day>[^-]+)'
# str.extract returns string columns named after the groups; astype(int) converts them
df = pd.concat([df, df.fecha_dato.str.extract(regex).astype(int)], axis=1)
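A variant of the same idea, sketched under the assumption that every value is strictly YYYY-MM-DD: lowercase group names give the year, month and day columns the question asks for, and \d{4}/\d{2} match only digits rather than "anything but a hyphen". If any row fails to match, str.extract yields NaN for it and astype(int) raises, so this only suits clean data.

```python
import pandas as pd

df = pd.DataFrame({'fecha_dato': ['2016-05-28', '2016-06-28', '2016-07-28']})

# named groups become the column names of the extracted frame
regex = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

# extract returns strings such as '05'; cast them to int (-> 5)
parts = df['fecha_dato'].str.extract(regex).astype(int)

# join aligns on the index, which is shared with df
df = df.join(parts)
print(df)
```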


piRSquared