I have a DataFrame with a column named 'fecha_dato' that stores dates like '2016-05-28'. I want to extract 2016, 05 and 28 as integers into new columns named year, month and day. Iterating over the rows works, but it is too slow. Is there a more efficient way to do this?
2 Answers
16
You need dt.year, dt.month and dt.day:
df['year'] = df.fecha_dato.dt.year
df['month'] = df.fecha_dato.dt.month
df['day'] = df.fecha_dato.dt.day
Sample:
import pandas as pd
df = pd.DataFrame({'fecha_dato':['2016-05-28','2016-06-28','2016-07-28']})
# if the dtype is not datetime, cast it first (the .dt accessor requires it)
df.fecha_dato = pd.to_datetime(df.fecha_dato)
df['year'] = df.fecha_dato.dt.year
df['month'] = df.fecha_dato.dt.month
df['day'] = df.fecha_dato.dt.day
print(df)
  fecha_dato  year  month  day
0 2016-05-28  2016      5   28
1 2016-06-28  2016      6   28
2 2016-07-28  2016      7   28
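If the original column should be left untouched, the same idea can be wrapped in a small helper. This is only a sketch: the name add_date_parts is hypothetical, and the format string assumes every value is a 'YYYY-MM-DD' string as in the question.

```python
import pandas as pd

def add_date_parts(df, col="fecha_dato"):
    """Return a copy of df with integer year, month and day columns.

    Hypothetical helper; assumes df[col] holds 'YYYY-MM-DD' strings.
    """
    # Parse once, vectorised; an explicit format avoids per-value format guessing.
    dates = pd.to_datetime(df[col], format="%Y-%m-%d")
    # assign() returns a new frame, so the input column keeps its original dtype.
    return df.assign(year=dates.dt.year, month=dates.dt.month, day=dates.dt.day)

df = pd.DataFrame({"fecha_dato": ["2016-05-28", "2016-06-28", "2016-07-28"]})
result = add_date_parts(df)
print(result)
```

Because the parsing happens on the whole column at once, no Python-level loop over rows is involved.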

jezrael
4
An elegant way to parse the string into all three columns in one step:
import pandas as pd
df = pd.DataFrame({'fecha_dato':['2016-05-28','2016-06-28','2016-07-28']})
# the named groups become the column names of the extracted frame
regex = r'(?P<Year>[^-]+)-(?P<Month>[^-]+)-(?P<Day>[^-]+)'
pd.concat([df, df.fecha_dato.str.extract(regex).astype(int)], axis=1)
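For the lower-case column names asked for in the question, the group names can simply be changed. The following is a sketch rather than the answerer's exact code: the stricter \d{4}/\d{2} pattern is an assumption that every value is a well-formed 'YYYY-MM-DD' string, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"fecha_dato": ["2016-05-28", "2016-06-28", "2016-07-28"]})

# Each named group becomes one column of the frame returned by str.extract.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# extract() yields strings, so cast to int as pointed out in the comments.
parts = df["fecha_dato"].str.extract(pattern).astype(int)

# Put the three new columns next to the original one.
result = pd.concat([df, parts], axis=1)
print(result)
```

Note that astype(int) raises if any row fails to match the pattern, since unmatched rows come back as NaN.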

piRSquared
- but then cast to int – jezrael Dec 14 '16 at 09:21
- @jezrael yes, thank you – piRSquared Dec 14 '16 at 09:23