I would like to create a new column in my dataset containing the difference, in years, between today and another column already in the dataset that holds dates.

The code below:

df['diff_years'] = datetime.today() - df['some_date']
df['diff_years']

gives me the following output (example):

1754 days 11:44:28.971615

and I need something like this (the output above expressed in years):

4.8
(or 5)

I appreciate any help!

PS: I would like to avoid looping over the series. I believe a loop would give the desired result, but the series is large, so I would rather not go that way.

jpp
Wagner R.
  • Well, a year is not exactly defined. It can have between 365 and 366 days. Or be even more complicated if you go further back in history. – Graipher Mar 02 '18 at 14:54
  • @Wagner I don't think the pandas date difference is useful in this case. You should convert your pandas dates to datetime and use relativedelta, because it gives you the difference in years plus the remaining months and days. Hope it helps – Shubham Sharma Mar 02 '18 at 15:04

2 Answers

Here is one way:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2009-06-15 00:00:00']})

df['years'] = (pd.to_datetime('now') - pd.to_datetime(df['date'])) / np.timedelta64(1, 'Y')

#                   date     years
# 0  2009-06-15 00:00:00  8.713745
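
As far as I know, newer pandas releases may reject the ambiguous 'Y' unit, so here is a minimal sketch of the same vectorized idea that divides by an explicit year length instead. The helper name years_since, the DAYS_PER_YEAR constant (the mean Gregorian year, which is also what NumPy uses for 'Y'), and the second sample date are assumptions made for illustration; the column names follow the question.

```python
import pandas as pd

# Assumption: one "year" is the mean Gregorian year of 365.2425 days.
DAYS_PER_YEAR = 365.2425


def years_since(dates, now=None):
    """Return the fractional number of years between each date and `now`."""
    dates = pd.to_datetime(dates)
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    # Timedelta / Timedelta is a plain float, computed for the whole column at once.
    return (now - dates) / pd.Timedelta(days=DAYS_PER_YEAR)


df = pd.DataFrame({'some_date': ['2009-06-15 00:00:00', '2013-05-13 00:00:00']})

# `now` is pinned here only to make the example reproducible; omit it to use today.
df['diff_years'] = years_since(df['some_date'], now='2018-03-02')
df['diff_years_rounded'] = df['diff_years'].round(1)
```

No Python-level loop is involved, so this should scale to a large series. Use .round(0) instead of .round(1) if a whole number such as 5 is preferred.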
jpp
  • I got the output: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 60824-01-01 00:00:00. Btw, my date series is in the 2009-06-15 00:00:00 format – Wagner R. Mar 02 '18 at 15:01
  • @WagnerR. see my update - it still works. You will need to provide some reproducible code to help debug. – jpp Mar 02 '18 at 15:08
I ran into the same issue in a project a few days ago and solved it with this:

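from datetime import date

import pandas as pd
from dateutil.relativedelta import relativedelta

now = date.today()
# date() cannot take a whole column, so convert a single value (here the first row)
some_date = pd.to_datetime(df['some_date'].iloc[0]).date()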

rdelta = relativedelta(now, some_date)
print('diff in years - ', rdelta.years)
print('remaining months - ', rdelta.months)
print('remaining days - ', rdelta.days)

It should print the difference in whole years, followed by the remaining months and days.
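
Since relativedelta works on one pair of dates at a time, here is a rough sketch of how the whole-years part could be computed for an entire column without a loop. It is plain pandas rather than dateutil, and the helper name whole_years_since and the sample dates are assumptions made for illustration.

```python
import pandas as pd


def whole_years_since(dates, today=None):
    """Count completed years between each date and `today`, like an age."""
    dates = pd.to_datetime(dates)
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    years = today.year - dates.dt.year
    # Subtract one year where this year's anniversary has not been reached yet.
    not_yet = (dates.dt.month > today.month) | (
        (dates.dt.month == today.month) & (dates.dt.day > today.day)
    )
    return years - not_yet.astype(int)


df = pd.DataFrame({'some_date': ['2009-06-15', '2013-03-02', '2013-03-03']})

# `today` is pinned here only to make the example reproducible; omit it to use the current date.
df['diff_years'] = whole_years_since(df['some_date'], today='2018-03-02')
```

This gives calendar-based whole years, so it sidesteps the question of how many days a year has, but it does not return the remaining months and days that relativedelta provides.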

Shubham Sharma