2

I have a *.csv file with the following format:

ID Date (YYYY-MM-DD)

I need to calculate the age of each person, but I can´t find a way. I tried to read the column as date using

 parse_dates=['date']

but it didn´t work.

Then, I tried to add a column with the actual date and substracting both but I got a column type error and I wasn´t able con parse both to numeric. I tried pd.to_numeric(...,errors='coerce') but when I apply the dtype function it doesn´t return a numeric.

I´m frustrated as I´m starting with Pandas and it´s a very easy task to do in the softwares I´m used to but I can´t figure out how to do here. Any help would be really appreciated.

Luis
  • 209
  • 2
  • 6
  • 15

4 Answers4

0

Unless you post your csv or the code to create the dataframe, it would be difficult to answer. You may look at the link for a possible approach to your date-difference issue.

df
        A          B
one  2014-01-01  2014-02-28 
two  2014-02-03  2014-03-01

Assuming these were datetime columns (if they're not apply to_datetime) you can just subtract them:

df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])

In [11]: df.dtypes  # if already datetime64 you don't need to use to_datetime
Out[11]:
A    datetime64[ns]
B    datetime64[ns]
dtype: object

In [12]: df['A'] - df['B']
Out[12]:
one   -58 days
two   -26 days
dtype: timedelta64[ns]

In [13]: df['C'] = df['A'] - df['B']

In [14]: df
Out[14]:
             A          B        C
one     2014-01-01   2014-02-28 -58 days
two     2014-02-03   2014-03-01 -26 days
prabhakar
  • 472
  • 1
  • 4
  • 11
0

if you apply the option parse, you obtain a Timestamp() object. One posible option is transform your date to str and operate in this format, creating a new column in your pandas dataframe, for example:

>>> for i in df['you_datetime_column'].iteritems():
...:    date_ref = i[1]
...:    # your operation
...:    df['edad']=result
>>> date_ref
Timestamp('2017-01-09 11:42:05')  # date of the last row
>>> date_ref = stt(date_ref)
'2017-01-09 11:42:05'
>>> date_ref=str(date_ref.split([0]))
>>> date_ref
'2017-01-09'

Let's move on to calculate the age...

>>> from datetime import date
>>> def diferencia(date1,date2):
        d1 = date(int(date1[0:4]),int(date1[5:7]),int(date1[-2:]))
        d2 = date(int(date2[0:4]),int(date2[5:7]),int(date2[-2:]))
        dif = d2 - d1
        return str(round(dif.days/365.0,1))+' years'
>>> from datetime import datetime
>>> now=datetime.now().date()
>>> now
datetime.date(2018, 4, 9)
>>> now = str(now)
>>> now
'2018-04-09'
>>> diferencia(time_ref,now)
'1.2 years'
>>> diff = float(diferencia(k,l).split()[0])
>>> diff
1.2
>>> type(float)
float
Julio CamPlaz
  • 857
  • 8
  • 18
0

Here is a step-by-step example.

You haven't provided your logic. For us to help debug your problem, you should show us both your data and your code.

import pandas as pd
from io import StringIO

mystr = StringIO("""ID  Date
1 2000-02-03
2 1990-06-30
3 1995-05-12
4 1985-12-31
""")

# replace mystr with 'file.csv'
df = pd.read_csv(mystr, delim_whitespace=True, parse_dates=['Date'])

print(df.dtypes)

# ID               int64
# Date    datetime64[ns]
# dtype: object

df['Age'] = pd.to_datetime('now') - df['Date']

print(df)

#    ID       Date                 Age
# 0   1 2000-02-03  6640 days 09:32:54
# 1   2 1990-06-30 10145 days 09:32:54
# 2   3 1995-05-12  8368 days 09:32:54
# 3   4 1985-12-31 11787 days 09:32:54

df['Age'] = df['Age'] /  np.timedelta64(1, 'Y')

print(df)

#    ID       Date        Age
# 0   1 2000-02-03  18.180796
# 1   2 1990-06-30  27.777160
# 2   3 1995-05-12  22.911899
# 3   4 1985-12-31  32.272803
jpp
  • 159,742
  • 34
  • 281
  • 339
0

Since age could be number of days from a certain datetime, number of seconds from a certain datetime or number of years from a certain datetime and it is unclear what you mean by age, let's assume you want the number of days and without loss of generality, let's say your start date is the string '2010-3-13'. Here is how I would calculate it. The main idea is to convert the string '2010-3-13' to a datetime object so that I can subtract it from today's date.

from datetime import datetime

numDays = (datetime.now() - datetime.strptime('2010-3-11', '%Y-%m-%d')).days
# the date of this post is '2018-10-3'

If I want to print the number of days, I would do:

>>> numDays
[out]    3128 
Samuel Nde
  • 2,565
  • 2
  • 23
  • 23