32

I have a TimeDelta column with values that look like this:

2 days 21:54:00.000000000

I would like to have a float representing the number of days, let's say here 2+21/24 = 2.875, neglecting the minutes. Is there a simple way to do this ? I saw an answer suggesting

res['Ecart_lacher_collecte'].apply(lambda x: float(x.item().days+x.item().hours/24.))

But I get "AttributeError: 'str' object has no attribute 'item' "

Numpy version is '1.10.4' Pandas version is u'0.17.1'

The columns has originally been obtained with:

lac['DateHeureLacher'] = pd.to_datetime(lac['Date lacher']+' '+lac['Heure lacher'],format='%d/%m/%Y %H:%M:%S')
cap['DateCollecte'] = pd.to_datetime(cap['Date de collecte']+' '+cap['Heure de collecte'],format='%d/%m/%Y %H:%M:%S')

in a first script. Then in a second one:

res = pd.merge(lac, cap, how='inner', on=['Loc'])
res['DateHeureLacher']  = pd.to_datetime(res['DateHeureLacher'],format='%Y-%m-%d %H:%M:%S')
res['DateCollecte']  = pd.to_datetime(res['DateCollecte'],format='%Y-%m-%d %H:%M:%S')
res['Ecart_lacher_collecte'] = res['DateCollecte'] - res['DateHeureLacher']

Maybe saving it to csv change their types back to string? The transformation I'm trying to do is in a third script.

Sexe_x  PiegeLacher latL    longL   Loc Col_x   DateHeureLacher Nb envolees PiegeCapture    latC    longC   Col_y   Sexe_y  Effectif    DateCollecte    DatePose    Ecart_lacher_collecte   Dist_m
M   Q0-002  1629238 237877  H   Rouge   2011-02-04 17:15:00 928 Q0-002  1629238 237877  Rouge   M   1   2011-02-07 15:09:00 2011-02-07 12:14:00 2 days 21:54:00.000000000   0
M   Q0-002  1629238 237877  H   Rouge   2011-02-04 17:15:00 928 Q0-002  1629238 237877  Rouge   M   4   2011-02-07 12:14:00 2011-02-07 09:42:00 2 days 18:59:00.000000000   0
M   Q0-002  1629238 237877  H   Rouge   2011-02-04 17:15:00 928 Q0-003  1629244 237950  Rouge   M   1   2011-02-07 15:10:00 2011-02-07 12:16:00 2 days 21:55:00.000000000   75

res.info():

Sexe_x                   922 non-null object
PiegeLacher              922 non-null object
latL                     922 non-null int64
longL                    922 non-null int64
Loc                      922 non-null object
Col_x                    922 non-null object
DateHeureLacher          922 non-null object
Nb envolees              922 non-null int64
PiegeCapture             922 non-null object
latC                     922 non-null int64
longC                    922 non-null int64
Col_y                    922 non-null object
Sexe_y                   922 non-null object
Effectif                 922 non-null int64
DateCollecte             922 non-null object
DatePose                 922 non-null object
Ecart_lacher_collecte    922 non-null object
Dist_m                   922 non-null int64
jpp
  • 159,742
  • 34
  • 281
  • 339
alpagarou
  • 471
  • 2
  • 5
  • 11
  • I think you could use `float(res['columnName'].dt.days + res['columnName'].dt.hours / 24)` – Anton Protopopov Feb 19 '16 at 10:27
  • 2
    IIUC you can do `res['columnname'].dt.total_seconds()/ (24 * 60 * 60)` – EdChum Feb 19 '16 at 10:28
  • @AntonProtopopov I get "AttributeError: Can only use .dt accessor with datetimelike values" – alpagarou Feb 19 '16 at 10:33
  • @EdChum I get "AttributeError: 'Series' object has no attribute 'total_seconds' " – alpagarou Feb 19 '16 at 10:34
  • 1
    @alpagarou first you need to convert you columns to `datetime` object. You could do that with `pd.to_datetime` – Anton Protopopov Feb 19 '16 at 10:35
  • What's your version of pandas and can you post raw data, code and your numpy version and edit this into your question – EdChum Feb 19 '16 at 10:35
  • @AntonProtopopov pd.to_datetime raises "ValueError: Unknown string format" – alpagarou Feb 19 '16 at 10:41
  • @alpagarou post your date please. Without that it's hard to say what could cause the problem – Anton Protopopov Feb 19 '16 at 10:42
  • Are you saying your column is a string of timedeltas? – EdChum Feb 19 '16 at 10:44
  • post output from `res.info()`, also does `res['Ecart_lacher_collecte'].dt.total_seconds()/ (24 * 60 * 60)` work? – EdChum Feb 19 '16 at 10:51
  • @EdChum First I get "AttributeError: Can only use .dt accessor with datetimelike values" but when I try "res['Ecart_lacher_collecte'] = pd.to_datetime(res['Ecart_lacher_collecte'])" I get "ValueError: Unknown string format" – alpagarou Feb 19 '16 at 10:59
  • It looks you have invalid values in your columns, can you try `pd.to_datetime(res['DateHeureLacher'],format='%Y-%m-%d %H:%M:%S', errors='coerce')` also you need to post minimal data and code that we can run that reproduces your error as this should work, also when you read the csv back in and set `parse_dates=['DateHeureLacher', 'DateCollecte']` in `read_csv` does it fix the `dtypes`? – EdChum Feb 19 '16 at 11:01
  • parse_dates fixes the type of 'DateHeureLacher' and 'DateCollecte' but 'Ecart_lacher_collecte' which was timedelta64[ns] at the end of the precedent script, goes back to object when I read it from csv (I also tried to include it in the parse_dates but it doesn't work) I should probably use errors='coerce' to convert 'Ecart_lacher_collecte to datetime before using dt.days, but what format should I precise? – alpagarou Feb 19 '16 at 11:17
  • OK, I just tried this and found that you can't specify the dtype in `read_csv` to parse it back as a timedelta64, so you have a couple options, convert the timedelta64 to `int64` using `astype` and then when reading it back use a custom converter to convert it or after reading it back in you can do `astype(np.timedelta64)` and this will work – EdChum Feb 19 '16 at 11:37
  • I put my last two scripts in one and managed to it. However, I get a lot of SettingWithCopyWarning ? – alpagarou Feb 19 '16 at 11:42

3 Answers3

32

You can use pd.to_timedelta or np.timedelta64 to define a duration and divide by this:

# set up as per @EdChum
df['total_days_td'] = df['time_delta'] / pd.to_timedelta(1, unit='D')
df['total_days_td'] = df['time_delta'] / np.timedelta64(1, 'D')
jpp
  • 159,742
  • 34
  • 281
  • 339
10

You can use dt.total_seconds and divide this by the total number of seconds in a day, example:

In [25]:
df = pd.DataFrame({'dates':pd.date_range(dt.datetime(2016,1,1, 12,15,3), periods=10)})
df

Out[25]:
                dates
0 2016-01-01 12:15:03
1 2016-01-02 12:15:03
2 2016-01-03 12:15:03
3 2016-01-04 12:15:03
4 2016-01-05 12:15:03
5 2016-01-06 12:15:03
6 2016-01-07 12:15:03
7 2016-01-08 12:15:03
8 2016-01-09 12:15:03
9 2016-01-10 12:15:03

In [26]:
df['time_delta'] = df['dates'] - pd.datetime(2015,11,6,8,10)
df

Out[26]:
                dates       time_delta
0 2016-01-01 12:15:03 56 days 04:05:03
1 2016-01-02 12:15:03 57 days 04:05:03
2 2016-01-03 12:15:03 58 days 04:05:03
3 2016-01-04 12:15:03 59 days 04:05:03
4 2016-01-05 12:15:03 60 days 04:05:03
5 2016-01-06 12:15:03 61 days 04:05:03
6 2016-01-07 12:15:03 62 days 04:05:03
7 2016-01-08 12:15:03 63 days 04:05:03
8 2016-01-09 12:15:03 64 days 04:05:03
9 2016-01-10 12:15:03 65 days 04:05:03

In [27]:
df['total_days_td'] = df['time_delta'].dt.total_seconds() / (24 * 60 * 60)
df

Out[27]:
                dates       time_delta  total_days_td
0 2016-01-01 12:15:03 56 days 04:05:03      56.170174
1 2016-01-02 12:15:03 57 days 04:05:03      57.170174
2 2016-01-03 12:15:03 58 days 04:05:03      58.170174
3 2016-01-04 12:15:03 59 days 04:05:03      59.170174
4 2016-01-05 12:15:03 60 days 04:05:03      60.170174
5 2016-01-06 12:15:03 61 days 04:05:03      61.170174
6 2016-01-07 12:15:03 62 days 04:05:03      62.170174
7 2016-01-08 12:15:03 63 days 04:05:03      63.170174
8 2016-01-09 12:15:03 64 days 04:05:03      64.170174
9 2016-01-10 12:15:03 65 days 04:05:03      65.170174
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • 3
    yes it works but I'm shocked they don't have a dt.total_days() or dt.total_hours available... – data-monkey Jan 06 '17 at 15:33
  • 1
    @LedgerYu because it's not a standard method for timedelta, additionally it's trivial to calculate this yourself but besides if we followed your logic we'd then have methods for total_days, total_hours, total_quarters, total_years, total_milliseconds etc. – EdChum Jan 06 '17 at 15:35
  • You can just do `dt.days`, not as a function. Works for me! – Guillochon Jul 05 '18 at 20:59
  • 1
    @Guillochon `dt.days` will only give you the number of days and will throw out all the time information – Taylor Sep 06 '18 at 18:24
0

Have you tried using this instead?

res['Ecart_lacher_collecte'].apply(lambda x: (x.total_seconds()//(3600*24)) + (x.total_seconds()%(3600*24)//3600)/24))

The first term is the Day ( 2 in your case ) The second term is the hour ratio neglecting the minutes ( 21/24 in your case)

If you don't want the minutes and seconds data to be neglected, and rather need a ratio which considers all the seconds in the day, the code is as mentioned below:

res['Ecart_lacher_collecte'].apply(lambda x: (x.total_seconds()/(3600*24))
  • Why separate the division and then join it like that? The only difference from x.total_seconds()/(3600*24) is that incomplete hours don't count at all, which I doubt is the desired behavior. – Elias Hasle Oct 18 '18 at 09:14
  • Yes, that's right. the incomplete hours aren't counted. That's because the original poster wanted it that way: 'I would like to have a float representing the number of days, let's say here 2+21/24 = 2.875, neglecting the minutes.' – sharinganSawant Oct 19 '18 at 12:22
  • 1
    @EliasHasle I have edited the answer where I have mentioned why the incomplete hours weren't accounted for. – sharinganSawant Oct 19 '18 at 12:31