1

I have some data imported from an XLS file that has 2 columns containing time information in the format of strings. Example: 04:15:45 (which means 4 hours, 15 minutes, and 45 seconds).

Next I convert it to datatime

df['column'] = pd.to_datetime(df['column'], errors='coerce')

I get something like this 2019-01-09 04:15:45 which is not exactly what I want, but that's okay ( I rather have it in the format %H%M%S). I do the same thing with another column with the same characteristics. Next I create a new column with the difference between them:

df['new column'] =  df['column1'] - df['column2']

However when I try to subtract both I get results like this: -1 days +23:00:00 when all I really want is something like 00:16:12, containing just the difference in %H%M%S.

The desired result, algorithmically speaking, would be something like:

(time planned) - (time it actually took) = (difference format 00:18:12)
  • Possible duplicate of [pandas: extract date and time from timestamp](https://stackoverflow.com/questions/39662149/pandas-extract-date-and-time-from-timestamp) – TheNavigat Jan 09 '19 at 20:53
  • `df['new column'].compontents` may help you – cors Jan 09 '19 at 21:00
  • 1
    I think you want `pd.to_timedelta` – ALollz Jan 09 '19 at 21:23
  • Actually, the above wont really matter, but still should be used as you don't have a date part. The display you are seeing is consistent with how `datetimes` display negative values, so you just aren't going to get the format you want. I mean you *can* if you convert it to a string column, but you lose all `datetime` functionality which seems stupid. Perhaps consider converting it to total seconds, which is a signed integer. – ALollz Jan 09 '19 at 21:36

1 Answers1

2

You could convert the output to a difference in seconds.

df = pd.DataFrame({'t1': ['1:13:14', '2:34:56', '2:44:32'], 't2': ['1:15:00', '2:37:00', '3:00:00']})
df.t1 = pd.to_timedelta(df.t1)
df.t2 = pd.to_timedelta(df.t2)
df
        t1       t2
0  1:13:14  1:15:00
1  2:34:56  2:37:00
2  2:44:32  3:00:00

df['t_diff'] = (df.t1 - df.t2).dt.total_seconds()

        t1       t2  t_diff
0 01:13:14 01:15:00  -106.0
1 02:34:56 02:37:00  -124.0
2 02:44:32 03:00:00  -928.0
Grr
  • 15,553
  • 7
  • 65
  • 85