0

My dataframe has a column which measures time difference in the format HH:MM:SS.000

The pandas is formed from an excel file, the column which stores time difference is an Object. However some entries have negative time difference, the negative sign doesn't matter to me and needs to be removed from the time as it's not filtering a condition I have:

enter image description here

Note: I only have the negative time difference there because of the issue I'm currently having.

I've tried the following functions but I get errors as some of the time difference data is just 00:00:00 and some is 00:00:02.65 and some are 00:00:02.111

enter image description here

firstly how would I ensure that all data in this column is to 00:00:00.000. And then how would I remove the '-' from some the data.

enter image description here

Here's a sample of the time diff column, I cant transform this column into datetime as some of the entries dont have 3 digits after the decimal. Is there a way to iterate through the column and add a 0 if the length of the value isn't equal to 12 digits.

00:00:02.97

00:00:03:145

00:00:00

00:00:12:56

28 days 03:05:23.439
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
scrubcoder
  • 63
  • 1
  • 5
  • first of all, please post text/code, not images. then, why would you remove the '-'? Doesn't that mean that the timedelta is negative (which should be taken into account)? – FObersteiner Jul 01 '20 at 14:48
  • The time difference has already been calculated, the excel file which the pandas is created from just takes the values. Sorry new to python, think my main issue is that some of the time values are incorrect in length for nanoseconds i.e: some are 00:00:02:12 and not 00:00:02:120 – scrubcoder Jul 01 '20 at 14:54
  • it also seems you try to convert to datetime; I'd suggest to use `pd.to_timedelta()` instead. it would be great if you could provide a [mre]. – FObersteiner Jul 01 '20 at 15:00
  • struggling to copy code across, I entered some examples from the time difference column – scrubcoder Jul 01 '20 at 15:26
  • 1
    too long for a comment so I added an answer. this might only be one step in getting your whole script going, hope it helps. – FObersteiner Jul 01 '20 at 15:40

1 Answers1

0

It looks like you need to clean your input before you can parse to timedelta, e.g. with the following function:

import pandas as pd

def clean_td_string(s):
    if s.count(':') > 2:
        return '.'.join(s.rsplit(':', 1))
    return s

Applied to a df's column, this looks like

df = pd.DataFrame({'Time Diff': ['00:00:02.97', '00:00:03:145', '00:00:00', '00:00:12:56', '28 days 03:05:23.439']})
df['Time Diff'] = pd.to_timedelta(df['Time Diff'].apply(clean_td_string))

# df['Time Diff']
# 0    0 days 00:00:02.970000
# 1    0 days 00:00:03.145000
# 2           0 days 00:00:00
# 3    0 days 00:00:12.560000
# 4   28 days 03:05:23.439000
# Name: Time Diff, dtype: timedelta64[ns]
FObersteiner
  • 22,500
  • 8
  • 42
  • 72