
I have a dataset of measured values and their corresponding timestamps in the format hh:mm:ss, where hh can be > 24 h.

For machine learning tasks, the data need to be interpolated, since the measured variables were each recorded at different timestamps. I figured out that resampling and interpolation require the index to have a datetime dtype. For further data processing and machine learning tasks, I need the timedelta format again afterwards.

Here is some code:

Res_cont = Res_cont.set_index('t_a')  # t_a is the dataframe column holding the timestamps of the measured variable a

# Convert the index to datetime for resampling and interpolation; otherwise the timestamps are not on a regular grid like 00:15:00, but irregular like 00:15:16, for example
Res_cont.index = pd.to_datetime(Res_cont.index)

# First upsample to seconds, then interpolate linearly, downsample to 15-minute steps, and lastly drop the rows containing NaN
Res_cont = Res_cont.resample('s').interpolate(method='linear').resample('15T').asfreq().dropna()

Res_cont.index = pd.to_timedelta(Res_cont.index)  # This is where the warning occurs

Unfortunately, I get the following warning message:

FutureWarning: Passing datetime64-dtype data to TimedeltaIndex is deprecated, will raise a TypeError in a future version Res_cont = pd.to_timedelta(Res_cont.index)

So obviously, there is a problem with the last line of the code above. I would like to know how to change it so that it does not raise a TypeError in a future version. Unfortunately, I don't have any idea how to fix it.

Maybe you can help?

EDIT: Here is some arbitrary sample data:

t_a = ['00:00:26', '00:16:16', '00:25:31', '00:36:14', '25:45:44']
a = [0, 1.3, 2.4, 3.8, 4.9]
Res_cont = pd.Series(data = a, index = t_a)
Max K.

1 Answer


You can use DatetimeIndex.strftime to convert the output datetimes to HH:MM:SS strings, which pd.to_timedelta can then parse:

t_a = ['00:00:26', '00:16:16', '00:25:31', '00:36:14', '00:45:44'] 
a = [0, 1, 2, 3, 4] 

Res_cont = pd.DataFrame({'t_a':t_a,'a':a})
print (Res_cont)
        t_a  a
0  00:00:26  0
1  00:16:16  1
2  00:25:31  2
3  00:36:14  3
4  00:45:44  4

Res_cont = Res_cont.set_index('t_a') 
Res_cont.index = pd.to_datetime(Res_cont.index) 
Res_cont = Res_cont.resample('s').interpolate(method='linear').resample('15T').asfreq().dropna()
Res_cont.index = pd.to_timedelta(Res_cont.index.strftime('%H:%M:%S'))
print (Res_cont)
                 a
00:15:00  0.920000
00:30:00  2.418351
00:45:00  3.922807
jezrael
  • Unfortunately, that's not possible. As I wrote in my question, a datetime index is mandatory for resampling and interpolation in my case. I tried to make this clear in the comments in my code: "Then, I need to change datetime-format for resampling and interpolation, otherwise timedate are not like 00:15:00, but like 00:15:16 for example" – Max K. Aug 08 '19 at 12:49
  • @MaxK. - Can you create some sample data? – jezrael Aug 08 '19 at 12:50
  • I added some sample data under "EDIT" in my original post – Max K. Aug 08 '19 at 12:58
  • @MaxK. - Thank you, changed solution. – jezrael Aug 08 '19 at 13:02
  • @MaxK. - What is your pandas version? – jezrael Aug 08 '19 at 13:06
  • Sorry, I deleted my previous comment because the error no longer occurs. BUT there's a different problem now. The timestamps are in the format hh:mm:ss, where hh > 24 is possible. Timedelta added days, so that 25:00:00 would be described as 1 day, 1 h, ... With your solution, the problem is that the days are no longer counted (25 h becomes day 0, 1 h) – Max K. Aug 08 '19 at 13:12
  • Pandas version is 0.24.2 – Max K. Aug 08 '19 at 13:13
  • @MaxK. - Have you checked the last edited solution? Is it still not working? – jezrael Aug 12 '19 at 11:55
  • Yes, the issue from my comment from Aug 8 at 13:12 still applies: there is no more error, but if hh > 24, the days are not counted with that solution – Max K. Aug 12 '19 at 14:47
  • Any more suggestions? – Max K. Aug 19 '19 at 15:05
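
A minimal sketch of one way to keep the day component that the comments above ask about. It is not part of the posted answer, and it rests on two assumptions: the timestamps are parsed with pd.to_timedelta first (which reads '25:45:44' as 1 day 01:45:44), and a fixed reference Timestamp, here the hypothetical name origin, is added before resampling and subtracted again afterwards. The alias '15min' is used in place of '15T', since newer pandas versions deprecate 'T'.

```python
import pandas as pd

# Sample data from the question; the last timestamp exceeds 24 h.
t_a = ['00:00:26', '00:16:16', '00:25:31', '00:36:14', '25:45:44']
a = [0, 1.3, 2.4, 3.8, 4.9]

# Hypothetical reference point (an assumption; any fixed Timestamp would do).
origin = pd.Timestamp('1970-01-01')

# Parse the strings as timedeltas so hours beyond 24 roll over into days,
# then shift them onto the datetime axis for resampling.
Res_cont = pd.Series(data=a, index=origin + pd.to_timedelta(t_a))

# Upsample to seconds, interpolate linearly, keep the 15-minute grid points,
# and drop grid points that lie before the first measurement.
Res_cont = (
    Res_cont.resample('s')
    .interpolate(method='linear')
    .resample('15min')
    .asfreq()
    .dropna()
)

# Subtract the reference point again; the result is a TimedeltaIndex
# that still carries the day component.
Res_cont.index = Res_cont.index - origin
print(Res_cont)
```

Because the index is converted by subtraction rather than by formatting it as an HH:MM:SS string, nothing is truncated at 24 h, and pd.to_timedelta is never called on datetime64 data, so the FutureWarning from the question should not be triggered.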