0

I am using the resample funtion to go from minute data to hour data. The issue is my original DF has only from 10:30 to 15:59 data and the new resampled data is creating new hour data. How to I get rid of this data or have it resample only the time stamps on the index and not create new ones.

This is how the original DF looked: original  DF

This is how the resampled DF looks: enter image description here

My question is: how to I get ride of the other hour data from the resample and just include the previous hour data from the original DF?

Below is the code that I used to created the resampledDF

ROD['time'] = pd.to_datetime(ROD['timestamp'])
ROD.set_index('time', inplace = True, drop = True)
resampled = ROD.resample('60Min',origin='start').agg({'open':'first',
                                                      'high':'max',
                                                      'low': 'min',
                                                      'close': 'last', 'volume':'sum'})

Below is the output from printing resampled:

 open        high         low       close    volume
time                                                                         
2020-09-28 10:31:00  333.984985  334.470001  332.779999  333.750000   6482408
2020-09-28 11:31:00  333.760010  334.260010  333.109985  333.469910   4456465
2020-09-28 12:31:00  333.475006  334.500000  333.190002  334.239990   3711147
2020-09-28 13:31:00  334.239990  334.820007  334.174988  334.501099   4181924
2020-09-28 14:31:00  334.500000  334.959992  334.079987  334.600006   5698198
                         ...         ...         ...         ...       ...
2022-09-23 11:31:00  367.779999  368.170013  365.070007  365.119995   9603892
2022-09-23 12:31:00  365.109985  367.190002  364.825012  365.778412   9306106
2022-09-23 13:31:00  365.769989  366.649994  364.089996  364.829895   9172447
2022-09-23 14:31:00  364.820007  366.480011  363.290008  366.221405  14831712
2022-09-23 15:31:00  366.220001  368.040008  366.000000  367.440002  14253081
svlad
  • 73
  • 8
  • no images, post the data as a code (preferably) or text. second, share your attempt that resulted in the result you have shared here. here is stackoverflow guidelines in posting questions https://stackoverflow.com/help/how-to-ask – Naveed Oct 21 '22 at 22:41
  • Let me know if the new version is better – svlad Oct 21 '22 at 22:48
  • its better. I posted an answer, hope it helps – Naveed Oct 21 '22 at 23:45

2 Answers2

0

Filter by time column is what you need? Try this;

begin_ts = '9/28/2020: 10:30' 
end_ts = '9/28/2020: 15:30'

df.set_index('time',inplace=True)
new_df = df.iloc[df.index.get_loc(begin_ts):df.index.get_loc(begin_ts)].copy()

Having said that, please consider indexing with a better index like timestamp.

spramuditha
  • 357
  • 2
  • 9
  • After I post the answer I see you have added some code and data indicating you are in fact using Timestamp. Still the process should work with ```begin_ts``` and ```end_ts``` modified with the correct format. Sorry from the initial post I though you are using original strings from some dataset. – spramuditha Oct 21 '22 at 22:50
0

since the time period spans multiple days, it results in all hours of the day (24 for each day).

One way about it is to filter the result set for the time range using between_time

resampled.between_time('10:30:00', '13:59:00')
Naveed
  • 11,495
  • 2
  • 14
  • 21