0

1) I have the following 1-minute-frequency-data in a pandas DataFrame:

0 Open High Low Close Volume
2010-10-19 06:31:00 58.75 58.81 58.58 58.59 228125
2010-10-19 06:32:00 58.59 58.68 58.55 58.57 153303
2010-10-19 06:33:00 58.57 58.6 58.5 58.52 115647
2010-10-19 06:34:00 58.52 58.58 58.48 58.58 63577
2010-10-19 06:35:00 58.57 58.59 58.51 58.53 111770

2) I also have the following index array:

[2010-10-19 06:32:00, 2010-10-19 06:35:00]

3) I want to reindex the DataFrame according to the index array such that the new DataFrame will only have the 2 rows of the index array, while managing to resample it so that the high of the first row of the new dataframe is the higher of the highs from the first 2 rows of the original dataframe, the low of the second row of the new dataframe is the lower of the 3 lows in the original dataframe, etc.

Normally, one would aggregate one's data via .resample() and .agg(), but that's once you already have the dataframe in the state that you want. I can't use reindex() in such a way that I could follow it up with .resample() and accomplish this.

I suppose I'm looking for a way to reindex and resample in one move. How do I best do this?

pmse234
  • 15
  • 4

1 Answers1

0

Adapting the answer from pandas Dataframe resampling with specific dates

from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={c: np.random.rand(5) for c in ['o', 'h', 'l', 'c', 'v']},
    index=pd.date_range(datetime(2020, 10, 19, 6, 31), datetime(2020, 10, 19, 6, 35), freq='T')
)
print(df)
                            o         h         l         c         v
2020-10-19 06:31:00  0.868832  0.011599  0.614113  0.920998  0.237791
2020-10-19 06:32:00  0.909751  0.277570  0.820222  0.493289  0.941469
2020-10-19 06:33:00  0.998590  0.667477  0.108915  0.551331  0.081069
2020-10-19 06:34:00  0.160800  0.179726  0.987618  0.351980  0.253893
2020-10-19 06:35:00  0.553217  0.873212  0.291289  0.235526  0.525988
sample_index = pd.DatetimeIndex([datetime(2020, 10, 19, 6, 32), datetime(2020, 10, 19, 6, 35)])
agg = {'o': 'first', 'h': 'max', 'l': 'min', 'c': 'last', 'v': 'sum'}
ohlcv = df.groupby(sample_index[sample_index.searchsorted(df.index)]).agg(agg)
print(ohlcv)
                            o         h         l         c         v
2020-10-19 06:32:00  0.868832  0.277570  0.614113  0.493289  1.179259
2020-10-19 06:35:00  0.998590  0.873212  0.108915  0.235526  0.860951
konichuvak
  • 21
  • 1
  • 3