5

My Primary Program Collect Tick Data from Server and store these data in text file. Sample data in dataframe looks like below:

SYMBOL_N    PRICE   DATE        TIME        VOLUME
35324399    92.31   02/11/18    12:45:26    108856
35324399    92.32   02/11/18    12:45:26    108865
35324399    92.32   02/11/18    12:46:27    108896
35324399    92.38   02/11/18    12:46:28    108932
35324399    92.45   02/11/18    12:47:28    108988
35324399    92.48   02/11/18    12:47:30    109132
35324399    92.52   02/11/18    12:47:52    109256
35324399    92.57   02/11/18    12:48:31    109288
...
...
35324400    76.62   02/11/18    12:45:22    104569
35324400    76.66   02/11/18    12:46:33    104582
35324400    76.68   02/11/18    12:47:06    104602
35324400    76.68   02/11/18    12:47:12    104645
35324400    76.71   02/11/18    12:47:28    104724
35324400    76.74   02/11/18    12:48:29    104944
35324400    76.77   02/11/18    12:48:36    105074
35324400    76.79   02/11/18    12:48:42    106988

There are multiple tokens in the dataframe. I want to convert these data to OHLCV Candlestick for specified time frame like (1 Min, 3 Min, 5 Min). Again the Volume in OHLCV Candlestick should be the difference of Max Volume (Previous Candle - Current Candle) for the said time frame.

Please help.

Pravat
  • 329
  • 2
  • 17

1 Answers1

8

This can be done with resample.

I first calculated the volume the way you asked, but I think you actually need the difference between the max of the current candle and the max of the previous candle. This is the code:

timeframe = '1min'

tick_data['DATETIME'] = pd.to_datetime(tick_data['DATE'] + ' ' + tick_data['TIME'])
tick_data.set_index('DATETIME', inplace=True)

ohlcv_data = pd.DataFrame(columns=[
    'SYMBOL_N',
    'open',
    'high',
    'low',
    'close',
    'volume'])

for symbol in tick_data['SYMBOL_N'].unique():
    ohlcv_symbol =  tick_data.loc[tick_data['SYMBOL_N'] == symbol, 'PRICE'].resample(timeframe).ohlc()
    ohlcv_symbol['SYMBOL_N'] = symbol
    ohlcv_symbol['volume'] = (tick_data.loc[tick_data['SYMBOL_N'] == symbol, 'VOLUME'].resample(timeframe).max() - tick_data.loc[tick_data['SYMBOL_N'] == symbol, 'VOLUME'].resample(timeframe).max().shift(1))
    ohlcv_data = ohlcv_data.append(ohlcv_symbol, sort=False)

print(ohlcv_data)

And this is the result:

                     SYMBOL_N   open   high    low  close  volume
2018-02-11 12:45:00  35324399  92.31  92.32  92.31  92.32     NaN
2018-02-11 12:46:00  35324399  92.32  92.38  92.32  92.38    67.0
2018-02-11 12:47:00  35324399  92.45  92.52  92.45  92.52   324.0
2018-02-11 12:48:00  35324399  92.57  92.57  92.57  92.57    32.0
2018-02-11 12:45:00  35324400  76.62  76.62  76.62  76.62     NaN
2018-02-11 12:46:00  35324400  76.66  76.66  76.66  76.66    13.0
2018-02-11 12:47:00  35324400  76.68  76.71  76.68  76.71   142.0
2018-02-11 12:48:00  35324400  76.74  76.79  76.74  76.79  2264.0
Harm
  • 590
  • 3
  • 21