This might seem a broad question but it is precise that I want to achieve so please read it.
I am building an application that connects to market broadcast and subscribes nearly 200 scrips for broadcast. Broadcast is received for every script every second.
Now in the application I want to achieve two things:
- Collect the broadcast and make 1min, 5min, 15min candle and so on.
- Compare the live broadcast value for certain scrips with predefined values.
I had tried different approach to the problem but only one approach showed best results. Even though it is not the desired result.
My current approach:
main.py----->Process----
|
|
------------
| |
broadcast candle
In broadcast
I get the broadcast and make 1 min candle and forward it through a Pipe()
connection to the candle. In candle
I try to make candles at required intervals.
Problem:
- It takes nearly one second for each symbol/each candle to form. I use
pandas
dataframe for storing data and thengroupby()
operation to form a candle. - For 1min candle takes nearly 6-10 secs for candle formation. It is desired to be nearly 2 secs or less.
- There is no room for broadcast comparison.
What I want to know is that what should be my approach in terms of data structures and data processing so that there is enough room for all the processes to take place and as fast as possible ?
I was thinking of HDF5 for data storage and dask for processing but is that worth ? (I have never worked on HDF5/dask).
Any suggestion is welcome. Thanks !
EDIT
This is code where I make candle according to the required frequency. Note that it is called for every scrip after I slice scrip data from a master dataframe.
def __create_minutes_candles(self, data=None, freq=None, latest=True):
start_time = time()
data = data if isinstance(data, pd.DataFrame) else self.__data
if freq and not data.empty:
for _symbol in self.scrips:
symbol_df = data[data['symbol'] == _symbol]
symbol_candle = self.__create_candle(_symbol, symbol_df, freq)
# Following code assembles all candles for all symbols and then append it to file
def __create_candle(self, symbol, symbol_df, freq):
if not symbol_df.empty:
candle = symbol_df.groupby(pd.Grouper(freq=freq, base=self.base)).apply(self.__candle_calc)
candle.dropna(how='all', inplace=True)
volume = symbol_df['volume'].resample(rule=freq, base=self.base).sum()
volume = volume[volume != 0]
symbol_candle = pd.concat([candle, volume], axis=1)
symbol_candle['symbol'] = [symbol] * len(symbol_candle)
symbol_candle = symbol_candle[self.file_columns]
return symbol_candle
This is view of dataframe from where I generate candles:
timestamp symbol ltp volume
0 2019-02-27 12:34:38 NIFTY_50 10778.70 NaN
1 2019-02-27 12:34:36 ACC 1430.00 1154730.0
2 2019-02-27 12:34:37 ADANIENT 131.30 3687673.0
3 2019-02-27 12:34:37 ADANIPORTS 325.60 4091014.0
4 2019-02-27 12:34:37 ADANIPOWER 47.85 28333489.0
5 2019-02-27 12:34:35 AJANTPHARM 997.00 397915.0
6 2019-02-27 12:34:37 ALBK 46.30 17753857.0
7 2019-02-27 12:34:33 AMARAJABAT 720.90 317973.0
8 2019-02-27 12:34:37 AMBUJACEM 214.95 1648760.0
9 2019-02-27 12:34:35 APOLLOHOSP 1152.15 738995.0
10 2019-02-27 12:34:37 APOLLOTYRE 212.65 1722708.0
11 2019-02-27 12:34:37 ARVIND 75.45 2225521.0
12 2019-02-27 12:34:37 ASHOKLEY 82.95 21954785.0
13 2019-02-27 12:34:37 ASIANPAINT 1405.35 755940.0
14 2019-02-27 12:34:37 AUROPHARMA 717.70 1248541.0
15 2019-02-27 12:34:37 AXISBANK 715.25 8247772.0
16 2019-02-27 12:34:37 BAJAJ-AUTO 2900.95 541710.0
17 2019-02-27 12:34:33 BAJAJFINSV 6398.05 182373.0
18 2019-02-27 12:34:37 BAJFINANCE 2637.80 759410.0
19 2019-02-27 12:34:37 BALKRISIND 890.90 659789.0
20 2019-02-27 12:34:37 BANKBARODA 100.60 6876303.0
21 2019-02-27 12:34:37 BANKINDIA 81.15 6643359.0
22 2019-02-27 12:34:36 BATAINDIA 1298.00 492525.0
23 2019-02-27 12:34:37 BEL 80.40 8375060.0
24 2019-02-27 12:34:37 BEML 837.90 2072199.0
25 2019-02-27 12:34:36 BERGEPAINT 297.00 564332.0
26 2019-02-27 12:34:36 BHARATFIN 921.05 271384.0
27 2019-02-27 12:34:36 BHARATFORG 510.85 1321287.0
28 2019-02-27 12:34:37 BHARTIARTL 316.25 2660313.0
29 2019-02-27 12:34:37 BHEL 65.05 3227136.0
... ... ... ... ...
8410 2019-02-27 12:35:56 RPOWER 10.50 30267776.0
8411 2019-02-27 12:35:56 SAIL 48.45 12045549.0
8412 2019-02-27 12:35:56 SBIN 268.10 13249778.0
8413 2019-02-27 12:35:56 SHREECEM 16861.30 24079.0
8414 2019-02-27 12:35:56 SIEMENS 986.80 168896.0
8415 2019-02-27 12:35:56 SRTRANSFIN 1132.40 813839.0
8416 2019-02-27 12:35:56 STAR 412.65 710360.0
8417 2019-02-27 12:35:56 SUNPHARMA 441.75 9114905.0
8418 2019-02-27 12:35:56 SUNTV 604.80 2645366.0
8419 2019-02-27 12:35:56 TATACOMM 582.05 745842.0
8420 2019-02-27 12:35:56 TATAGLOBAL 191.55 1080853.0
8421 2019-02-27 12:35:56 TATAMOTORS 177.80 12885577.0
8422 2019-02-27 12:35:56 TATAMTRDVR 87.95 2138988.0
8423 2019-02-27 12:35:56 TATAPOWER 65.80 3032921.0
8424 2019-02-27 12:35:56 TATASTEEL 502.50 5517049.0
8425 2019-02-27 12:35:56 TCS 2046.40 2315885.0
8426 2019-02-27 12:35:56 TECHM 823.75 2428864.0
8427 2019-02-27 12:35:56 TITAN 1025.30 960611.0
8428 2019-02-27 12:35:56 TORNTPHARM 1823.75 134955.0
8429 2019-02-27 12:35:56 TORNTPOWER 238.90 901773.0
8430 2019-02-27 12:35:56 TV18BRDCST 33.05 2314156.0
8431 2019-02-27 12:35:56 TVSMOTOR 457.40 993427.0
8432 2019-02-27 12:35:56 UBL 1338.50 203388.0
8433 2019-02-27 12:35:56 ULTRACEMCO 3863.20 401531.0
8434 2019-02-27 12:35:56 UPL 864.00 1799297.0
8435 2019-02-27 12:35:56 VEDL 164.60 6466586.0
8436 2019-02-27 12:35:56 VOLTAS 543.25 1388822.0
8437 2019-02-27 12:35:56 WIPRO 376.90 4978421.0
8438 2019-02-27 12:35:56 YESBANK 224.45 42431769.0
8439 2019-02-27 12:35:56 ZEEL 462.50 7847558.0
This is the view of Dataframe that I generate.
symbol open high low close volume
timestamp
2019-02-27 12:29:00 ACC 1431.50 1432.15 1431.25 1431.85 1091.0
2019-02-27 12:29:00 ADANIENT 131.80 131.90 131.75 131.80 4516.0
2019-02-27 12:29:00 ADANIPORTS 326.15 326.55 326.00 326.40 6476.0
2019-02-27 12:29:00 ADANIPOWER 47.80 47.80 47.50 47.70 71156.0
2019-02-27 12:29:00 AJANTPHARM 999.00 999.00 998.10 998.10 245.0
2019-02-27 12:29:00 ALBK 46.55 46.70 46.55 46.65 11483.0