I want to compute Effective spread on bond market.(It is a little bit complex)
I have 2 data frame first is Trade and the Second is quote:
Trade(Transaction) is look like this:
Id Date Index Time Volume Price
0 56033738624803469 2017-05-21 1 10:32:28 1 935049.7
1 56033738624803469 2017-05-21 1 11:54:41 10 919410.0
2 56033738624803469 2017-05-21 2 12:25:46 1 919410.0
3 56033738624803469 2017-05-21 1 11:48:08 5 954999.0
4 56033738624803469 2017-05-21 2 11:48:43 6 954999.0
5 56033738624803469 2017-05-21 1 10:00:49 50000 959900.0
and quote(Limit Order Book) is like this:
BuyCount BuyPrice BuyTime BuyVol Date SID row SellTime SellPrice SellVol SellCount row.1 Quote-Spread MidQuote
1 930100 10:30:05 100 2017-05-21 56033738624803469 1 10:32:27 939999 5700 2 1 0.010643 935049.5
1 930000 10:30:05 260 2017-05-21 56033738624803469 2 06:12:52 940000 2817 1 2 0.010753 935000.0
1 928000 10:30:05 2892 2017-05-21 56033738624803469 3 06:12:52 946800 458 1 3 0.020259 937400.0
2 900000 10:30:05 35000 2017-05-21 56033738624803469 4 06:12:52 946980 6000 1 4 0.052200 923490.0
1 874001 10:30:05 100 2017-05-21 56033738624803469 5 06:12:52 946999 6000 1 5 0.083522 910500.0
Effective Spread is calculated like this:
market order: 2×|pt−mt| (absolute differention)
Where pt is the price for the transaction at time t , and mt is the mid quote at time t−ε (just before the trade).
I should first merge this two data by time and then calculate the Effective Spread; for example: for data that been traded(First data frame) at time 10:32:28 this formula should be like this:
Effective Spread = 2*(935049.7-935049.5).abs()
Note 1: the mid quote of row == 1 is considered and other quotes should be neglected.because markets actual trade(best buy and best sell price is in row 1)
My question is how can i merge this two data then i could calculate this formula.
Edit:
I have found this link but i my data is a little bit different.
http://www.vincentgregoire.com/introduction-to-empirical-market-microstructure-in-python/
I could not understand how to apply this to my data frame (because of Note 1)