I am desperately trying to reproduce the classical Pandas
example of rolling joins, where quotes
data is merged with trade
data.
See here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html
Here is the data in data.table
format:
trades <- data.table(time = c('2016-05-25 13:30:00.023',
'2016-05-25 13:30:00.038',
'2016-05-25 13:30:00.048',
'2016-05-25 13:30:00.048',
'2016-05-25 13:30:00.048'),
ticker = c('MSFT','MSFT','GOOG','GOOG','AAPL'),
price = c(51.95,51.95,720.77,720.92,98.0),
quantity = c(75,155,100,100,100))
> trades
time ticker price quantity
1: 2016-05-25 13:30:00.023 MSFT 51.95 75
2: 2016-05-25 13:30:00.038 MSFT 51.95 155
3: 2016-05-25 13:30:00.048 GOOG 720.77 100
4: 2016-05-25 13:30:00.048 GOOG 720.92 100
5: 2016-05-25 13:30:00.048 AAPL 98.00 100
and quotes
quotes <- data.table(time = c('2016-05-25 13:30:00.023',
'2016-05-25 13:30:00.023',
'2016-05-25 13:30:00.030',
'2016-05-25 13:30:00.041',
'2016-05-25 13:30:00.048',
'2016-05-25 13:30:00.049',
'2016-05-25 13:30:00.072',
'2016-05-25 13:30:00.075'),
ticker = c('GOOG','MSFT','MSFT','MSFT','GOOG','AAPL','GOOG','MSFT'),
bid = c(720.50, 51.95, 51.97, 51.99, 720.5,97.99,720.5,52.01),
ask = c(270.93,51.96,51.98,52.00,720.93,98.01,720.88,52.03))
> quotes
time ticker bid ask
1: 2016-05-25 13:30:00.023 GOOG 720.50 270.93
2: 2016-05-25 13:30:00.023 MSFT 51.95 51.96
3: 2016-05-25 13:30:00.030 MSFT 51.97 51.98
4: 2016-05-25 13:30:00.041 MSFT 51.99 52.00
5: 2016-05-25 13:30:00.048 GOOG 720.50 720.93
6: 2016-05-25 13:30:00.049 AAPL 97.99 98.01
7: 2016-05-25 13:30:00.072 GOOG 720.50 720.88
8: 2016-05-25 13:30:00.075 MSFT 52.01 52.03
What I would like to do is merge the trades data with the quotes data in the following way
- for each trade, match the closest previous quote possible
- the matched quote must be within 10 millisecond
- exact matches should not occur.
The output (which is the same as in the Pandas tutorial) should be
time ticker price quantity bid ask
1: 2016-05-25 13:30:00.023 MSFT 51.95 75 NA NA
2: 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
3: 2016-05-25 13:30:00.048 GOOG 720.77 100 NA NA
4: 2016-05-25 13:30:00.048 GOOG 720.92 100 NA NA
5: 2016-05-25 13:30:00.048 AAPL 98.00 100 NA NA
Indeed, you can see that the only possible quote match is for the second trade at 2016-05-25 13:30:00.038
, because the closed (previous) quote occurs at 2016-05-25 13:30:00.030
so this is within 10ms (and not an exact match).
Despite my trials, I was not able to reproduce this in data.table
. Any ideas?
Thanks!