KDB+\q: How to fill regularly spaced time series?

Question

I am trying to implement .ml.filltab[] located here on a table of timeseries data composed of regularly interspaced temporal aggregations i.e.:

second     |  amount price 
-----------|----------------
02:46:01   |  54     9953.5
02:46:04   |  150          
02:46:05   |         9954.5

Applying the .ml.filltab[] function does not update the index so the table incorrectly fills values without the appropriate index. i.e.

second     |  amount price 
-----------|----------------
02:46:01   |  54     9953.5
02:46:04   |  150    9953.5    
02:46:05   |  150    9953.5

As opposed to correctly filling in the index as follows:

second     |  amount price 
-----------|----------------
02:46:01   |  54     9953.5
02:46:02   |  54     9953.5
02:46:03   |  54     9953.5
02:46:04   |  150    9953.5    
02:46:05   |  150    9954.5

In addition to the above It also requires a column by which to group the data i.e. a `sym column which is unnecessary when only aggregating a single series.

How does one run a fill forward operation on a table accounting for missing index? Thanks

score 5 · Accepted Answer · answered Sep 03 '19 at 02:23

One way of doing this would be to create a secondary empty table with the same schema where the second column has all of the missing time points. To do this create a list of all the times you are interested in. One way of doing this would be to take the first and last element and do the following:

q)f:exec first second from t
q)f
02:46:01
q)l:exec last second from t
q)l
02:46:05
q)s:f+`second$til 1+`long$l-f
q)s
02:46:01 02:46:02 02:46:03 02:46:04 02:46:05

Note that the second type needed to be casted to a long type to work with the til function, then cast back to a second type. Also we add 1 to ensure the last time (02:46:05) is not missed out.

We can create an empty table with the same schema by doing the following:

q)et:select by second:s from 0#t
q)et
second  | amount price
--------| ------------
02:46:01|
02:46:02|
02:46:03|
02:46:04|
02:46:05|

The 0# is a quick way of creating an empty table with the same schema as the original table t. Now an lj can be used to merge the two tables together, and fills will then forward fill in any nulls:

q)et lj t
second  | amount price
--------| -------------
02:46:01| 54     9953.5
02:46:02|
02:46:03|
02:46:04| 150
02:46:05|        9954.5
q)fills et lj t
second  | amount price
--------| -------------
02:46:01| 54     9953.5
02:46:02| 54     9953.5
02:46:03| 54     9953.5
02:46:04| 150    9953.5
02:46:05| 150    9954.5

score 4 · Answer 2 · answered Sep 03 '19 at 18:26

You really have two separate steps here: filling null values in your table, and generating a time series with regularly spaced intervals from your table.

You already have the means to fill in null values. To generate a time series you can use the as-of join function.

Given your (already filled) table

q)show t:([second:02:46:01 02:46:04 02:46:05]amount:45 150 150;price:9953.5 9953.5 9954.5)
second  | amount price
--------| -------------
02:46:01| 45     9953.5
02:46:04| 150    9953.5
02:46:05| 150    9954.5

you can generate a time series by first defining an auxiliary table with the desired sample times

q)show ts:([]second:02:46:01+til 5)
second
--------
02:46:01
02:46:02
02:46:03
02:46:04
02:46:05

and then as-of joining these two tables

q)aj[`second;ts;t]
second   amount price
----------------------
02:46:01 45     9953.5
02:46:02 45     9953.5
02:46:03 45     9953.5
02:46:04 150    9953.5
02:46:05 150    9954.5

You can easily extend this method to generate such time series when your table contains multiple instruments.

KDB+\q: How to fill regularly spaced time series?

2 Answers2