How to optimize a batch pivotization?

Question

I have a datetime list (which for some reason I call it column date) containing over 1k datetime.

adates:2017.10.20T00:02:35.650 2017.10.20T01:57:13.454 ...

For each of these dates I need to select the data from some table, then pivotize by a column t i.e. expiry, add the corresponding date datetime as column to the pivotized table and stitch together the pivotization for all the dates. Note that I should be able to identify which pivotization corresponds to a date and that's why I do it one by one:

fPivot:{[adate;accypair]
     t1:select from volatilitysurface_smile where date=adate,ccypair=accypair;
     mycols:`atm`s10c`s10p`s25c`s25p;
     t2:`t xkey 0!exec mycols#(stype!mid) by t:t from t1;
     t3:`t xkey select distinct t,tenor,xi,volofvol,delta_type,spread from t1;
     result:ej[`t;t2;t3];
     :result}

I then call this function for every datetime adates as follows:

raze {[accypair;adate] `date xcols update date:adate from fPivot[adate;accypair] }[`EURCHF] @/: adates;

this takes about 90s. I wonder if there is a better way e.g. do a big pivotization rather than running one pivotization per date and then stitching it all together. The big issue I see is that I have no apparent way to include the date attribute as part of the pivotization and the date can not be lost otherwise I can't reconciliate the results.

The data you have provided in `adates` is of type datetime not timestamp. A timestamp in kdb looks like `2017.11.28D12:55:47.354335000`. Note the `D` separator between date and time, compared to the `T` separator in datetime types. — Thomas Smyth - Treliant, Nov 28 '17 at 12:56

score 1 · Answer 1 · answered Nov 28 '17 at 13:14

1

If you havent been to the wiki page on pivoting then it may be a good start. There is a section on a general pivoting function that makes some claims to being somewhat efficient:

One user reports:

This is able to pivot a whole day of real quote data, about 25 million quotes over about 4000 syms and an average of 5 levels per sym, in a little over four minutes.

As for general comments, I would say that the ej is unnecessary as it is a more general version of ij, allowing you to specify the key column. As both t2 and t3 have the same keying I would instead use:

t2 ij t3

Which may give you a very minor performance boost.

answered Nov 28 '17 at 13:14

Thomas Smyth - Treliant

4,993
6
25
36

Hi Thomas, thank you for the answer. Yes I have been to that wiki that's the way I build the pivotization. I tried once using that function but it is very complex specially figuring out what f and g should be but indeed it looks promissing. Your answer would be complete if you could show how to call that general purpose pivot for the OP :) – SkyWalker Nov 28 '17 at 16:27
I can try give it a go if you give an example `volatilitysurface_smile` table or even just the schema for the table. Editing it into the question may help anyone else coming along to formulate an answer too. – Thomas Smyth - Treliant Nov 28 '17 at 16:30

score 1 · Accepted Answer · answered Nov 29 '17 at 12:54

OK I solved the issue by creating a batch version of the pivotization that keeps the date (datetime) table field when doing the group by bit needed to pivot i.e. by t:t from ... to by date:date,t:t from .... It went from 90s down to 150 milliseconds.

fBatchPivot:{[adates;accypair]
    t1:select from volatilitysurface_smile where date in adates,ccypair=accypair;
    mycols:`atm`s10c`s10p`s25c`s25p;
    t2:`date`t xkey 0!exec mycols#(stype!mid) by date:date,t:t from t1;
    t3:`date`t xkey select distinct date,t,tenor,xi,volofvol,delta_type,spread from t1;
    result:0!(`date`t xasc t2 ij t3);
    :result}

Looks good, you could accept it as the answer to wrap up this question. — Thomas Smyth - Treliant, Nov 29 '17 at 19:54

How to optimize a batch pivotization?

2 Answers2