
I have a table of tick data representing prices of various financial instruments up to millisecond precision. The problem is that there are over 5 billion entries, and even the most basic queries take several minutes.

I only need data at 1-second precision - is there an efficient way to sample the table so that the precision is reduced to roughly 1 second before querying? This should dramatically cut the amount of data and hence the execution time.

So far, as a quick hack, I've added the condition where i mod 2 = 0 to my query, but is there a better way?
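For reference, a runnable form of that hack in q might look like the following (the table name trade is an assumption; note that q evaluates each where subclause right to left, so the comparison has to be written with the 0 first):

q)select from trade where 0=i mod 2    / keep every other row: halves the data, but drops ticks arbitrarily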


1 Answer


The best way to bucket time data is with xbar:

q)select last price, sum size by 10 xbar time.minute from trade where sym=`IBM
minute| price size
------| -----------
09:30 | 55.32 90094
09:40 | 54.99 48726
09:50 | 54.93 36511
10:00 | 55.23 35768 
...

More info: http://code.kx.com/q/ref/arith-integer/#xbar
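For the 1-second precision asked about in the question, the same pattern applies. Grouping by time.second truncates each timestamp to whole seconds, so xbar is only needed for buckets wider than one second (column and table names are assumed to match the example above):

q)select last price, sum size by time.second from trade where sym=`IBM        / 1-second bars
q)select last price, sum size by 5 xbar time.second from trade where sym=`IBM / 5-second bars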

– nightTrevors
  • In addition to nightTrevors' answer, the time taken is most likely due to an I/O bottleneck. Depending on how your table is splayed or partitioned (by date, sym?), you will find that optimising your where clause improves performance. For date-partitioned tables, for example, the optimal order is date, then sym, then time (see the sketch after these comments). – Manish Patel Mar 18 '14 at 08:47
  • This is a good point: unlike most of kdb+/q, where clauses are actually evaluated left to right, not right to left, so you want your most constraining conditions first. – nightTrevors Mar 18 '14 at 11:59
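A minimal sketch of the ordering advice above, assuming a standard date-partitioned trade table with date, sym and time columns: the date constraint prunes partitions first, the sym constraint can then use the parted attribute, and the time filter only scans what is left.

q)select from trade where date=2014.03.18, sym=`IBM, time within 09:30:00.000 10:00:00.000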