
I have a table of tick data representing prices of various financial instruments up to millisecond precision. The problem is that there are over 5 billion entries, and even the most basic queries take several minutes.

I only need data at 1-second precision - is there an efficient way to sample the table so that the precision is reduced to roughly 1 second before querying? This should dramatically cut the amount of data and hence the execution time.

So far, as a quick hack, I've added the condition where i mod 2 = 0 to my query, but is there a better way?
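For reference, a runnable form of that hack in q might look like the following (the table name trade is an assumption; note that q evaluates each where subclause right to left, so the comparison has to be written with the 0 first):

q)select from trade where 0=i mod 2    / keep every other row: halves the data, but drops ticks arbitrarily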


1 Answer


The best way to bucket time data is with xbar:

q)select last price, sum size by 10 xbar time.minute from trade where sym=`IBM
minute| price size
------| -----------
09:30 | 55.32 90094
09:40 | 54.99 48726
09:50 | 54.93 36511
10:00 | 55.23 35768 
...

More info: http://code.kx.com/q/ref/arith-integer/#xbar
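For the 1-second precision asked about in the question, the same pattern applies. Grouping by time.second truncates each timestamp to whole seconds, so xbar is only needed for buckets wider than one second (column and table names are assumed to match the example above):

q)select last price, sum size by time.second from trade where sym=`IBM        / 1-second bars
q)select last price, sum size by 5 xbar time.second from trade where sym=`IBM / 5-second bars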

– nightTrevors
  • In addition to nightTrevors' answer, the time taken is most likely due to an I/O bottleneck. Depending on how your table is splayed or partitioned (by date, sym?), you will find that optimising your where clause improves performance. For date-partitioned tables, for example, the optimal order is date, then sym, then time (see the sketch after these comments). – Manish Patel Mar 18 '14 at 08:47
  • This is a good point: unlike most of kdb+/q, where clauses are actually evaluated left to right, not right to left, so you want your most constraining conditions first. – nightTrevors Mar 18 '14 at 11:59
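A minimal sketch of the ordering advice above, assuming a standard date-partitioned trade table with date, sym and time columns: the date constraint prunes partitions first, the sym constraint can then use the parted attribute, and the time filter only scans what is left.

q)select from trade where date=2014.03.18, sym=`IBM, time within 09:30:00.000 10:00:00.000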