0

I'm new in kdb/q. And the following is my question. Really hope someone who experts in kdb can help me out.

I have two tables. Table t1 has two attributes: tp_time and id, which looks like:

tp_time                   id
------------------------------
2018.06.25T00:07:15.822    1
2018.06.25T00:07:45.823    3
2018.06.25T00:09:01.963    8
...
...

Table t2 has three attributes: tp_time, id, and price.

For each id, it has lots of price at different tp_time. So the table t2 is really large, which looks like the following:

tp_time                   id      price
----------------------------------------
2018.06.25T00:05:99.999    1      10.87
2018.06.25T00:06:05.823    1      10.88
2018.06.25T00:06:18.999    1      10.88
...
...
2018.06.25T17:39:20.999    1      10.99 
2018.06.25T17:39:23.999    1      10.99
2018.06.25T17:39:24.999    1      10.99
...
...
2018.06.25T01:39:39.999    2      10.99 
2018.06.25T01:39:41.999    2      10.99
2018.06.25T01:39:45.999    2      10.99
...
...

What I try to do is for each row in Table t1, find its price at the nearest time and its price at approximately 5 seconds later. For example, for the first row in table t1:

2018.06.25T00:07:15.822 1

The price at nearest time is 10.87 and the price at around 5 seconds later is 10.88. And my expected output table looks like the following:

tp_time                   id   price_1      price_2
----------------------------------------------------
2018.06.25T00:07:15.822    1    10.87        10.88
2018.06.25T00:07:45.823    3    SOME_PRICE   SOME_PRICE
2018.06.25T00:09:01.963    8    SOME_PRICE   SOME_PRICE
...
...

The thing is I cannot join t1 and t2 because table t2 is so large and I will kill the server. I've try something like ...where tp_time within(time1, time2). But I'm not sure how to deal with the time1 and time2 varibles.

Could someone gives me some helps on this questions? Thanks so much!

nyi
  • 3,123
  • 4
  • 22
  • 45
Chenrui Su
  • 145
  • 1
  • 1
  • 9

1 Answers1

1

I'll recommend organizing the table t1 by applying the proper attributes so that when you join the tables, it will generate the results quickly.

Since you are looking for the prevailing price and price after 5 seconds, You will need wj for this.

the general syntax is :

wj[w;c;t;(q;(f0;c0);(f1;c1))]

w - begin and end time
t & q - unkeyed tables; q should be sorted by `id`time with `p# on id
c- names of the columns to be joined
f0,f1 - aggregation functions

In your case t2 should be sorted by `id`time with `p# on id

q)t2:update `g#id from `id`tp_time xasc ([] tp_time:`time$10:20:30 + asc -10?10 ; id:10?3 ;price:10?10.)
q)t1:([] tp_time:`time$10:20:30 + asc -3?5 ; id:1 1 1 )

q)select from t2 where id=1
tp_time         id  price
10:20:31.000    1   4.410662
10:20:32.000    1   5.473385
10:20:38.000    1   1.247049

q)wj[(`second$0 5)+\:t1.tp_time;`id`tp_time;t1;(t2;(first;`price);(last;`price))]

tp_time        id   price       price
10:20:30.000    1   4.410662    5.473385  
10:20:31.000    1   4.410662    5.473385
10:20:34.000    1   5.473385    1.247049   //price at 32nd second & 38th second
nyi
  • 3,123
  • 4
  • 22
  • 45
  • Thank you so much! That's extremely helpful. Since the original table t2 is in kdb server, would the update command on t2 kill the server...? – Chenrui Su Jun 27 '18 at 23:53
  • What is the meta of t2 table? Are you passing a function to the server or bringing the data to your local q session? – nyi Jun 28 '18 at 00:00
  • Is the t2 table date partitioned, splayed or stored as binary? – nyi Jun 28 '18 at 00:01
  • Thanks for your reply! I don't have access to the data right now so I can't show you the meta... i used \l to load data to my local q session. And I'm not 100% sure but I think t2 table date is not being partitioned and splayed... – Chenrui Su Jun 28 '18 at 00:19
  • What is the record count of that table? If it is binary table on the server then it should be fine to run the update command. Please check out the aj and wj functions on kx wki, it explains how the data needs to be sorted when using the joins for quicker results. – nyi Jun 28 '18 at 00:23
  • I'm not quite sure. But last time I run: "count select from t2" without a where clause to filter a tp_time range, it kills the server.. anw, Thanks so much for your helps! – Chenrui Su Jun 28 '18 at 00:27
  • I think it's date partitioned. In that case you should use `select count sym from t2 where date =.z.d` – nyi Jun 28 '18 at 00:32
  • Check this one out. https://stackoverflow.com/questions/25629666/command-to-check-partitioned-directory-in-kdb – nyi Jun 28 '18 at 00:47
  • Hi, nyi. It seems like the meta t2 gives me id has integer data type and 'g' on attribute. is that mean the id has been grouped already and I don't need to perform the update command before wj...? – Chenrui Su Jun 28 '18 at 12:34
  • As per 'wj' wiki - Note that window join with multiple symbols should be only used with a p#sym like a schema (typical RTD-like g# gives undefined results) g# is generally an attribute applied in memory, so I can't comment on you should explicitly update the schema to p# (as you last time killed the server while running the query). – nyi Jun 28 '18 at 19:06
  • I'll recommend you to check with the team who is maintaining the data about the current hdb setup – nyi Jun 28 '18 at 19:12