5

I have a main dataframe named raw which looks like this:

tab:([]date:2018.02.05 2018.02.05 2018.02.06 2018.02.06;time:01:30:25.000 02:30:45.000 04:15:15.000 02:15:15.000;vol:50 55 64 12; name:`A`B`B`A)

date           time         vol     name   
2018.02.05     1:30:25      50       A
2018.02.05     2:30:45      55       B
2018.02.06     4:15:15      64       B
2018.02.06     2:15:15      12       A

I need to create a new table dependent on conditions like:

Between two certain dates I need to find times when cumulative vol is 100 for name B within a two hour period.

The logic which I think should work: arrange the data by ascending order of time. Add all the vol by name= `B for time within (time[i]: time[i]+2hrs). If the cum vol > 100, return the time intervals and corresponding date. Continue with i +1 . I'm new to kdb so I'm facing difficulties implementing it.

Sample output:

time1          time2         date1         date2
1:30:00        3:30:00       2018.02.05    2018.02.05
23:00:00       1:00:00       2018.02.05    2018.02.06

Any leads on this is appreciated. Thanks

  • What would be your expected output in this case and what have you tried? It's easier for others to help if we know this information and if you include an example dataset that can be easily copied to a q session. – Thomas Smyth - Treliant Mar 01 '18 at 10:23
  • Useful links related to Thomas' comment: https://stackoverflow.com/help/mcve http://code.kx.com/q/cookbook/faq-listbox/#how-to-post-test-data-on-the-k4-list – Jonathon McMurray Mar 01 '18 at 10:35
  • 1
    @ThomasSmyth I've edited my ques. Let me know if there is any more lack of clarity. My input data is just an extension of the sample I provided. –  Mar 01 '18 at 10:46

2 Answers2

1

I believe a solution to your problem can be reached using aj

Initially as you pointed out, the table should be sorted by time

`time xasc `tab;

Then, a cumulative sum of the volumes should be created, using sums

tab:update cumvol:sums vol by name from tab

Then using aj - to get the cumulative sums of the volumes that are not within the 2 hour period for each time.

aj[`name`time;tab;select time:time+02:00,name,cumvol2:cumvol from tab]

We can then do cumvol - cumvol2 to get the total volume in each 2 hour period

tab:select time, name, runningvol:cumvol-0^cumvol2 from 
aj[`name`time;tab;select time:time+02:00,name,cumvol2:cumvol from tab]

Then a simple select statement can get the times where the cumvol is greater than 100

select time,time+02:00 from tab where runningvol>100

An improvement that could be added to this would be to add a grouped attribute on the 2nd table in the aj. Another improvement to this would be to format the dates and times into a single timestamp or datetime.

More info on the functions aj and sums can be found here:

http://code.kx.com/q/ref/joins/#aj-aj0-asof-join

http://code.kx.com/q/ref/arith-integer/#sums

1

You can also use a window join wj1 for this. Given the example table:

t:`time xasc ([]time:(1000?2018.02.05 2018.02.06)+1000?24:00:00;sym:1000?`A`B`C;vol:1000?10);

The following function aggregates vol in 2 hour windows relative to the timestamp and is passed a table t, start date s, end date e and name n.

fw:{[t;s;e;n]
  r:@[;`sym;`p#]`sym`time xasc select from t where time.date within(s;e),sym=n;
  :select from wj1[r[`time]-/:02:00 00:00;`time;r;(r;(sum;`vol))] where vol>100;
 };

Running for the name/sym B gives:

q)fw[t;2018.02.05;2018.02.06;`B]
time                          sym vol
-------------------------------------
2018.02.05D18:12:39.000000000 B   104
2018.02.05D18:35:47.000000000 B   101
2018.02.05D18:40:17.000000000 B   102
...

It can also be modified to give all results for all names/syms:

fw1:{[t;s;e]
  r:@[;`sym;`p#]`sym`time xasc select from t where time.date within(s;e);
  :select from wj1[r[`time]-/:02:00 00:00;`sym`time;r;(r;(sum;`vol))] where vol>100;
 };

Running without a name/sym this time:

q)fw1[t;2018.02.05;2018.02.06]
time                          sym vol
-------------------------------------
2018.02.05D02:01:36.000000000 A   106
2018.02.05D02:52:23.000000000 A   103
2018.02.05D03:06:51.000000000 A   105
...

Although this approach is less efficient than using aj it still illustrates how you may achieve this with window joins.

Thomas Smyth - Treliant
  • 4,993
  • 6
  • 25
  • 36