KDB performance: searching first item quicky

Question

I have a sorted list v of around 20K items. I want to split it into 2 lists at the point where first v[i]>K

N:20000;
v:asc N?100000;     / N random numbers sorted
K:200;              / threshold
v1:v[where v<=K];   / "v<=K" has O(N) complexity, "where" has O(N) too
v2:(count v1) _ v;  / list is sorted, this holds.

Question: how to avoid v<=200, so it does not calculate whole intermediate boolean vector of length N, in other words, does not compare values after first match found? I actually need an index to perform split at. Assume K is found close to the beginning of a list.

This is performance-related question. (NB Ignore time spent on "asc".)

score 6 · Accepted Answer · answered Jan 15 '18 at 13:25

To avoid calculating the list of booleans, you can take advantage of the fact your list is sorted and use binr:

c:v binr K  //42
v1:c # v
v2:c _ v

This greatly improves the speed of operation:

q)\ts:10000 v1:v[where v<=K];v2:(count v1) _ v
680 262608
q)\ts:10000 c:v binr K;v1:c # v;v2:c _ v
75 262560

KDB performance: searching first item quicky

1 Answers1