Q/KDB+: Percentile function excluding nulls

Question

What's the least verbose way to express a percentile function (the percentile rank of each element, not to be confused with a percentile value) in Q, excluding nulls?

I have:

q)x:0N 1 2 0N 2 1 5
q)@[count[x]#0Nf;i;:;(1%count i)*1+rank x i:where not null x]
0n 0.2 0.6 0n 0.8 0.4 1

The problem with rank above is that tied values don't end up with equal probability/percentile values.
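
To see why, it may help to look at what rank returns for the non-null values of the sample input. This is a small illustration, not part of the original expression:

```q
/ rank assigns each element its position in the sorted order,
/ so the tied 1s (and the tied 2s) receive distinct ranks
rank 1 2 2 1 5           / 0 2 3 1 4
(1+rank 1 2 2 1 5)%5     / 0.2 0.6 0.8 0.4 1
```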

score 1 · Answer 1 · answered Apr 18 '17 at 13:06

I don't think this is the optimal solution, but it should solve the issue:

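{
  X: x where not null x;                        / drop nulls
  grouped: group asc X;                         / value -> indices in sorted order
  firstRank: first each value grouped;          / first sorted index per value
  quantiles: (key grouped)! firstRank%count X;  / value -> quantile
  quantiles x                                   / look up each input; nulls map to 0n
  }[0N 1 2 0N 2 1 5]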

The code:

1. Filters out nulls from the input array.
2. Sorts the array in ascending order and groups it by element, which gives a map of the form 1 2 5 ! (0 1; 2 3; 4).
3. Gets the first index for each key: 1 2 5 ! 0 2 4.
4. Computes the quantile function value from (3).
5. Maps the input array elements to the corresponding quantiles.
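
Traced on the sample input, the intermediate values of the steps above would look roughly as follows (a sketch for illustration only; the results are written in the trailing comments):

```q
x:0N 1 2 0N 2 1 5
X:x where not null x                        / 1 2 2 1 5
grouped:group asc X                         / 1 2 5!(0 1;2 3;enlist 4)
firstRank:first each value grouped          / 0 2 4
quantiles:(key grouped)!firstRank%count X   / 1 2 5!0 0.4 0.8
quantiles x                                 / 0n 0 0.4 0n 0.4 0 0.8
```

Ties now share a value, but the smallest element maps to 0 and the largest to 0.8 rather than 1, because the first zero-based index of each group is used.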

Thanks @Anton, I've experimented in the meantime as well. Your solution differs slightly from the Wikipedia examples, but it is nevertheless one of the two fastest. The other fastest solution has a slightly lower memory footprint. - Daniel Krizian, Apr 18 '17 at 13:47

score 0 · Answer 2 · answered Apr 18 '17 at 13:43

I've compared several approaches (including prank4 from the other answer):

prank1:{
  n:asc x where not null x;
  (sums[count each group n]%count n) @ x
  }
prank2:{
  p:(1+(asc n) bin n)%count n:x i:where not null x;
  @[count[x]#0Nf;i;:;p]
  }
prank3:{@[((1+til[count i])%count i)@last each group asc i:x where not null x;x]}

prank4:{   
  X: x where not null x; 
  grouped: group asc X;
  firstRank: first each value grouped;
  quantiles: (key grouped)! firstRank%count X;
  quantiles x
  }

Check that the output is consistent with the nearest-rank method of percentile calculation (true for all except prank4):

prank1 0N 1 2 0N 2 1 5 / 0n 0.4 0.8 0n 0.8 0.4 1
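
For comparison, here is a minimal sketch of how prank4 could be adjusted to agree with the others, assuming the only difference is that it uses the first zero-based index of each group instead of the last one-based index. The name prank4b is hypothetical and not part of either answer:

```q
prank4b:{
  X:x where not null x;
  grouped:group asc X;
  lastRank:1+last each value grouped;   / one-based index of the last tie
  quantiles:(key grouped)!lastRank%count X;
  quantiles x
  }
prank4b 0N 1 2 0N 2 1 5   / 0n 0.4 0.8 0n 0.8 0.4 1
```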

Compare timings (milliseconds) and memory footprint (bytes), as reported by \ts:

x:10000000?0N,til 500
\ts prank1 x  /  494 402661632
\ts prank2 x  / 3905 671088960
\ts prank3 x  /  552 536879392
\ts prank4 x  /  496 533741888
prank2[x]~prank1 x / 1b
prank1[x]~prank3 x / 1b
prank1[x]~prank4 x / 0b
