4

In SQL, aggregate functions sum(x), avg(x), max(x), min(x) will return NULL when x is empty or contains only NULL values.

In KDB, sum and avg return zero, while max and min return (+ or -) infinity.

To get the desired behavior of returning null I have come up with 2 options:

(A) Remove the nulls, and check if the result is empty before aggregating as usual

nsum:{[x] x: x where not null x; $[0=count x; 0N; sum x]}

(B) Create a new aggregate which checks each item for null before accumulating
Note that we still have to check for an empty list.

nsum:{[z] $[0=count z; 0N;  {[x;y] $[null x; y; null y; x; x+y]}/ z ]}

Is there a better (faster and/or more convenient) way than either A or B?

Will method (B) perform any better than method (A) ?

Method (A) has the advantage of working for any aggregate function. Method (B) does not work for "average", because there is no pairwise function which will produce a running average (needs to know how many items were already applied).

A related question: is there a faster way to check for an empty list than 0=count x, for example 0= type first x?

THANKS!

Thomas Smyth - Treliant
  • 4,993
  • 6
  • 25
  • 36
JSLover
  • 165
  • 1
  • 10

2 Answers2

2

You could use the all operator. Also you could (perhaps should) return a null based on the type of the list.

f:{$[all null x;(neg type x)$"";sum x]}
q)f `int$()
0N
q)f 0n 0n 0n 0n
0n

To measure the performance use the \t (timer) or \ts (time and space). http://code.kx.com/q/ref/syscmds/#t-timer

f1:{$[all null x;(neg type x)$"";sum x]}
fa:nsum:{[x] x: x where not null x; $[0=count x; 0N; sum x]}
fb:{[z] $[0=count z; 0N; {[x;y] $[null x; y; null y; x; x+y]}/[z] ]}

allnull:1000000#0N
withnull:10000000?(1000#0N,til 10)
withoutnull:10000000?10

\ts do[10;f withnull]
612 16777504
..
..

You will find that method B (fb) is the slowest. f1 is quicker than fa (apart for the allnull list case, speeds are comparable) and uses less space.

Thomas Smyth - Treliant
  • 4,993
  • 6
  • 25
  • 36
Geoffrey Absalom
  • 1,815
  • 15
  • 16
2

For min/max you can do some trickery with adding and subtracting 1's with int/long lists to ensure nulls are returned. This was recently mentioned at kxCon2016 by Attila Vrabecz. It's a manipulation of how/where null and infinity appear on the "number line". They're not very general though and are specific to each operation:

q)a:`int$()
q)b:1 2 3i
q)c:0N 0Ni

q)-1+max 1+a
0N
q)-1+max 1+b
3
q)-1+max 1+c
0N

q)1+min a-1
0N
q)1+min b-1
1
q)1+min c-1
0N
terrylynch
  • 11,844
  • 13
  • 21