5

General Task

A table consists of three columns (time, key, value). The task is to calculate a running difference for each key. So, from input

---------------
| time | key | value |
---------------
| 1    | A   | 4     |
| 2    | B   | 1     |
| 3    | A   | 6     |
| 4    | A   | 7     |
| 5    | B   | 3     |
| 6    | B   | 7     |

it is desired to get

----------------------
| key | value | delta |
----------------------
| A   | 4     |  0    |
| B   | 1     |  0    |
| A   | 6     |  2    |
| A   | 7     |  1    |
| B   | 3     |  2    |
| B   | 7     |  4    |

Approaches

  1. runningDifference function. Works, if the key is fixed. So we can

    select *, runningDifference(value) from
        (SELECT key, value from table where key = 'A' order by time)
    

    Note that subquery is necessary here. This solution suffers when you want to get this for different keys

  2. groupArray.

    select key, groupArray(value) from
        (SELECT key, value from table order by time)
        group by key
    

    So, now we get a key and an array of elements with this key. Good.

    But how to calculate a sliding difference? If we could do that, then ARRAY JOIN would lead us to a result.

    Or we can even zip the array with itself and then apply lambda (we have arrayMap for that) but... we don't have any zip alternative.

Any ideas? Thanks in advance.

3 Answers3

8

Solution with arrays:

WITH 
   groupArray(value) as time_sorted_vals,
   arrayEnumerate(time_sorted_vals) as indexes,
   arrayMap( i -> time_sorted_vals[i] - time_sorted_vals[i-1], indexes) as running_diffs
SELECT 
   key,
   running_diffs
FROM 
    (SELECT key, value from table order by time)
GROUP by key

Other option (doing sort inside each group separately, which is more optimal in a lot of cases)

WITH
   groupArray( tuple(value,time) ) as val_time_tuples,
   arraySort( x -> x.2, val_time_tuples ) as val_time_tuples_sorted,
   arrayMap( t -> t.1, indexes) as time_sorted_vals,
   arrayEnumerate(time_sorted_vals) as indexes,
   arrayMap( i -> time_sorted_vals[i] - time_sorted_vals[i-1], indexes) as running_diffs
SELECT 
   key,
   running_diffs
FROM 
   time
GROUP by key

and you can apply ARRAY JOIN on the result afterward.

vladimir
  • 13,428
  • 2
  • 44
  • 70
filimonov
  • 1,666
  • 1
  • 9
  • 20
  • Hi, I was wondering how would the query look like if in the result we also wanted to include `time` column, meaning: `| time | key | value | delta |`? – lukasm Mar 16 '21 at 09:22
  • This answer was posted years ago, for today Sep 29th, 2021, we can use `arrayDifference` instead of `arrayMap`. And we can `ARRAY JOIN` so that we can get a tabulated result instead of a nested array. That is: – Woods Chen Sep 29 '21 at 00:45
  • see my posted answer. – Woods Chen Sep 29 '21 at 01:03
1

This question was posted years ago, for today, Sep 29th, 2021, we can use arrayDifference instead of arrayMap. And we can ARRAY JOIN so that we can get a tabulated result instead of a nested array.

SELECT key, sorted_time, time_sorted_vals, running_diffs
FROM (
WITH
   groupArray( tuple(value,time) ) as val_time_tuples,
   arraySort( x -> x.2, val_time_tuples ) as val_time_tuples_sorted,
   arrayMap( t -> t.1, val_time_tuples_sorted) as time_sorted_vals,
   arrayMap( t -> t.2, val_time_tuples_sorted) as sorted_time,
   arrayDifference(time_sorted_vals) as running_diffs
SELECT 
   key,
   sorted_time,
   time_sorted_vals,
   running_diffs
FROM 
   table_name
GROUP by key)
ARRAY JOIN sorted_time, time_sorted_vals, running_diffs

The only restriction is that the value column should not be of nullable types.

Woods Chen
  • 574
  • 3
  • 13
0

Lately I've also encountered the problem and Clickhouse offers function arrayDifference.

WITH 
   groupArray(value) as vals
   arrayDifference(vals) as running_diffs
SELECT 
   key,
   running_diffs
FROM 
    (SELECT key, value from table order by time)
GROUP by key
phodina
  • 1,341
  • 1
  • 12
  • 25
  • Hello, yeah it's been a long time since the question was asked. Let me point that your answer doesn't solve the original question: your query prodice an array of diffs per each key, while it is desired to get the overall order persisited, a simple array join would not give you such a result. – Konstantin Sekeresh May 21 '20 at 08:32