Summing over a numeric column with moving window of varying size in Snowflake

Question

I have a sample dataset given as follows;

time  |  time_diff  | amount
time1 |  time1-time2 | 1000
time2 |  time2-time3 | 2000
time3 |  time3-time4 | 3000
time4 |  time4-time5 | 4500
time5 |   NULL       | 1000

Quick explanation; first column gives time of transaction, second column gives difference with next row to get transaction interval(in hours), and third column gives money made in a particular transaction. We have sorted the data in ascending order using time column.

Some values are given as;

time  |  time_diff  | amount
time1 |  2.         | 1000
time2 |  3.         | 2000
time3 |  1.         | 3000
time4 |  19.        | 4500
time5 |   NULL       | 1000

The goal is to find the total transaction for a given time, which occurred within 24 hours of that transaction. For example, the output for time1 shd be; 1000+2000+3000=6000. Because if we add the value at time4, the total time interval becomes 25, hence we omit the value of 4500 from the sum.

Example output:

time  |  amount
time1 | 6000       
time2 | 9500         
time3 | 7500         
time4 | 4500        
time5 | 1000

The concept of Mong window sum should work, in my knowledge, but here the width of the window is variable. Thats the challenge I am facing.Can I kindly get some help here?

I don't understand how your timediff column works. Might be easier to give real values for "time" instead? — Simon D, Aug 21 '21 at 12:40

score 0 · Accepted Answer · answered Aug 21 '21 at 09:36

You could ignore the time_diff column and use a theta self-join based on a timestamp range, like this:

WITH srctab AS (    SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:00:00') AS "time", 1000::INT AS "amount"
          UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:02:00'), 2000::INT
          UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:05:00'), 3000::INT
          UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 00:06:00'), 4500::INT
          UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-16 00:01:00'), 1000::INT
          )
SELECT t1."time", SUM(t2."amount") AS tot
FROM srctab t1
JOIN srctab t2 ON t2."time" BETWEEN t1."time" AND TIMESTAMPADD(HOUR, +24, t1."time")
GROUP BY t1."time"
ORDER BY t1."time";

Minor detail: if your second column gives the time difference with the next row then I'd say the first value should be 10500 (not 6000) because it's only your 5th transaction of 1000 which is more than 24 hours ahead... I'm guessing your actual timestamps are at 0, 2, 5, 6 and 25 hours?

score 0 · Answer 2 · answered Aug 22 '21 at 06:00

Another option might be to use the sliding WINDOW function by tweaking your transactional data to include each hour.

It's perhaps an overkill but might be a useful technique.

Firstly generate a placeholder for each hour using the timestamps. I utilised time_slice to map each timestamp into nice hour blocks and generator with dateadd to back fill each hour putting a zero in where no transactions took place.

So now I can use the sliding window function knowing that I can safely choose the 23 preceding hours.

Copy|Paste|Run

WITH SRCTAB AS (
SELECT  TO_TIMESTAMP_NTZ('2020-04-15 00:00:00') AS TRANS_TS,   1000::INT AS AMOUNT
UNION ALL   SELECT  TO_TIMESTAMP_NTZ('2020-04-15 02:00:00'),  2000::INT
UNION ALL SELECT  TO_TIMESTAMP_NTZ('2020-04-15 05:00:00'), 3000::INT
UNION ALL SELECT TO_TIMESTAMP_NTZ('2020-04-15 06:00:00'),  4500::INT
UNION ALL  SELECT  TO_TIMESTAMP_NTZ('2020-04-16 01:00:00'), 1000::INT
)
SELECT 
    TRANS_TIME_HOUR 
    ,SUM(AMOUNT) OVER ( ORDER BY TRANS_TIME_HOUR ROWS BETWEEN 23 PRECEDING AND 0 PRECEDING ) OVERKILL FROM (
SELECT
    TRANS_TIME_HOUR,
    SUM(AMOUNT) AMOUNT
FROM
    (
    SELECT
        DATEADD(HOUR, NUMBER, TRANS_TS) TRANS_TIME_HOUR,
        DECODE( DATEADD(HOUR, NUMBER, TRANS_TS), TIME_SLICE(TRANS_TS, 1, 'HOUR', 'START'), AMOUNT,0) AMOUNT
    FROM
        SRCTAB,
        (SELECT SEQ4() NUMBER FROM TABLE(GENERATOR(ROWCOUNT => 24)) ) G
)
GROUP BY
    TRANS_TIME_HOUR
)

Summing over a numeric column with moving window of varying size in Snowflake

2 Answers2