4

I am working with a table that contains timeseries data, with a row for each minute for each user.

I want to compute some aggregate functions on a rolling window of N calendar days.

This is achieved via

SELECT
SOME_AGGREGATE_FUN(col) OVER (
        PARTITION BY user_id
        ORDER BY timestamp
        ROWS BETWEEN (60 * 24 * N) PRECEDING AND CURRENT ROW
) as my_col
FROM my_table

However, I am only interested in the result of this at a daily scale.

i.e. I want the window to be computed only at 00:00:00, but I want the window itself to contain all the minute-by-minute data to be passed into my aggregate function.

Right now I am doing this:

WITH agg_results AS (
    SELECT
    SOME_AGGREGATE_FUN(col) OVER (
        PARTITION BY user_id
        ORDER BY timestamp_col
        ROWS BETWEEN (60 * 24 * N) PRECEDING AND CURRENT ROW
    )
    FROM my_table
)

SELECT * FROM agg_results
WHERE 
timestamp_col = DATE_TRUNC('day', "timestamp_col")

This works in theory, but it does 60 * 24 more computations that necessary, resulting in the query being super slow.

Essentially, I am trying to find a way to make the right window bound skip rows based on a condition. Or, if it is simpler to implement, for every nth row (as I have a constant number of rows for each day).

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
gshpychka
  • 8,523
  • 1
  • 11
  • 31
  • 1
    What is `n`? I'm not 100% clear on what you are trying to do. A more concrete example -- with sample data and desired results -- would help. – Gordon Linoff Dec 03 '20 at 15:37
  • I want to compute some aggregate functions on a rolling window of N calendar days on minute data, but I am only interested in every '24*60'th result of these aggregations (one per day). – gshpychka Dec 03 '20 at 15:48

1 Answers1

1

I don't think that's possible with window functions. You could switch to a subquery instead, assuming that your aggregate function works as a regular aggregate function too (that is, without an OVER() clause):

select 
    timestamp_col,
    (
        select some_aggregate_fun(t1.col)
        from my_table t1
        where 
            t1.user_id = t.user_id
            and t1.timestamp_col >= t.timestamp_col - interval '1' day 
            and t1.timestamp_col <= t.timestamp_col
    )
from my_table t
where timestamp_col = date_trunc('day', timestamp_col)

I am unsure that this would perform better than your original query though; you might need to assess that against your actual dataset.

You can change interval '1' day to the actual interval you want to use.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thanks for the response. I removed my original comment because I wrote it before I actually understood what your query does. I will try this and report back. – gshpychka Dec 03 '20 at 15:45
  • I'm not sure what the etiquette here is, but I don't know if accepting your answer counts as reporting back. In any case, your solution works and helped me a lot. Thanks again. – gshpychka Dec 16 '20 at 12:29