
I have a dataset containing 'UI' (a unique id), time, and frequency (the frequency of the given value in the UI column).

I would like to add a new column named 'daily_frequency' that sequentially counts each occurrence of a unique UI value within a given day.

For example, if UI=114737 appears twice in one day, the daily_frequency column should contain 1 and 2 for those two rows.

I could do this with Python and the Pandas package, using the groupby and cumcount methods as follows:

df['daily_frequency'] = df.groupby(['UI','day']).cumcount()+1
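For reference, a minimal runnable sketch of that Pandas one-liner on toy data (the UI values and timestamps here are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's layout
df = pd.DataFrame({
    "UI":   [114737, 114737, 114737, 500291, 500291],
    "time": pd.to_datetime([
        "2019-07-01 08:00", "2019-07-01 12:30", "2019-07-02 09:15",
        "2019-07-01 10:00", "2019-07-01 11:00",
    ]),
})
df["day"] = df["time"].dt.date

# cumcount() numbers rows within each (UI, day) group starting at 0,
# so +1 gives the 1-based daily counter
df["daily_frequency"] = df.groupby(["UI", "day"]).cumcount() + 1
print(df)
```

Here UI 114737 gets 1 and 2 on 2019-07-01, then restarts at 1 on 2019-07-02.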

However, for some reason, I must do this via SQL queries (Amazon Redshift).

Amin Kiany

2 Answers


I think you want a running count, which could be calculated as:

COUNT(*) OVER (PARTITION BY ui, TRUNC(time) ORDER BY time
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS daily_frequency
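The same windowed running count can be tried outside Redshift. A sketch using Python's built-in sqlite3 (SQLite 3.25+ supports window functions; `date(time)` stands in for Redshift's `TRUNC(time)`, and the table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (ui INTEGER, time TEXT);
INSERT INTO events VALUES
  (114737, '2019-07-01 08:00:00'),
  (114737, '2019-07-01 12:30:00'),
  (114737, '2019-07-02 09:15:00'),
  (500291, '2019-07-01 10:00:00');
""")

# Running count per (ui, day), ordered by time within the day
rows = conn.execute("""
SELECT ui, time,
       COUNT(*) OVER (PARTITION BY ui, date(time) ORDER BY time
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
         AS daily_frequency
FROM events
ORDER BY ui, time
""").fetchall()

for r in rows:
    print(r)
```

The counter resets both when the day changes and when the ui changes.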
Salman A
  • Thanks Salman for quick reply. I have tried what you suggested, however, I got the following error message `[Amazon](500310) Invalid operation: Aggregate window functions with an ORDER BY clause require a frame clause;` ... – Amin Kiany Jul 05 '19 at 12:49
  • @AminKiany . . . That is something of a peculiarity of Redshift. – Gordon Linoff Jul 05 '19 at 12:50
  • @SalmanA: Thanks Salman for the answer. I also edited my post. – Amin Kiany Jul 05 '19 at 13:01

Although Salman's answer seems to be correct, I think ROW_NUMBER() is simpler (and, as a ranking function, it does not need the frame clause that Redshift requires for aggregate window functions):

ROW_NUMBER() OVER (PARTITION BY ui, time::date
                   ORDER BY time
                  ) AS daily_frequency
Gordon Linoff