Context: I have a large table with logon times. I want to calculate a rolling count of logons within a specified period (e.g. 3600 sec).
In SQL/HQL i would specify this as:
SELECT id, logon_time, COUNT(*) OVER(
PARTITION BY id ORDER BY logon_time
RANGE BETWEEN 3600 PRECEDING AND CURRENT ROW) as window_count
FROM df
As a workaround in sparklyr, we could write this as:
df %>%
mutate(window_count = sql('COUNT(*) OVER(
PARTITION BY id ORDER BY logon_time
RANGE BETWEEN 3600 PRECEDING AND CURRENT ROW'))
I want to know if there's an existing sparklyr/dplyr way of achieving this. This issue relating to adding the "window_frame" function to dbplyr looks promising, however i can't find a way to specify "range between", the below operation only generates a "rows between" window:
df %>%
group_by(id) %>%
window_frame(-3600, 0) %>%
window_order(logon_time) %>%
mutate(windowed_count = n())