Firstly, what you show as "average" in your example is actually the "sum" (for Beijing at 10:37, say, the prior-hour readings are 22 and 20, so the rolling average is 21, not 42), and your first "Shanghai" result is including a "Beijing" result, so the window also needs to be kept per city.
You have two options: build a fixed-size window dataset (build partials for each minute) and then use a window frame of fixed size over that, OR self-join and just aggregate those (as Felipe has shown).
If you have very dense data, you might find the former more performant; if you have sparse data, the latter approach should be faster, and it is definitely faster to code.
So the simple (self-join) version first:
with data(city, timestamp, value) as (
    select column1, try_to_timestamp(column2, 'yyyy/mm/dd hh:mi'), column3 from values
    ('beijing', '2022/05/25 10:33', 22),
    ('beijing', '2022/05/25 10:37', 20),
    ('beijing', '2022/05/25 11:36', 29),
    ('beijing', '2022/05/26 11:36', 28),
    ('beijing', '2022/05/26 10:00', 21),
    ('shanghai', '2022/05/26 11:00', 33),
    ('shanghai', '2022/05/26 11:46', 35),
    ('shanghai', '2022/05/26 12:40', 37)
)
select a.*
    ,avg(b.value) as p60_avg
    ,count(b.value)-1 as p60_count  -- minus one: the join window also matches the current row
    ,sum(b.value) as p60_sum
from data as a
left join data as b
    on a.city = b.city
    and b.timestamp between dateadd(hour, -1, a.timestamp) and a.timestamp
group by 1,2,3
order by 1,2
gives:
| CITY | TIMESTAMP | VALUE | P60_AVG | P60_COUNT | P60_SUM |
|---|---|---|---|---|---|
| beijing | 2022-05-25 10:33:00.000 | 22 | 22 | 0 | 22 |
| beijing | 2022-05-25 10:37:00.000 | 20 | 21 | 1 | 42 |
| beijing | 2022-05-25 11:36:00.000 | 29 | 24.5 | 1 | 49 |
| beijing | 2022-05-26 10:00:00.000 | 21 | 21 | 0 | 21 |
| beijing | 2022-05-26 11:36:00.000 | 28 | 28 | 0 | 28 |
| shanghai | 2022-05-26 11:00:00.000 | 33 | 33 | 0 | 33 |
| shanghai | 2022-05-26 11:46:00.000 | 35 | 34 | 1 | 68 |
| shanghai | 2022-05-26 12:40:00.000 | 37 | 36 | 1 | 72 |
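A note on p60_count: the minus one is there because the join window between dateadd(hour, -1, a.timestamp) and a.timestamp also matches the current row, so the count would otherwise include the reading itself. If you instead want the aggregates over the prior hour only (excluding the current reading), here is a minimal sketch of that variation against the same data CTE; the prior60_* column names are just mine:

select a.*
    ,avg(b.value) as prior60_avg
    ,count(b.value) as prior60_count
    ,sum(b.value) as prior60_sum
from data as a
left join data as b
    on a.city = b.city
    -- half-open window: matches readings strictly before the current one
    and b.timestamp >= dateadd(hour, -1, a.timestamp)
    and b.timestamp < a.timestamp
group by 1,2,3
order by 1,2

Rows with no prior reading then come back with a NULL average and sum, and a zero count.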
The dense version:
with data(city, timestamp, value) as (
    select column1, try_to_timestamp(column2, 'yyyy/mm/dd hh:mi'), column3 from values
    ('beijing', '2022/05/25 10:33', 22),
    ('beijing', '2022/05/25 10:37', 20),
    ('beijing', '2022/05/25 11:36', 29),
    ('beijing', '2022/05/26 11:36', 28),
    ('beijing', '2022/05/26 10:00', 21),
    ('shanghai', '2022/05/26 11:00', 33),
    ('shanghai', '2022/05/26 11:46', 35),
    ('shanghai', '2022/05/26 12:40', 37)
), filled_time as (
    -- one row per city per minute, from each city's first reading to its last;
    -- the generator overshoots and QUALIFY trims the excess rows
    select city,
        dateadd(minute, row_number() over(partition by city order by null)-1, min_t) as timestamp
    from (
        select
            city, min(timestamp) as min_t, max(timestamp) as max_t
        from data
        group by 1
    ), table(generator(ROWCOUNT => 10000))
    qualify timestamp <= max_t
)
select
    ft.city
    ,ft.timestamp
    -- partition by city so one city's window never leaks into another's
    ,avg(d.value) over (partition by ft.city order by ft.timestamp ROWS BETWEEN 60 PRECEDING AND current row) as p60_avg
from filled_time as ft
left join data as d
    on ft.city = d.city and ft.timestamp = d.timestamp
order by 1,2;
gives:
| CITY | TIMESTAMP | P60_AVG |
|---|---|---|
| beijing | 2022-05-25 10:33:00.000 | 22 |
| beijing | 2022-05-25 10:34:00.000 | 22 |
| beijing | 2022-05-25 10:35:00.000 | 22 |
| beijing | 2022-05-25 10:36:00.000 | 22 |
| beijing | 2022-05-25 10:37:00.000 | 21 |
| beijing | 2022-05-25 10:38:00.000 | 21 |
| beijing | 2022-05-25 10:39:00.000 | 21 |
| beijing | 2022-05-25 10:40:00.000 | 21 |
| beijing | 2022-05-25 10:41:00.000 | 21 |
| beijing | 2022-05-25 10:42:00.000 | 21 |
| beijing | 2022-05-25 10:43:00.000 | 21 |
| beijing | 2022-05-25 10:44:00.000 | 21 |
| beijing | 2022-05-25 10:45:00.000 | 21 |
| beijing | 2022-05-25 10:46:00.000 | 21 |

snip...
And those "extra" rows can be dropped with a QUALIFY:
select
    ft.city
    ,ft.timestamp
    ,avg(d.value) over (partition by ft.city order by ft.timestamp ROWS BETWEEN 60 PRECEDING AND current row) as p60_avg
    --,count(d.value) over (partition by ft.city order by ft.timestamp ROWS BETWEEN 60 PRECEDING AND current row) - 1 as p60_count
    --,sum(d.value) over (partition by ft.city order by ft.timestamp ROWS BETWEEN 60 PRECEDING AND current row) as p60_sum
from filled_time as ft
left join data as d
    on ft.city = d.city and ft.timestamp = d.timestamp
qualify d.value is not null
order by 1,2;
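One practical note on the dense approach: generator's ROWCOUNT has to be a literal, so the 10000 above (about a week of minutes) must be at least as large as the widest per-city minute span, otherwise filled_time silently stops short of max_t and later readings never enter a window. A quick sanity-check sketch, reusing the data CTE from above:

select city
    ,datediff(minute, min(timestamp), max(timestamp)) + 1 as minutes_needed
from data
group by 1;

If minutes_needed ever approaches the literal, bump it; the QUALIFY already trims the overshoot, so a generous value only costs some generated rows.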