I'm facing some performance issues with TimescaleDB continuous aggregates, and I don't fully understand why it happens.
I have a table for my timeseries data:
CREATE TABLE IF NOT EXISTS ack_alarm_number
(
time TIMESTAMP NOT NULL,
entity_id INT,
user_id VARCHAR(255),
value INT
);
SELECT create_hypertable('ack_alarm_number', 'time', if_not_exists => TRUE);
I want to get a daily average value from my table by entity_id or/and user_id, also I expect that my table will have a lot of rows and aggregate queries might be slow, so I decided to create a continuous aggregate:
CREATE MATERIALIZED VIEW IF NOT EXISTS ack_alarm_number_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 day', time) AS bucket,
entity_id, user_id,
AVG(value)
FROM ack_alarm_number
GROUP BY entity_id, user_id, bucket;
After that I've generated ~7 million rows, time interval is around ~3 months.
It works fine, when I execute queries like this:
select bucket, avg(m.avg)
from ack_alarm_number_daily m
where m.entity_id = 2
group by bucket
But when I try to execute the following query in order to get average number by all entities, it becomes very slow ~15 seconds:
select bucket, avg(m.avg)
from ack_alarm_number_daily m
group by bucket
When I execute the same type of query from ack_alarm_number
table, it's not so slow ~400ms:
select time_bucket('1 day', m.time) as bucket, avg(m.value)
from ack_alarm_number m
group by bucket
Also there is a difference when I try to join another table.
When I execute the following query, it's ~17s:
select bucket, avg(m.avg)
from ack_alarm_number_daily m
join entities e on m.entity_id = e.id
where e.category='category-2'
group by bucket
The same type of query from ack_alarm_number
is ~400ms:
select time_bucket('1 day', m.time) as bucket, avg(m.value)
from ack_alarm_number m
join entities e on m.entity_id = e.id
where e.category='category-2'
group by bucket
There is one detail that I've found:
There is no significant difference in rows count between table and materialized view, because I have 1000000 different entity_ids and 10 user_ids and they were uniform distributed across the timeline during data generation, so 7 millions rows is not enough to get a real advantage of continuous aggregate in my case, but it's still interesting, even if they have similar rows number, why is there a difference in the query performance?