5

I'm facing some performance issues with TimescaleDB continuous aggregates, and I don't fully understand why it happens.

I have a table for my timeseries data:

CREATE TABLE IF NOT EXISTS ack_alarm_number
(
    time      TIMESTAMP NOT NULL,
    entity_id INT,
    user_id   VARCHAR(255),
    value     INT
);
SELECT create_hypertable('ack_alarm_number', 'time', if_not_exists => TRUE);

I want to get a daily average value from my table by entity_id or/and user_id, also I expect that my table will have a lot of rows and aggregate queries might be slow, so I decided to create a continuous aggregate:

CREATE MATERIALIZED VIEW IF NOT EXISTS ack_alarm_number_daily
            WITH (timescaledb.continuous) AS
SELECT  time_bucket(INTERVAL '1 day', time) AS bucket,
        entity_id, user_id,
        AVG(value)
FROM ack_alarm_number
GROUP BY entity_id, user_id, bucket;

After that I've generated ~7 million rows, time interval is around ~3 months.

It works fine, when I execute queries like this:

select bucket, avg(m.avg) 
from ack_alarm_number_daily m 
where m.entity_id = 2 
group by bucket

But when I try to execute the following query in order to get average number by all entities, it becomes very slow ~15 seconds:

select bucket, avg(m.avg) 
from ack_alarm_number_daily m 
group by bucket

When I execute the same type of query from ack_alarm_number table, it's not so slow ~400ms:

select time_bucket('1 day', m.time) as bucket, avg(m.value) 
from ack_alarm_number m 
group by bucket

Also there is a difference when I try to join another table.

When I execute the following query, it's ~17s:

select bucket, avg(m.avg) 
from ack_alarm_number_daily m 
join entities e on m.entity_id = e.id 
where e.category='category-2' 
group by bucket

The same type of query from ack_alarm_number is ~400ms:

select time_bucket('1 day', m.time) as bucket, avg(m.value) 
from ack_alarm_number m 
join entities e on m.entity_id = e.id 
where e.category='category-2' 
group by bucket

There is one detail that I've found:
There is no significant difference in rows count between table and materialized view, because I have 1000000 different entity_ids and 10 user_ids and they were uniform distributed across the timeline during data generation, so 7 millions rows is not enough to get a real advantage of continuous aggregate in my case, but it's still interesting, even if they have similar rows number, why is there a difference in the query performance?

Oldook
  • 107
  • 5
  • You need to use the time column to get advantage of parallel scan over multiple chunks. If you need to go through all the data it will not perform well. Can you try to limit to some date range? – jonatasdp Nov 18 '21 at 17:53
  • @jonatasdp I tried to limit time to a month and yes, of course, if I limit it by some time interval it's will be faster, but there is still a big difference in the performance between simple hypertable(~300ms) and continuous aggregate(~7 sec). – Oldook Nov 18 '21 at 18:26
  • Can you share the EXPLAIN of the query here? My guess is that you'll need to create a CTE for the other query to be processed once. – jonatasdp Nov 30 '21 at 11:08
  • I observe same issue on latest docker image with weather_metrics demo. Chunk size is 7 days querying 1 chunk requires 12 seconds on ! `select * from weather_metrics_daily where bucket between (now() - interval '4 year 1 day') and (now() - interval '4 year') and city_name = 'Lisbon' ;` – Daniil Iaitskov May 03 '22 at 22:35

0 Answers0