Question
I have the following table with PostgreSQL+TimescaleDB:
create table heatingElectricityMeter(
    time timestamptz(0) not null,
    import real not null
);
select create_hypertable('heatingElectricityMeter', 'time', chunk_time_interval => interval '1 month');
"import" is the total energy consumed since the energy meter was produced. Thus "import" is monotonically increasing. About once every 10seconds I insert a value to that table (however currently I have much less then one value for every 10s in the DB).
Now I would like to create a bar chart that shows the energy consumed per local day for the last X days, say the last 90 days.
What I have tried so far
Here is my first attempt:
select
time_bucket(interval '1 day', time::timestamp) "day",
first(import, "time"),
last(import, "time")
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1
This query takes roughly 2.5 seconds on my Raspberry Pi 2 with ~85K data points in the queried time frame. Not really a "problem", but it could be faster. My next attempt:
select
time_bucket(interval '1 day', time::timestamp) "day",
max(import) - min(import)
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1
The effect should be the same, and I kind of expected this to perform the same. However, to my surprise, it is quite a lot faster: it takes only ~1.5 seconds.
(Side note: not casting to ::timestamp speeds up the queries to 1.7 seconds and 1.0 seconds respectively. However, I eventually want to bucket midnight-to-midnight in local time, and if I understand the manual correctly, that means I will need this cast.)
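What I mean by local-time bucketing is roughly this sketch; 'Europe/Vienna' is just a placeholder for my local time zone:
-- sketch of local-day bucketing; the time zone name is a placeholder
select
    time_bucket(interval '1 day', time at time zone 'Europe/Vienna') "day",
    max(import) - min(import)
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1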
Now one thing is that the docs say:
The last and first commands do not use indexes, and instead perform a sequential scan through their groups.
This does not really explain why min and max are faster; they can't use an index either. At the same time, using an index seems to make sense: we only need the first and the last value of each group.
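I have not dug into the query plans in detail yet; I assume comparing the two variants with something like this would show where the time goes:
-- compare the plans of the first/last and min/max variants
explain (analyze, buffers)
select
    time_bucket(interval '1 day', time::timestamp) "day",
    max(import) - min(import)
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day";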
My next attempt is to gather the required values directly, without grouping:
select "day", "import" - lag("import") over (order by "day") from (
select ('2021-10-28'::date - v) "day"
from generate_series(0, 90) s(v)
)x
join lateral (
select time, import from heatingElectricityMeter where time >= day order by time limit 1
)y on true
BINGO! Down to 0.7 seconds!
The result is not exactly the same as with min/max or first/last, since in this query I always compare the first value of each day with the first value of the previous day. In my opinion this is actually better, because data might land on the wrong day, but it won't be lost. (In a perfect world I would get the interpolated values at exactly midnight for each day, but I am putting that off for simplicity. For my use case I can live with the slight inaccuracy.)
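For the record, the midnight interpolation I have in mind would look roughly like this for a single boundary (untested sketch; the timestamp is just an example):
-- rough sketch: linearly interpolate the counter at one midnight boundary
with boundary as (
    select timestamptz '2021-10-28 00:00:00+02' as t0
)
select b.import
     + (a.import - b.import)
       * extract(epoch from (t0 - b.time))
       / extract(epoch from (a.time - b.time)) as import_at_midnight
from boundary
join lateral (
    -- last reading before midnight
    select time, import from heatingElectricityMeter
    where time < t0 order by time desc limit 1
) b on true
join lateral (
    -- first reading at or after midnight
    select time, import from heatingElectricityMeter
    where time >= t0 order by time limit 1
) a on true;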
However, it feels like I am doing this in a way too complicated fashion. To me it seems like a perfectly normal task for a time-series database. Is there a simpler way to do this query efficiently?