
Question

I have the following table with PostgreSQL+TimescaleDB:

create table heatingElectricityMeter(
    time timestamptz(0) not null,
    import real not null
);

select create_hypertable('heatingElectricityMeter', 'time', chunk_time_interval => interval '1 month');

"import" is the total energy consumed since the energy meter was produced. Thus "import" is monotonically increasing. About once every 10seconds I insert a value to that table (however currently I have much less then one value for every 10s in the DB).

Now I would like to create a bar chart that shows the energy consumed per local day for the last X days. Let's say... the last 90 days.

What I have tried so far

Here is my first attempt:

select
    time_bucket(interval '1 day', time::timestamp) "day",
    first(import, "time"),
    last(import, "time")
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1

This query takes roughly 2.5 seconds on my Raspberry Pi 2 with ~85K data points in the queried time frame. Not really a "problem", but... it could be faster. My next attempt:

select
    time_bucket(interval '1 day', time::timestamp) "day",
    max(import) - min(import)
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1

The effect should be the same, and I kind of expected it to perform the same. However, to my surprise, this is quite a lot faster: it takes only ~1.5 seconds.

(Side note: not casting to ::timestamp speeds up the queries to 1.7 seconds and 1.0 seconds respectively. However, I eventually want to use midnight-to-midnight in local time, and if I understand the manual correctly, that means I will need this cast.)
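For reference, a sketch of what that local-time bucketing could look like; 'Europe/Berlin' is just a placeholder zone, not something from the original question:

-- Shift to local wall-clock time before bucketing, so each bucket runs
-- midnight-to-midnight in the given zone ('Europe/Berlin' is a placeholder).
select
    time_bucket(interval '1 day', time at time zone 'Europe/Berlin') "day",
    max(import) - min(import)
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1

If I remember correctly, newer TimescaleDB releases (2.8+) also accept a time zone as a third argument to time_bucket, which avoids the cast entirely.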

Now one thing is that the docs say:

The last and first commands do not use indexes, and instead perform a sequential scan through their groups.

This does not really answer why min and max are faster; they can't use an index either. At the same time, using an index seems to make sense: we just need the first and last value of each group.
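A standard way to check where the time actually goes (not from the original post, just the usual diagnostic) is to run both variants under EXPLAIN and compare the plans:

-- Wrap either variant to see whether the planner uses index scans,
-- how many buffers are touched, and where the time is spent.
explain (analyze, buffers)
select
    time_bucket(interval '1 day', time) "day",
    max(import) - min(import)
from heatingelectricitymeter
where time between '2021-07-28T02:00:00Z' and '2021-10-28T02:00:00Z'
group by "day"
order by 1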

My next attempt is to gather the required values directly and not use grouping:

select "day", "import" - lag("import") over (order by "day") from (
    select ('2021-10-28'::date - v) "day"
    from generate_series(0, 90) s(v)
)x
join lateral (
    select time, import from heatingElectricityMeter where time >= day order by time limit 1
)y on true

BINGO! Down to 0.7 seconds!

The result is not exactly the same as min/max/first/last: in this query I always compare the first value of each day with the first value of the previous day. In my opinion this is actually better, because data might land on the wrong day, but it won't be lost. (In a perfect world I'd get the interpolated values at exactly midnight for each day, but I am putting that off for simplicity. For my use case I can live with the slight inaccuracy.)
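For completeness, a rough, untested sketch of that deferred midnight interpolation (my own construction, not from the thread; it assumes at least one sample exists on each side of every boundary):

with boundaries as (
    -- local midnight for each of the last 90 day boundaries
    select ('2021-10-28'::date - v)::timestamptz boundary
    from generate_series(0, 90) s(v)
),
neighbours as (
    select b.boundary,
           p.time t0, p.import v0,
           n.time t1, n.import v1
    from boundaries b
    cross join lateral (
        -- last sample strictly before the boundary
        select time, import from heatingElectricityMeter
        where time < b.boundary order by time desc limit 1
    ) p
    cross join lateral (
        -- first sample at or after the boundary
        select time, import from heatingElectricityMeter
        where time >= b.boundary order by time limit 1
    ) n
)
select boundary::date "day",
       -- consumption per day = interpolated counter at the next midnight
       -- minus the interpolated counter at this midnight
       -- (the newest boundary has no successor, so its row is NULL)
       lead(midnight_import) over (order by boundary) - midnight_import consumed
from (
    select boundary,
           -- linear interpolation of the counter value at exactly midnight
           v0 + (v1 - v0) * extract(epoch from boundary - t0)
                          / nullif(extract(epoch from t1 - t0), 0) midnight_import
    from neighbours
) m
order by 1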

However, it feels like I am doing this in an overly complicated fashion. To me this seems like a perfectly normal task for a time-series database. Is there a simpler way to do this query efficiently?

yankee
  • Why do you think the effect of the first 2 queries should be the same? i.e. `first`/`last` and `min`/`max` are doing different things. – TmTron Nov 01 '21 at 08:44
  • @TmTron: In the beginning of the question I explained `"import" is monotonically increasing`. This is guaranteed as the counter is physically incapable of counting backwards. Thus the `first` value must always also be the `min` value, and the same applies for `last` and `max`. – yankee Nov 02 '21 at 12:12
  • Hey folks, we've actually updated some things and there's much better stuff for this now. I may modify this and move the answer over here at some point, but I wrote an answer in our forum that might be useful here: https://www.timescale.com/forum/t/dear-eon-measuring-deltas-correctly-for-energy-meters/1266 – davidk Feb 09 '23 at 15:44

1 Answer


Sometimes there is still some work you have to do to optimize this. The lateral query will use indexes and only has to do an index scan on one side, and the lag then just reads the previous value that the index scan already looked up.

Note that you are getting slightly different values with the last query: instead of getting the min and max within a given day, you're subtracting the first value of each day from the first value of the next day, which is a slightly different thing. As long as you are okay with that, you're fine.

Now, min and max can in fact use indexes. They get optimized differently and can be made to use index scans depending on what indexes you have, so that might be why that variant is faster.
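As a concrete (hypothetical) example of that: a covering index on (time, import) would let the min/max variant be answered with index-only scans instead of heap fetches:

-- Hypothetical covering index: the min/max query can then be satisfied
-- from the index alone (index-only scans, visibility map permitting),
-- without touching the table's heap pages.
create index heatingElectricityMeter_time_import
    on heatingElectricityMeter (time, import);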

Not sure if that's a good enough answer or explains enough, but it is what I think is going on here.

Note: we added some functionality in TimescaleDB around this; I'd recommend reading this post in our forum that addresses this problem in more detail if you're interested: https://www.timescale.com/forum/t/dear-eon-measuring-deltas-correctly-for-energy-meters/1266

davidk
  • Thank you for answering. Yes, I am aware that the last query will have slightly different results; I edited my question to mention this explicitly. I have not created any index. Timescale automatically creates an index on "time" but nothing else (as I understand it), so I don't think min/max can use any index here. I think I understand what is going on, but actually I am asking for a simpler and faster solution. – yankee Nov 02 '21 at 19:07
  • A continuous aggregate would speed things up most likely. If that helps? https://docs.timescale.com/timescaledb/latest/how-to-guides/continuous-aggregates/about-continuous-aggregates/#about-continuous-aggregates – davidk Nov 03 '21 at 13:08
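For reference, a minimal sketch of what such a continuous aggregate could look like (the view name and policy intervals are placeholder choices of mine, not from the thread; min/max stand in for first/last here since import is monotonic, and this buckets in UTC, so the local-midnight concern from the question still applies):

-- Daily pre-aggregation maintained incrementally by TimescaleDB.
create materialized view heatingElectricityMeter_daily
with (timescaledb.continuous) as
select
    time_bucket(interval '1 day', time) "day",
    min(import) min_import,
    max(import) max_import
from heatingelectricitymeter
group by 1;

-- Keep the aggregate refreshed in the background
-- (offsets and schedule are arbitrary placeholders).
select add_continuous_aggregate_policy('heatingelectricitymeter_daily',
    start_offset      => interval '3 days',
    end_offset        => interval '1 hour',
    schedule_interval => interval '1 hour');

-- The bar chart then becomes a cheap scan over ~90 pre-computed rows:
select "day", max_import - min_import
from heatingElectricityMeter_daily
where "day" >= now() - interval '90 days'
order by 1;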