
I have a rather common time-series setup:

  • A TimescaleDB Hypertable (data) tracking some metrics over time. The columns are: timestamp, identifier, metric.
  • The Hypertable has a continuous aggregate (aggregated_data) that aggregates the metric on an hourly basis, storing several statistical values (avg, min, max).
  • Only the aggregated metrics will be shown to the user.

I would like to add the ability to manually consolidate the metrics (e.g. edit outliers, fill gaps, ...). These consolidated values will then be shown to the user together with the measured metrics.

What is the TimescaleDB way to achieve this?

Here's what I came up with:

  • Create a regular table (consolidated_data) for the manually consolidated data
  • This table will have the same columns as the aggregated_data view
  • I would then UNION these two tables and add some SQL to avoid duplicates, identify the consolidated data, ...
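A minimal sketch of such a consolidated_data table, assuming its columns mirror the aggregated_data view described above (the exact column names and types are assumptions, not taken from the actual setup):

```sql
-- Hypothetical schema for manually consolidated values.
-- Columns mirror the aggregated_data view; types are assumed.
CREATE TABLE consolidated_data (
    timestamp   timestamptz NOT NULL,
    identifier  text        NOT NULL,
    metric      double precision,
    -- one manual override per (hour, identifier) pair
    PRIMARY KEY (timestamp, identifier)
);
```

The primary key makes the later de-duplication well defined: there can be at most one consolidated row competing with each aggregated row.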

The UNION operation would then look similar to this (I did not actually run/test this statement):

SELECT DISTINCT ON (timestamp, identifier) timestamp, identifier, metric, consolidated FROM (
   SELECT timestamp, identifier, metric, true AS consolidated FROM consolidated_data WHERE timestamp > NOW() - interval '7 days'
   UNION ALL
   SELECT timestamp, identifier, metric, false AS consolidated FROM aggregated_data WHERE timestamp > NOW() - interval '7 days'
) AS combined
-- consolidated DESC makes the manually edited row win for each (timestamp, identifier)
ORDER BY timestamp, identifier, consolidated DESC;

I'd expect this approach to work, but there are still a lot of unknowns:

  • Should the consolidated table be a Hypertable? Would this speed up the UNION operation?
  • Should I prefer MERGE?
  • Is there a smarter way with tools and utilities I do not yet know of?
JuiceOntour
  • another idea is to directly update the data in the hypertable with the edited values - and maybe copy the originals to another table (e.g. to undo the edits) – TmTron Jul 10 '19 at 05:50
  • @TmTron Thanks. This would mean that the user must edit the values on the 'raw' data (potentially a lot of metrics). I'd prefer to let the user edit the aggregated values. – JuiceOntour Jul 10 '19 at 10:33

1 Answer


There are certainly people who manually create continuous aggregates, and you can use something like a cron job to achieve that.

That being said, things might be easier if you leverage the DB as much as possible. You could use our gapfilling / LOCF (last observation carried forward) functions to fill in the last value at query time.
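A sketch of what that looks like against the raw hypertable, using TimescaleDB's time_bucket_gapfill and locf functions (table and column names follow the question; note that gapfilling requires a bounded time range in the WHERE clause):

```sql
-- Hourly buckets with gaps filled by the last observed value (LOCF).
SELECT
    time_bucket_gapfill('1 hour', timestamp) AS bucket,
    identifier,
    locf(avg(metric)) AS metric
FROM data
WHERE timestamp > NOW() - interval '7 days'
  AND timestamp < NOW()          -- gapfill needs a finite start and stop
GROUP BY bucket, identifier
ORDER BY bucket;
```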

Our next release will also allow you to have multiple continuous aggregates on a hypertable, as well as make JOINs involving hypertables more performant.

If your consolidated table isn't too big, it seems like it might be okay for it to not be a hypertable.
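If it ever did grow large, converting it later is a one-liner (assuming a table like the consolidated_data sketched in the question, with a timestamp column):

```sql
-- Convert the plain table into a hypertable partitioned on its time column.
SELECT create_hypertable('consolidated_data', 'timestamp');
```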

  • Thanks for the reply. I probably must reformulate the question a bit to make things clearer: I am **not** looking to aggregate "manually"; the tools provided by TimescaleDB accomplish this task wonderfully. I want to give the users that visualize the aggregated data the ability to overwrite it manually (e.g. replace an outlier with a more plausible value). These overwrites would then be stored in the *consolidated_data* table. – JuiceOntour Jul 09 '19 at 18:41
  • Ok, I looked into this further for you. The only way would be something like: delete all the values that go into that aggregate and then store the new aggregate. I wouldn't touch the continuous aggregate yourself, because then you might end up breaking things. So unfortunately there is no great workaround for this. Since we do support invalidation, perhaps you could select all rows that impact a given aggregate, then insert a new measurement for that time range that updates the aggregate. Seems dangerous to me though. – Diana Hsieh Jul 10 '19 at 22:28