Timescale hypertable columns vs multiple tables/normalized schema

Question

I am using Timescale DB to store measurements from many devices. Since most devices so far had a similar set of metrics, I used a hypertable with many columns (one for each metric).

However, some new devices will have to support different metrics, and there are also derived values which I need to calculate and store for each device.

So it seems I have to store these values as separate tables, or use the normalized approach (timestamp, device, metric_type, value) if I want to have a general way for saving time-series data?

Is the "many columns" schema approach wrong for these kinds of measurements, or am I doing it wrong?

score 5 · Answer 1 · answered May 07 '19 at 20:18

5

There is not one-size-fits all answer here. A lot depends on your query patterns and analysis. But, a good initial rule of thumb is one hypertable per device type.

That way devices that store similar measure are stored together in a table but you have different tables for different device types. If one device type is just an upgrade of a previous type and has a superset of fields, I'd recommend using the same table with new NULLable columns. NULLS are extremely cheap in Postgres as is adding new NULLable columns.

I'd stay away from the EAV/normalized approach (timestamp, device, metric_type, value) unless you want users adding custom fields.

answered May 07 '19 at 20:18

cevian

121
1

Thanks! The scenario is that we have customers which want to have specific pages with some custom values (derived/calculated from existing metrics), but since there is lots of data we want to use continuous aggregates (or some way of having precalculated aggregates). So since I haven't used "big table" databases before, I am not sure if adding lots of new tables is considered bad practice, or is it the common way of extending functionality. – Lou May 07 '19 at 21:20
+1 for continuous aggregates (which PipelineDB had been very useful for). IF your customers are in the need for near-time KPI's, pre-calculated aggregates are the way to go. If they need interactive drill-down/filter/... capabilities, then you need to eliminate as many JOIN's as you can and also think about [tablesample](https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation) for quick (estimated) results as well as deep dive into faster aggregates such as HLL aso. Too bad that Citus' cstore is not compatible with TimescaleDB. – Ancoron May 08 '19 at 20:02
Please note that continuous aggregated have been added. Read more here https://blog.timescale.com/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/ – cevian Jun 08 '19 at 15:23

Timescale hypertable columns vs multiple tables/normalized schema

1 Answers1