
Summary

I am stuck on a problem where the row size of a materialized view exceeds the maximum allowed in a PostgreSQL database.

Description

The table PAC4200_Metering has 108 fields, all double precision. I would like to store a materialized cache of the table in which, for each field, I store a JSON object with keys for the average, max, min, etc. of that field during the time bucket.
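For context, a minimal sketch of the kind of table involved; the column names beyond voltage_an are made up, and only a few of the 108 double precision fields are shown:

CREATE TABLE "PAC4200_Metering" (
  "time"       timestamptz NOT NULL,
  "deviceId"   text        NOT NULL,
  "voltage_an" double precision,
  "voltage_bn" double precision,  -- hypothetical additional field
  "voltage_cn" double precision   -- plus roughly 105 more double precision columns (omitted)
);

-- Register the table as a TimescaleDB hypertable partitioned on "time"
SELECT create_hypertable('"PAC4200_Metering"', 'time');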

SELECT
  "deviceId",
  time_bucket('1 hours', "time") as starttime,
  json_build_object(
   'average', avg("voltage_an"),
   'maxvalue', max("voltage_an"),
   'minvalue', min("voltage_an"),
   'sum', sum("voltage_an"),
   'firstvalue', first("voltage_an", "time"),
   'firsttime', min("time" AT TIME ZONE 'UTC'),
   'lasttime', max("time" AT TIME ZONE 'UTC'),
   'lastvalue', last("voltage_an", "time"),
   'sd', stddev_pop("voltage_an") ,
   'countgood', COUNT(*),
   'countbad', 0,
   'countuncertain', 0
  ) AS "voltage_an"
  ...
FROM
  "PAC4200_Metering"
GROUP BY
  "deviceId",
  starttime

Error Response:

INFO:  new materialization range for public.PAC4200_Metering larger than allowed in one run, truncating (time column time) (1568760300000000)
INFO:  new materialization range for public.PAC4200_Metering (time column time) (1568708100000000)
INFO:  materializing continuous aggregate public.PAC4200_Metering_15_minute: new range up to 1568708100000000

ERROR:  row is too big: size 12456, maximum size 8160
CONTEXT:  SQL statement "INSERT INTO _timescaledb_internal._materialized_hypertable_1108 SELECT * FROM _timescaledb_internal._partial_view_1108 AS I WHERE I.starttime >= '-infinity' AND I.starttime < '2019-09-17 08:15:00+00';"
SQL state: 54000

Alternative Methods I have Tried

Instead of populating JSON objects, I have tried storing each key of the JSON object (average, min, max, etc.) as separate fields named like "voltage_an_avg", "voltage_an_min", "voltage_an_max" and accessing them that way, but I still run into the same limitation.

Main Question / Solutions

  • Is there a way to increase this row size limit? (if that is even good practice)
  • Is there a better way to store these values...
  • What is the schema of the table `PAC4200_Metering`? It might be useful to see the relevant part of it. How do you create the continuous aggregate? Is the error reported related to the query or continuous aggregate? – k_rus Sep 18 '19 at 06:22
  • Since your data are double precision fields, it is not recommended to use JSON due to the danger of losing numeric precision: https://www.postgresql.org/docs/9.4/datatype-json.html – k_rus Sep 18 '19 at 08:03
  • What is the reason to use JSON? – k_rus Sep 18 '19 at 08:31
  • You don't specify explicitly, but I guess you use continuous aggregates, right? – k_rus Sep 19 '19 at 05:55

1 Answer


You cannot change the row size limit: every row must fit on a single page, which is 8 kB by default. Since all of your columns are fixed-width double precision values, they cannot be moved out of line by TOAST, so the whole row has to fit on the page.
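If you want to confirm the page size on your own server, it is exposed as a read-only setting (this is only a quick check; the value is fixed when PostgreSQL is built and cannot be raised at runtime):

SHOW block_size;                        -- 8192 bytes by default

SELECT current_setting('block_size');   -- same value, usable inside a query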

Since the values are double precision, putting them into JSON can lose numeric precision; this is a limitation of the JSON format in general (see the documentation link in the comments above). Thus you either need to drop JSON from the continuous aggregate itself, or reconsider the types so the values can be represented as JSON numbers without loss.

Since the row is limited to 8160 bytes, around 1000 double precision fields can fit. This can be enough for all of your columns if you avoid materialising aggregates that can be calculated later: for example, the average can be derived from sum and countgood, while countbad and countuncertain carry no information in your example.

If the query in the question is used to create the continuous aggregate, it can be rewritten as:

SELECT
  "deviceId",
  time_bucket('1 hours', "time") as starttime,
  max("voltage_an") as maxvalue,
  min("voltage_an") as minvalue,
  sum("voltage_an") as sum,
  first("voltage_an", "time") as firstvalue,
  min("time" AT TIME ZONE 'UTC') as firsttime,
  max("time" AT TIME ZONE 'UTC') as lasttime,
  last("voltage_an", "time") as lastvalue,
  stddev_pop("voltage_an") as sd,
  COUNT(*) as countgood,
  ...
FROM
  "PAC4200_Metering"
GROUP BY
  "deviceId",
  starttime

If voltage_an is one of 108 such fields and 8 aggregates are materialised per field as above, that is 108*8*8 + 3*8 = 6912 + 24 = 6936 bytes at most.
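For reference, a minimal sketch of how that query could be wrapped into a continuous aggregate definition; the view name is made up, only one field is shown, and the DDL below is the TimescaleDB 2.x form (1.x used CREATE VIEW ... WITH (timescaledb.continuous) instead):

CREATE MATERIALIZED VIEW pac4200_metering_1_hour   -- hypothetical name
WITH (timescaledb.continuous) AS
SELECT
  "deviceId",
  time_bucket('1 hours', "time") as starttime,
  max("voltage_an") as voltage_an_maxvalue,
  min("voltage_an") as voltage_an_minvalue,
  sum("voltage_an") as voltage_an_sum,
  COUNT(*) as countgood
  -- the remaining per-field aggregates (first, last, stddev_pop, ...) follow the same pattern
FROM "PAC4200_Metering"
GROUP BY "deviceId", time_bucket('1 hours', "time");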

Then you can get the same result as the original query by selecting from the continuous aggregate:

SELECT
  "deviceId",
  starttime,
  json_build_object(
   'average', "sum"/"countgood",
   'maxvalue', "maxvalue",
   'minvalue', "minvalue",
   'sum', "sum",
   'firstvalue', "firstvalue",
   'firsttime', "firsttime",
   'lasttime', "lasttime",
   'lastvalue', "lastvalue",
   'sd', "sd",
   'countgood', "countgood",
   'countbad', 0,
   'countuncertain', 0
  ) AS "voltage_an"
  ...
FROM
  CAGG

It is also possible to define several continuous aggregates, each covering a subset of the columns, and then join them.
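As a rough sketch, assuming two continuous aggregates pac4200_agg_part1 and pac4200_agg_part2 (hypothetical names, each defined like the example above over a different subset of the 108 fields):

SELECT
  p1."deviceId",
  p1.starttime,
  p1.voltage_an_maxvalue,   -- column materialised in the first aggregate
  p2.voltage_bn_maxvalue    -- column materialised in the second aggregate (hypothetical field)
FROM pac4200_agg_part1 p1
JOIN pac4200_agg_part2 p2
  ON p1."deviceId" = p2."deviceId"
 AND p1.starttime = p2.starttime;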

I would recommend thinking carefully about how much information really needs to be materialised, as nothing comes for free: it occupies space, and having each row occupy nearly an entire 8K page can further affect PostgreSQL's efficiency.

k_rus
  • Thank you for the reply. Storing as JSON was an afterthought, since I was running into the same problem when creating the aggregate materialized view. Your query suggestion was one that I tried before, but the limitation I still run into when storing the data in a 6936-byte format is that this is just one of many tables. Now that I am thinking about it further, it may make sense to store the aggregated tables grouped by time_bucket, deviceId, and the field average, max, min, etc. There are many such materialized views that need to be generated and some will have more fields. – Alexander Buczynsky Sep 18 '19 at 15:02
  • @AlexanderBuczynsky Have you thought about whether you can reduce the precision of the aggregation results? You say you run into a performance problem when creating materialised views, but the question is about getting an error; can you clarify? – k_rus Sep 19 '19 at 05:54