I am running a Python script that processes time-series data for a number of different metrics and writes the results to a Postgres database. The time series spans 40 epochs, and each metric is stored in a real[40] array column in the database.
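For reference, the table looks roughly like this (a simplified sketch; the id column and the arr_col_* names are placeholders standing in for the real schema):

CREATE TABLE my_table (
    id         bigint PRIMARY KEY,
    arr_col_1  real[40],   -- one array column per metric, 90 in total
    arr_col_2  real[40],
    ...
    arr_col_90 real[40]
);

(As far as I know, Postgres doesn't enforce the declared array size, so real[40] behaves the same as plain real[].)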
When I write the outputs for all 40 epochs to the table in one shot (a single batch update across all rows), everything works fine, e.g.:

UPDATE my_table SET
    arr_col_1 = '{1, 2, 3, ..., 40}',
    arr_col_2 = '{1, 2, 3, ..., 40}',
    ...
    arr_col_90 = '{1, 2, 3, ..., 40}';
However, iteratively writing each epoch's results to the corresponding position in the arrays chews up all free space on the hard drive, e.g.:

UPDATE my_table SET
    arr_col_1[1] = 1,
    arr_col_2[1] = 1,
    ...
    arr_col_90[1] = 1;

UPDATE my_table SET
    arr_col_1[2] = 2,
    arr_col_2[2] = 2,
    ...
    arr_col_90[2] = 2;

-- repeated for the remaining 38 epochs
The reason for the iterative strategy is to accommodate larger numbers of rows, where the results for all 40 epochs don't fit in memory at once.
To my knowledge, UPDATE queries will delete and rewrite row data in certain situations, but I'm not clear on when exactly this happens or how it relates to arrays. Is there a way to iteratively update arrays across large numbers of rows without leading to database bloat?
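For what it's worth, I assume a query along these lines would show whether dead row versions and table size are growing while the per-epoch updates run (a sketch, using the placeholder table name from above):

SELECT relname,
       n_live_tup,
       n_dead_tup,
       pg_size_pretty(pg_total_relation_size('my_table')) AS total_size
FROM pg_stat_user_tables
WHERE relname = 'my_table';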