
We are using Postgres to store ~2,000,000,000 samples. This ends up in tables with ~500 million entries and ~100 GB per table.

What I want to do: e.g., update all entries of the table: `UPDATE table SET flag = true;`

After this, the table is twice as big, i.e. 200 GB. To get the space back (the data lives on an SSD), we run `VACUUM FULL table`.

Unfortunately, this step itself needs a lot of additional space, which causes the `VACUUM FULL` to fail because too little disk space is left.

My question: does this mean that, in order to run this UPDATE only once and then reclaim the space for other tables in this DB, we need at least 300-400 GB of free space for a 100 GB table?

hb0
  • In this specific scenario, if you update all rows and you need to reclaim the space for other tables (rather than for future updates to the same table), there isn't much you can do. The only other alternative would be to update rows in small chunks (a million rows at a time, for instance) and `VACUUM` between each chunk (see the sketch after these comments). But there's probably something wrong in your design if you need to update all rows of a table. – jcaron Mar 08 '17 at 21:55
  • If you create the table with a fillfactor smaller than 100% (e.g. 75%) and the column is not indexed, Postgres should be able to do in-place updates ("HOT" updates), which would not increase the size of the table that much (but the reduced fillfactor will increase the total size, as fewer rows fit on a single block). –  Mar 08 '17 at 22:30
  • If you can afford to lock the table, you can avoid the `VACUUM` if you clone it to a temp table, then `TRUNCATE` and re-`INSERT` with the updated flag (also sketched below). – Nick Barnes Mar 09 '17 at 02:54
  • Thanks so far. This is an operation we only have to do this one time. Is there a Postgres function to do such batch updates & VACUUMs automatically? The tables are indexed, btw. @"Nick Barnes": We can afford this currently. But doesn't this also need the same space as the vacuum, (1.5 or 2) * table size? @all: We can't VACUUM in batches, can we? As we already did the update on the whole table. Worst case I have to dump and restore the whole table, but come on ... there must be a smarter solution ;-) – hb0 Mar 09 '17 at 08:24
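
A minimal sketch of the chunked approach from jcaron's comment, assuming a hypothetical table `samples` with a primary key `id`; the batch size of one million rows is only an example. Repeat the pair of statements (e.g. from a shell or psql loop) until the `UPDATE` reports 0 rows:

    -- Hypothetical table and column names; adjust to your schema.
    UPDATE samples
    SET    flag = true
    WHERE  id IN (
        SELECT id
        FROM   samples
        WHERE  flag IS DISTINCT FROM true   -- only rows not yet updated
        LIMIT  1000000                      -- batch size: ~1 million rows
    );

    VACUUM samples;  -- plain VACUUM (not FULL): marks the dead row versions as reusable

The plain `VACUUM` between batches doesn't shrink the file, but it lets the next batch reuse the space freed by the previous one, so the table doesn't keep growing.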
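
And a sketch of Nick Barnes' clone-and-reload variant under the same hypothetical schema; it skips `VACUUM FULL`, but it still needs roughly one extra copy of the data on disk while it runs, and between the `TRUNCATE` and the re-`INSERT` the data only exists in the session-local temp table, so the session must stay alive throughout:

    -- Copy the data with the flag already set (hypothetical column list).
    CREATE TEMP TABLE samples_copy AS
        SELECT id, value, true AS flag
        FROM   samples;

    TRUNCATE samples;                 -- the old heap is freed once this commits

    -- Reload; the indexes on samples are maintained during the INSERT.
    INSERT INTO samples (id, value, flag)
        SELECT id, value, flag FROM samples_copy;

    DROP TABLE samples_copy;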

1 Answer


In your scenario, you won't get away without having at least twice as much space as the table data would require.

The cheapest solution is probably to define the table with a fillfactor of 50 so that half of each block is left empty, thereby doubling the table size. Then the updated rows can all go into the same block as the original rows, and the UPDATE won't increase the table size, because PostgreSQL can use the heap-only tuple (HOT) update feature. The old row versions will be freed immediately if there are no long-running transactions that can still see them.

NOTE: This will only work if the column you are updating is not indexed.

The downside of this approach is that the table is always twice the necessary size, and all sequential scans will take twice as long. It won't bother you if you don't use sequential scans of the table.
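
As a rough sketch of this approach, assuming a hypothetical table `samples` and that you can still rewrite the table once (or set the fillfactor before loading the data):

    -- Hypothetical table name. The fillfactor only affects blocks written
    -- after it is set, so the table has to be (re)written once.
    ALTER TABLE samples SET (fillfactor = 50);
    VACUUM FULL samples;   -- rewrite the table with every block only half full

    -- A later full-table update of a non-indexed column can then reuse the
    -- free space in each block via HOT updates instead of growing the table:
    UPDATE samples SET flag = true;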

Laurenz Albe
  • Well: (1) we continuously run sequential scans on the table, (2) we need our indexes in order to process in a reasonable time, thus this HOT-based solution is no solution for us, (3) also because the tables will grow much bigger within the next year (probably 100x the size) and we can't afford to have twice as much space required for each of the tables .... I guess we have to go with the batch-based updates and VACUUM (and write a routine for this; we should only very rarely do such whole-table updates anyway). Thanks a lot for your reply! – hb0 Mar 09 '17 at 09:52