We've noticed one of our tables growing considerably on PG 12. This table is the target of very frequent updates, with a mix of column types, including a very large text
column (often with over 50kb of data) - we run a local cron job that looks for rows older than X time and set the text
column to a null value (as we no longer need the data for that particular column after X amount of time).
We understand this does not actually free up disk space due to the MVCC model, but we were hoping that auto-vacuum would take care of this. To our surprise, the table continues to grow (now over 40gb worth) without auto-vacuum running. Running a vacuum manually has addressed the issue and we no longer see growth.
This has lead me to investigate other tables, I'm realising that I don't understand how auto-vacuum is triggered at all.
Here is my understanding of how it works, which hopefully someone can pick apart:
- I look for tables that have a large amount of dead tuples in them:
select * from pg_stat_all_tables ORDER BY n_dead_tup desc;
- I identify
tableX
with 33169557 dead tuples (n_dead_tup column). - I run a
select * from pg_class ORDER BY reltuples desc;
to check how many estimated rows there are on tabletableX
- I identify 1725253 rows via the
reltuples
column. - I confirm my autovacuum settings:
autovacuum_vacuum_threshold = 50
andautovacuum_vacuum_scale_factor = 0.2
- I apply the formula
threshold + pg_class.reltuples * scale_factor
, so,50 + 1725253 * 0.2
which returns 345100.6
It is my understanding that auto-vacuum will start on this table once ~345100 dead tuples are found. But tableX
is already at a whopping 33169557 dead tuples!, The last_autovacuum on this table was back in February.
Any clarification would be welcome.