How to deal with a stuck PostgreSQL 9.3 VACUUM ANALYZE?

Question

We run PostgreSQL 9.3 on the AWS RDS platform. Every night at 1am we've been running a global VACUUM ANALYZE job.

Yesterday we observed severe degradation in performance and as it turned out we had 5 VACUUM ANALYZE processes stuck for the past 5 days. Over the same period of time the disk utilization went up by 45 gigabytes.

I killed it with pg_terminate_backend but that didn't have much impact. The processes looked dead but performance was still severely degraded. Since we are using AWS RDS, we've performed a reboot with failover and performance drastically improved right away.

This morning I checked and found that VACUUM ANALYZE was stuck again for 5 hours. I killed it, but I suspect it is still there somewhere.

Upon further investigation I confirmed that auto_vacuum is correctly enabled, which means we do not need to run manual VACUUM but we may need to run ANALYZE on some or all of the tables.

In my research I found this article: http://rhaas.blogspot.com/2011/03/troubleshooting-stuck-vacuums.html and http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT .

In the end, I have the following questions:

Is it correct to not run manual VACUUM with auto_vacuum enabled ?
How can I monitor the progress and performance of auto_vacuum ? How do I know it is not stuck in the same place as the manual VACUUM ?
Do I still need run ANALYZE on a regular basis ?
Is there a way to enable automatic ANALYZE, similar to auto_vacuum ?

Thank you... the process doesn't run out of cron, but that's a good point. — Oleg Dulin, Sep 02 '15 at 14:59
You should contact AWS support to learn more about why it's getting stuck. — Craig Ringer, Sep 03 '15 at 00:08

score 14 · Answer 1 · answered Nov 07 '16 at 03:53

Postgres 9.6+ has two life-savers for this use-case:

A new system view (pg_stat_progress_vacuum) that allows you see Vacuum Progress
VACUUM is now able to do part Vacuums, which is very good for large tables. Vacuums often get stuck because of Concurrent (user) Queries and eventually slow-down or get blocked and timeout. In Postgres 9.6+, Vacuum can for e.g. Vacuum 80% of a large table, and do the remaining 20% of the table in a subsequent run. This was impossible in previous releases.

score 5 · Accepted Answer · answered Sep 03 '15 at 00:12

Is it correct to not run manual VACUUM with auto_vacuum enabled ?

You generally do not need manual vacuum of any kind. If autovacuum is not keeping up, make it run more often and faster. See the autovacuum documentation.

How can I monitor the progress and performance of auto_vacuum ?

Watch for growth of table bloat. There is, unfortunately, no pg_stat_autovacuum or similar. You can see autovacuum working in pg_stat_activity but only instant-to-instant. Detailed analysis requires trawling through log files with autovacuum logging enabled.

How do I know it is not stuck in the same place as the manual VACUUM ?

Check pg_stat_activity. You don't know it's in the same place, and you can't even really tell if it's progressing or not, but you can see if it's running or not.

Lots of improvement could be made to admin/monitoring of vacuum, as you can see. We lack people who have the time, willingness and knowledge required to do it, though. Everyone wants to add new shiny features instead.

Do I still need run ANALYZE on a regular basis ?

No.

Is there a way to enable automatic ANALYZE, similar to auto_vacuum ?

Autovacuum runs analyze (or rather VACUUM ANALYZE) when required.

How to deal with a stuck PostgreSQL 9.3 VACUUM ANALYZE?

2 Answers2