We run PostgreSQL 9.3 on the AWS RDS platform. Every night at 1am we've been running a global VACUUM ANALYZE
job.
Yesterday we observed severe degradation in performance and as it turned out we had 5 VACUUM ANALYZE
processes stuck for the past 5 days. Over the same period of time the disk utilization went up by 45 gigabytes.
I killed it with pg_terminate_backend
but that didn't have much impact. The processes looked dead but performance was still severely degraded. Since we are using AWS RDS, we've performed a reboot with failover and performance drastically improved right away.
This morning I checked and found that VACUUM ANALYZE
was stuck again for 5 hours. I killed it, but I suspect it is still there somewhere.
Upon further investigation I confirmed that auto_vacuum
is correctly enabled, which means we do not need to run manual VACUUM
but we may need to run ANALYZE
on some or all of the tables.
In my research I found this article: http://rhaas.blogspot.com/2011/03/troubleshooting-stuck-vacuums.html and http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT .
In the end, I have the following questions:
- Is it correct to not run manual VACUUM with auto_vacuum enabled ?
- How can I monitor the progress and performance of auto_vacuum ? How do I know it is not stuck in the same place as the manual VACUUM ?
- Do I still need run ANALYZE on a regular basis ?
- Is there a way to enable automatic ANALYZE, similar to auto_vacuum ?