
On my Aurora Postgres server, I continuously see a vacuum timeout every minute for the following:

autovacuum: VACUUM pg_catalog.pg_statistic 

I tried running the vacuum manually and got the following output:

INFO:  vacuuming "pg_catalog.pg_statistic"
INFO:  "pg_statistic": found 0 removable, 409109 nonremovable row versions in 19981 out of 19990 pages
DETAIL:  408390 dead row versions cannot be removed yet, oldest xmin: 4230263474
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 9 frozen pages.
0 pages are entirely empty.
CPU: user: 0.06 s, system: 0.00 s, elapsed: 0.07 s.
INFO:  vacuuming "pg_toast.pg_toast_2619"
INFO:  "pg_toast_2619": found 0 removable, 272673 nonremovable row versions in 61558 out of 61558 pages
DETAIL:  219 dead row versions cannot be removed yet, oldest xmin: 4230263474
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.14 s, system: 0.00 s, elapsed: 0.14 s.
VACUUM

Query returned successfully in 5 secs 55 msec.

Can anyone point to the reason why this is happening?

[Screenshot: AWS Aurora timeout log]

adnan kamili
  • Is there any timeout log from your AWS Aurora? – D-Shih Jun 09 '22 at 05:42
  • @D-Shih I have attached the image – adnan kamili Jun 09 '22 at 05:52
  • Must be one of [these reasons](https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/). Fix the problem and try again. – Laurenz Albe Jun 09 '22 at 06:37
  • @LaurenzAlbe It was none of the three. The long-running process is pgadmin, no orphaned prepared transactions and no abandoned replication slots – adnan kamili Jun 09 '22 at 09:36
  • Then it was the first reason, a long-running transaction. Make sure *not* to disable autocommit in pgAdmin! – Laurenz Albe Jun 09 '22 at 11:33
  • Calling this a "timeout" is an Aurora-invented weirdness. This is just the vacuum throttling itself to avoid competing too much with your "foreground" processes. It does not indicate a problem. – jjanes Jun 09 '22 at 14:31
  • @LaurenzAlbe We don't run pgadmin all the time, but the vacuum timeout is continuous. I also tried running VACUUM FULL, that didn't solve the problem either. – adnan kamili Jun 10 '22 at 05:38
  • @jjanes I am not concerned with the continuous timeouts. The concern is that vacuum is unable to remove the 408390 dead tuples even with VACUUM FULL. – adnan kamili Jun 10 '22 at 05:39
  • Neither `VACUUM` nor `VACUUM (FULL)` can solve the problem as long as you are holding the transaction open. – Laurenz Albe Jun 10 '22 at 06:01
  • @LaurenzAlbe I killed the pgadmin process, but still I am seeing the timeouts in AWS. – adnan kamili Jun 10 '22 at 09:05
  • That would be cool, according to what jjanes said above (I know nothing about Aurora). The important point is that `VACUUM (VERBOSE)` shows that dead tuples get cleaned up. – Laurenz Albe Jun 10 '22 at 09:20
  • You say you aren't concerned with the timeouts, but with the dead but nonremovable tuples. But then you keep reporting that the timeouts are still there. But are the dead but nonremovable tuples still there? You killed one process with a long-open transaction, but there might be more, so go back and re-assess that. – jjanes Jun 11 '22 at 16:51

1 Answer


I think your autovacuum might be working fine; Timeout:VacuumDelay might just be a metric from AWS:

A process is waiting in a cost-based vacuum delay point.

The closest setting that exists in PostgreSQL is autovacuum_vacuum_cost_delay.

That setting deliberately slows autovacuum down: autovacuum sleeps for that many milliseconds whenever its accumulated cleanup cost reaches autovacuum_vacuum_cost_limit.
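
To see how aggressively vacuum is being throttled, we can inspect the standard PostgreSQL cost-based settings (a quick check; Aurora may manage or override some of these parameters):

SHOW autovacuum_vacuum_cost_delay;  -- sleep time per throttle point, in ms
SHOW autovacuum_vacuum_cost_limit;  -- accumulated cost that triggers the sleep (-1 means use vacuum_cost_limit)
SHOW vacuum_cost_page_hit;          -- cost charged per page found in shared buffers
SHOW vacuum_cost_page_miss;         -- cost charged per page read from disk
SHOW vacuum_cost_page_dirty;        -- cost charged per page vacuum dirties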

We can query pg_stat_all_tables to verify the latest autovacuum time (note that pg_stat_user_tables would exclude system catalogs such as pg_statistic):

SELECT
  schemaname, relname,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'pg_statistic';
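
More importantly, your manual VACUUM output shows an old xmin (4230263474) holding back 408390 dead rows, so it is worth checking what is pinning that horizon. A sketch of the usual checks, using standard PostgreSQL catalog views (the comments above suggest a long-open pgAdmin transaction, but there may be others):

-- Sessions whose open transaction holds back the xmin horizon
SELECT pid, state, backend_xmin, xact_start, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Orphaned prepared transactions also pin xmin
SELECT gid, prepared, owner
FROM pg_prepared_xacts;

-- Stale replication slots are a third common cause
SELECT slot_name, active, xmin, catalog_xmin
FROM pg_replication_slots;

If one of these matches the reported oldest xmin, ending that session (or dropping the slot) should let the next VACUUM remove the dead tuples.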
D-Shih