3

We are not able to connect to a PostgreSQL instance and we have to restart the PostgreSQL service to connect to the instance, but we start getting the same error within a few days.

Logs show below error

2019-02-18 08:00:03.043 UTC [5053] postgres@utilitydbLOG:  process 5053 still waiting for AccessShareLock on relation 2676 of database 0 after 1000.431 ms
2019-02-18 08:04:04.486 UTC [5170] bucardo@bucardoLOG:  process 5170 still waiting for AccessShareLock on relation 2676 of database 0 after 1001.006 ms
2019-02-18 08:15:02.802 UTC [5445] postgres@utilitydbLOG:  process 5445 still waiting for AccessShareLock on relation 2676 of database 0 after 1000.425 ms
2019-02-18 08:30:02.844 UTC [5846] postgres@utilitydbLOG:  process 5846 still waiting for AccessShareLock on relation 2676 of database 0 after 1000.501 ms
2019-02-18 08:33:23.428 UTC [5940] bucardo@bucardoLOG:  process 5940 still waiting for AccessShareLock on relation 2676 of database 0 after 1001.265 ms
2019-02-18 08:45:02.869 UTC [6269] postgres@utilitydbLOG:  process 6269 still waiting for AccessShareLock on relation 2676 of database 0 after 1000.340 ms
2019-02-18 08:51:08.230 UTC [6478] dbuser@postgresLOG:  process 6478 still waiting for AccessShareLock on relation 2676 of database 0 after 1000.438 ms

When i check service status it shows below details

postgres: postgres utilitydb [local] startup waiting
postgres: postgres utilitydb [local] startup waiting
postgres: postgres utilitydb [local] startup waiting
postgres: postgres utilitydb [local] startup waiting
postgres: postgres utilitydb [local] startup waiting
postgres: postgres utilitydb [local] startup waiting
postgres: postgres utilitydb [local] startup waiting
postgres: postgres utilitydb [local] startup waiting
postgres: postgres utilitydb [local] startup waiting

I know I need to check for locks in the database but the information provided in logs is not so detailed. There is no such database with oid = 0 in PostgreSQL system tables.

How can I debug this issue?

Any help would be appreciated.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
YogeshR
  • 1,606
  • 2
  • 22
  • 43

1 Answers1

7

Something is holding an ACCESS EXCLUSIVE lock on the pg_authid_rolname_index index.

Nobody can use that index while this lock is held, and since this is the index on the table of users (pg_authid) that you need to search for a certain user name, no login can succeed.

You have fallen prey to a “denial of service” attack.

If there is an active database session holding the lock, kill it.

If the lock persists even if you restart the database, it must be a prepared transaction. Shut down the database and clean out the pg_twophase subdirectory of your data directory.

Secure your database better and change all superuser passwords.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Could you please mention the name of the table of users? – YogeshR Feb 20 '19 at 07:17
  • 1
    I have added it to my answer. – Laurenz Albe Feb 20 '19 at 07:20
  • We run a maintenance job which reindexes and vacuum all tables (including system tables) of a database. is it possible that it can be the reason of ACCESS EXCLUSIVE lock on the pg_authid_rolname_index ? – YogeshR Feb 20 '19 at 07:34
  • Yes, that could explain the problem. But if a `REINDEX` on that index takes more than a second, there is something wrong. How many users do you have? Why do you do that anyway? – Laurenz Albe Feb 20 '19 at 08:01
  • We have 30 databases on an instance and 2 admin users and 2 normal users. Actually, we are reindexing all tables because we have a purge job which purges data older than a month. Its run every week. I think we have to specify particular tables during reindexing. – YogeshR Feb 20 '19 at 13:47
  • At least don't reindex your whole database in one transaction so that thr system tables are not locked for such a long time. If you don't delete more than 30-50% of your table, don't worry about the bloat. The space will be reused. – Laurenz Albe Feb 20 '19 at 16:58
  • I am getting an error "process 24193 still waiting for ExclusiveLock on tuple (4,43) of relation 16537 of database 16388 after 1007.074 ms". How can i get exact row which is causing this lock. – YogeshR Apr 22 '19 at 13:49
  • 1
    `SELECT * FROM tablename WHERE ctid = '(4,43)';` The table name can be found with `SELECT relname FROM pg_class WHERE oid = 16537;` – Laurenz Albe Apr 22 '19 at 17:04
  • I got a strange error in my testing server. When I see logs I find out this error "process 24193 still waiting for ExclusiveLock on tuple (4,43) of relation 16537 of database 16388". A cron runs in 15 minutes and connect to the database to gather some useful statistics. All cron instances were continued waiting due to a lock held by some process and thus all available connections were occupied and i was not able to connect to database and then i have to restart server. How can i find out the process who were holding lock on the table at the first place? – YogeshR Apr 24 '19 at 11:53
  • It is the backend for whom all the others are waiting in `pg_locks`. – Laurenz Albe Apr 24 '19 at 13:16
  • backend means what it can be. Can it be a full vacuum which runs in off hours or anything else? I need to reproduce this issue. How can I do so? Thanks in advance. – YogeshR Apr 24 '19 at 13:41
  • You just need to query `SELECT locker.pid FROM pg_locks AS locked JOIN pg_locks AS locker USING (virtualtransaction) WHERE NOT locked.granted AND locker.locktype = 'virtualxid';`. – Laurenz Albe Apr 24 '19 at 13:49
  • I had to restart the server. So now all locks are gone. But how can i reproduce the same? – YogeshR Apr 24 '19 at 14:16
  • Lock a row that another transaction tries to lock. – Laurenz Albe Apr 24 '19 at 14:17