PostgreSQL - PANIC: could not open file "pg_replslot/slot_name/state": No such file or directory

Question

Is there any way to stop replication without logging into the psql shell? A disk-full situation led to some corruption in the PG files, and the server keeps restarting.

2023-02-06 08:17:54 UTC [1] LOG:  starting PostgreSQL 13.7 (Ubuntu 13.7-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2023-02-06 08:17:54 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-02-06 08:17:54 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-02-06 08:17:54 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-02-06 08:17:54 UTC [8] LOG:  database system was shut down at 2023-02-06 08:17:45 UTC
2023-02-06 08:17:54 UTC [8] PANIC:  could not open file "pg_replslot/slot_name/state": No such file or directory
2023-02-06 08:17:55 UTC [1] LOG:  startup process (PID 8) was terminated by signal 6: Aborted
2023-02-06 08:17:55 UTC [1] LOG:  aborting startup due to startup process failure
2023-02-06 08:17:55 UTC [1] LOG:  database system is shut down
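
The PANIC above names a slot directory whose `state` file is missing. To see whether other slots are affected, a minimal Python sketch can list every directory under `pg_replslot` that has no `state` file. It assumes the data directory path is passed in and that the script can read it; the function name `slots_missing_state` is hypothetical. It only reads and changes nothing.

```python
import os
import tempfile


def slots_missing_state(pgdata):
    """Return names of slot directories under pg_replslot that lack a 'state' file."""
    slot_root = os.path.join(pgdata, "pg_replslot")
    missing = []
    for entry in sorted(os.scandir(slot_root), key=lambda e: e.name):
        # Each replication slot is a directory that should contain a 'state' file.
        if entry.is_dir() and not os.path.isfile(os.path.join(entry.path, "state")):
            missing.append(entry.name)
    return missing


if __name__ == "__main__":
    # Demo on a throwaway directory tree, not on a real cluster.
    with tempfile.TemporaryDirectory() as pgdata:
        os.makedirs(os.path.join(pgdata, "pg_replslot", "slot_ok"))
        os.makedirs(os.path.join(pgdata, "pg_replslot", "slot_broken"))
        with open(os.path.join(pgdata, "pg_replslot", "slot_ok", "state"), "wb"):
            pass
        print(slots_missing_state(pgdata))
```

On a real system the argument would be the actual data directory, with the server stopped.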

I tried removing pg_replslot/slot_name, which led to a "password auth failure", and after resetting the DB password (via pg_hba.conf) our DB is not showing up!

Is there a proper way to recover from this state? The /pg/main files and pgdata directories seem to be available; only this slot information is missing.

Steps taken so far:

I'm using a PostgreSQL Docker container.
The disk used for PG got full. I cleaned up some log files and ran docker system prune to remove unused images, which freed some space but led to this issue.
We have seen similar issues multiple times in Dev environments: a full disk leads to corrupted files and errors of the "unable to read" / "No such file or directory" kind.
I tried removing the pg_replslot/slot_name directory, which allowed the PostgreSQL container to start (previously the container kept restarting).
I reset the password by setting the auth method to trust in pg_hba.conf.
Now \l in the psql shell shows only the postgres DB and the default DBs, not our custom DB.
Our main DB is in a separate tablespace and does not show up in the list.
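
Since the main DB lives in a separate tablespace, one thing worth checking is whether the tablespace links still resolve. PostgreSQL keeps one symbolic link per user-defined tablespace in the `pg_tblspc` directory inside the data directory. A minimal Python sketch follows, assuming it runs where those link targets are valid paths (for example inside the container) and that the data directory path is passed in; the function name `check_tablespace_links` is hypothetical.

```python
import os


def check_tablespace_links(pgdata):
    """Map each symlink in pg_tblspc to (target path, whether the target is a directory)."""
    tblspc_dir = os.path.join(pgdata, "pg_tblspc")
    result = {}
    for entry in os.scandir(tblspc_dir):
        if entry.is_symlink():
            target = os.readlink(entry.path)
            # os.path.isdir follows the link, so a dangling link reports False.
            result[entry.name] = (target, os.path.isdir(entry.path))
    return result


def report(pgdata):
    """Print one line per tablespace link, flagging targets that are missing."""
    for name, (target, ok) in sorted(check_tablespace_links(pgdata).items()):
        status = "ok" if ok else "MISSING"
        print(f"{name} -> {target} [{status}]")
```

Calling `report()` with the data directory prints each link and its status. A link marked MISSING would suggest the tablespace volume is not mounted or was removed, which is a different problem from a database that was never recorded in this cluster's catalog.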

MOST importantly, the standby is also showing the SAME errors! Perhaps someone tampered with it?

I don't think that a simple disk-full condition leads to this. You have to give us more information: 1) is this the primary or the standby server? 2) What *exactly* did you do after the disk was full and the database crashed? Be as detailed as possible. — Laurenz Albe, Feb 06 '23 at 09:05
@LaurenzAlbe Added more details. `docker system prune` was used to remove unused images, and this is on the primary. But I think it's easy to reproduce similar corruption by completely filling the disk. — Anto, Feb 06 '23 at 09:51
Thanks. You write "cleaned up some log files". Can you give me details as to which log files in which directory? — Laurenz Albe, Feb 06 '23 at 10:08
Thanks. I don't know about docker. Looking at the manual page I see that `docker system prune` should not modify any volumes, but seemingly just that happened, as evidenced by the missing `pg_replslot/slot_name/state`. At this point, you should restore your backup. For the future: the correct way to deal with "out of space" conditions is to increase the space. — Laurenz Albe, Feb 06 '23 at 11:27
@LaurenzAlbe Can we recover any data from the tablespace files? The current error came from the /pg/xx replication slot file. — Anto, Feb 06 '23 at 11:27
Only an expert could recover the data now. This is beyond a Stack Overflow answer. — Laurenz Albe, Feb 06 '23 at 11:28

