
I have a PostgreSQL 11.9 cluster managed by Patroni 3, and I use pgBackRest 2.39 to manage backups. Last night I started seeing errors like this on a production system:

    ERROR: [045]: WAL file '00000015000025A70000005D.partial' already exists in the repo1 archive with a different checksum
    2023-04-21 06:03:44.461 GMTLOG:  archive command failed with exit code 45
    2023-04-21 06:03:44.461 GMTDETAIL:  The failed archive command was: pgbackrest --stanza=ccdb --log-level-file=info archive-push pg_wal/00000015000025

Indeed, pg_wal/00000015000025A70000005D.partial exists in the data directory. I searched for a solution and found several threads on the topic.

My guess is that the copy of this WAL file on the file system is corrupted, and that the file was previously archived with a different checksum. The problem started to appear after we migrated an old cluster from Ubuntu 18 to Ubuntu 22 using three new machines (one per node). I suspect that the WAL file was correctly archived on the old cluster and was only partially written on the new one during the switch.
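One way to test this hypothesis is to compare the SHA-1 of the local file with the checksum of the archived copy. The sketch below assumes that pgBackRest stores archived WAL under a name of the form `<WAL name>-<SHA-1>.<compression extension>` and that the SHA-1 is computed over the uncompressed file; both are assumptions to verify against the actual repository. The function names are hypothetical helpers, not part of pgBackRest.

```python
import hashlib
import re
from pathlib import Path
from typing import Optional

# Assumed archive naming convention (verify against your repository):
#   <24 hex digit WAL name>[.partial]-<40 hex digit SHA-1>[.<compression ext>]
ARCHIVE_NAME = re.compile(
    r"^(?P<wal>[0-9A-F]{24}(?:\.partial)?)-(?P<sha1>[0-9a-f]{40})(?:\.\w+)?$"
)


def sha1_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archived_sha1(archive_file_name: str) -> Optional[str]:
    """Extract the checksum embedded in an archive file name, if any."""
    match = ARCHIVE_NAME.match(archive_file_name)
    return match.group("sha1") if match else None


def same_content(local_wal: Path, archive_file_name: str) -> bool:
    """True if the local WAL file hashes to the checksum in the archive name."""
    expected = archived_sha1(archive_file_name)
    if expected is None:
        raise ValueError(f"unrecognized archive file name: {archive_file_name}")
    return sha1_of_file(local_wal) == expected
```

To use it, pass the path of pg_wal/00000015000025A70000005D.partial on the leader and the name of the matching file found in the repo1 archive directory. A `False` result only confirms that the two copies differ; it does not say which one is the valid one.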

I also noticed that this file does not exist on the replica nodes. A possible solution could be:

  • switch over to a new leader node
  • check that the new leader archives WAL correctly
  • remove the partial file on the old leader

What do you think? Is this procedure safe?
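The steps above can be sketched as a dry-run plan that only builds and prints the commands, without executing anything. This is an illustration under stated assumptions: the Patroni config path and the quarantine directory are hypothetical, `patronictl switchover` is left interactive so that it prompts for the leader and candidate, and the final step moves the file aside instead of deleting it, which is a more cautious stand-in for the removal proposed above.

```python
import shlex
from typing import List


def build_plan(patroni_config: str, stanza: str,
               partial_wal: str, quarantine_dir: str) -> List[List[str]]:
    """Build the command list for the proposed procedure (nothing is run)."""
    return [
        # Inspect the current cluster state before changing anything.
        ["patronictl", "-c", patroni_config, "list"],
        # Step 1: switch the leader (interactive prompts choose the nodes).
        ["patronictl", "-c", patroni_config, "switchover"],
        # Step 2: on the new leader, verify that archiving works.
        ["pgbackrest", f"--stanza={stanza}", "check"],
        # Step 3: on the old leader, move the partial file out of pg_wal
        # (assumption: kept aside rather than deleted, in case it is needed).
        ["mv", partial_wal, quarantine_dir],
    ]


def render(plan: List[List[str]]) -> str:
    """Render the plan as shell-quoted lines for review."""
    return "\n".join(shlex.join(command) for command in plan)


if __name__ == "__main__":
    plan = build_plan(
        patroni_config="/etc/patroni/patroni.yml",  # hypothetical path
        stanza="ccdb",
        partial_wal="pg_wal/00000015000025A70000005D.partial",
        quarantine_dir="/var/tmp/wal-quarantine/",  # hypothetical path
    )
    print(render(plan))
```

Printing the plan first makes it easy to review each command on the right node before running anything by hand.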

Salvatore D'angelo
