Howdy Slack Overflowvians.
So I came across this PostgreSQL server running 8.3.11 (yeah I know), that was in a locked state with:
ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Normally the auto vaccum daemon (autovacuum=on
), would handle this, but because the following four TOAST (allows storage of large field values 8 kB slices, like bread), database object. But the XID of this database never was reset because of these corrupt database objects.
Below is a snippet of the output when running the server in single-user mode with the admin user:
SELECT oid, relname, age(relfrozenxid) FROM pg_class WHERE relkind = 't' ORDER BY age(relfrozenxid) DESC LIMIT 4;
----
1: oid = "2421459" (typeid = 26, len = 4, typmod = -1, byval = t)
2: relname = "pg_toast_2421456" (typeid = 19, len = 64, typmod = -1, byval = f)
3: age = "2146484084" (typeid = 23, len = 4, typmod = -1, byval = t)
----
1: oid = "2421450" (typeid = 26, len = 4, typmod = -1, byval = t)
2: relname = "pg_toast_2421447" (typeid = 19, len = 64, typmod = -1, byval = f)
3: age = "2146484084" (typeid = 23, len = 4, typmod = -1, byval = t)
----
1: oid = "2421435" (typeid = 26, len = 4, typmod = -1, byval = t)
2: relname = "pg_toast_2421432" (typeid = 19, len = 64, typmod = -1, byval = f)
3: age = "2146484084" (typeid = 23, len = 4, typmod = -1, byval = t)
----
1: oid = "2421426" (typeid = 26, len = 4, typmod = -1, byval = t)
2: relname = "pg_toast_2421423" (typeid = 19, len = 64, typmod = -1, byval = f)
3: age = "2146484084" (typeid = 23, len = 4, typmod = -1, byval = t)
Notice the age is well above the vacuum_freeze_min_age
(value set after a successful VACUUM), on this server and thus why it was issuing the original errors above. The above was AFTER running a VACUUM FULL
; all other tables fine.
SELECT relfilenode FROM pg_class WHERE oid=2421459;
So when we looked on disk (used the pg_class.relfilenode value for each table above) the toast table's file was missing:
$ find /var/lib/pgsql/data/ -type f -name '2421426' | wc -l # Bad toast
0
and when we looked on disk at the index of the toast
SELECT relfilenode FROM pg_class WHERE (select reltoastidxid FROM pg_class WHERE oid=2421459)
$ find /var/lib/pgsql/data/ -type f -name '2421459' | wc -l # Bad toast's index
0
We then tried to find the table that the bad toast record is related to with:
SELECT * FROM pg_class WHERE reltoastrelid=2421459;
got 0 results for each table above! There are no tables for the VACUUM
command to reset the XID of these relations.
When we checked the pg_depend table and found that these TOAST tables have NO references:
SELECT * FROM pg_depend WHERE refobjid IN(2421459,2421450,2421435,2421426)
Question
- Can you delete the bad TOAST table and TOAST table indexes from the
pg_class table (e.g.
DELETE FROM pg_class where oid=2421459
) - Are there any other tables that we also need to remove the relation from?
- Could we just create a temp table and link it to the TOAST's index's oid?
Example for #3 above:
CREATE TABLE adoptedparent (colnameblah char(1));
UPDATE pg_class SET reltoastrelid=2421459 WHERE relname='adoptedparent';
VACUUM FULL VERBOSE adoptedparent
EDIT:
select txid_current()
is 3094769499 so these tables were corrupted a long time ago. We don't need to recover the data. We are running ext4 file system on Linux 2.6.18-238.el5. We checked the relevant lost+found/
directories and the files were not there.