
I've recently run into a problem querying some of my tables. When I try to select data I get an error: ERROR: invalid memory alloc request size 4294967293. This generally indicates data corruption. There is a nice and precise technique for deleting corrupted rows described here: https://confluence.atlassian.com/jirakb/invalid-memory-alloc-request-size-440107132.html
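
Roughly, that technique boils down to reading rows one at a time until one of them errors out, then deleting that row by its physical location (the table name, offsets and ctid below are placeholders):

SELECT * FROM mytable LIMIT 1 OFFSET 0;
-- ...increase OFFSET (or bisect) until the SELECT fails;
-- say OFFSET 41 is the first one that errors out

-- fetching only the ctid usually still works, since it
-- doesn't detoast the corrupted column data
SELECT ctid FROM mytable LIMIT 1 OFFSET 41;

-- then remove the tuple by its physical address
DELETE FROM mytable WHERE ctid = '(5,17)';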
But since I have lots of corrupted tables, this method is too slow. So I've found a nice function that returns the last successful ctid here: http://blog.dob.sk/2012/05/19/fixing-pg_dump-invalid-memory-alloc-request-size/

Looking for a corrupted row is a bit faster using it, but still not fast enough. I slightly modified the function to store all the "last successful ctids" in a separate table, and now it looks like this:

CREATE OR REPLACE FUNCTION
find_bad_row(tableName TEXT)
RETURNS void
AS $find_bad_row$
DECLARE
    result tid;
    curs REFCURSOR;
    row1 RECORD;
    row2 RECORD;
    tabName TEXT;
    count BIGINT := 0;
BEGIN
    DROP TABLE IF EXISTS bad_rows_tbl;
    CREATE TABLE bad_rows_tbl (id varchar(255), offs BIGINT);
    -- strip the schema prefix: the hstore() call below needs the bare table name
    SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;

    OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName;

    count := 1;

    FETCH curs INTO row1;

    WHILE row1.ctid IS NOT NULL LOOP
        BEGIN
            result = row1.ctid;

            count := count + 1;
            FETCH curs INTO row1;

            -- read the whole row through hstore to force detoasting of
            -- every column; a corrupted row raises SQLSTATE XX000 here
            EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM '
                || tableName || ' WHERE ctid = $1' INTO row2
                USING row1.ctid;

            IF count % 100000 = 0 THEN
                RAISE NOTICE 'rows processed: %', count;
            END IF;
        EXCEPTION
            WHEN SQLSTATE 'XX000' THEN
                RAISE NOTICE 'LAST CTID: %', result;
                -- quote_literal() is required: an unquoted tid like (0,1)
                -- would be parsed as a row constructor in the INSERT
                EXECUTE 'INSERT INTO bad_rows_tbl VALUES('
                    || quote_literal(result) || ',' || count || ')';
        END;
    END LOOP;

    CLOSE curs;
END
$find_bad_row$
LANGUAGE plpgsql;
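
For reference, I run it like this ('public.my_table' is just a placeholder name):

SELECT find_bad_row('public.my_table');

Progress is printed as NOTICEs, and the "pre-unsuccessful" ctids end up in bad_rows_tbl.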

I'm quite new to plpgsql, so I'm stuck on the following question: how can I get not the ctid preceding the unsuccessful one, but the exact unsuccessful ctid itself (or calculate the next ctid from the preceding one), so that I can insert it into bad_rows_tbl and later use it as an argument for a DELETE statement?

Hope for some help...

UPD: the function I ended up with:

CREATE OR REPLACE FUNCTION
find_bad_row(tableName TEXT)
RETURNS tid[]
AS $find_bad_row$
DECLARE
    result tid;
    curs REFCURSOR;
    row1 RECORD;
    row2 RECORD;
    tabName TEXT;
    youNeedMe BOOLEAN = false;
    count BIGINT := 0;
    arrIter BIGINT := 0;
    arr tid[];
BEGIN
    -- leftover from the previous version; the bad ctids are now
    -- returned in an array instead
    DROP TABLE IF EXISTS bad_rows_tbl;
    CREATE TABLE bad_rows_tbl (id varchar(255), offs BIGINT);
    -- strip the schema prefix: the hstore() call below needs the bare table name
    SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;

    OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName;

    count := 1;

    FETCH curs INTO row1;

    WHILE row1.ctid IS NOT NULL LOOP
        BEGIN
            result = row1.ctid;
            count := count + 1;

            -- the previous iteration failed, so the ctid we hold
            -- now is the corrupted row itself: collect it
            IF youNeedMe THEN
                arr[arrIter] = result;
                arrIter := arrIter + 1;
                RAISE NOTICE 'ADDING CTID: %', result;
                youNeedMe = FALSE;
            END IF;

            FETCH curs INTO row1;

            -- read the whole row through hstore to force detoasting of
            -- every column; a corrupted row raises SQLSTATE XX000 here
            EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM '
                || tableName || ' WHERE ctid = $1' INTO row2
                USING row1.ctid;

            IF count % 100000 = 0 THEN
                RAISE NOTICE 'rows processed: %', count;
            END IF;
        EXCEPTION
            WHEN SQLSTATE 'XX000' THEN
                RAISE NOTICE 'LAST GOOD CTID: %', result;
                -- flag the next fetched ctid as the corrupted one
                youNeedMe = TRUE;
        END;
    END LOOP;

    CLOSE curs;
    RETURN arr;
END
$find_bad_row$
LANGUAGE plpgsql;
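
The returned array can then be used in the WHERE clause of a DELETE. For example (the table name and ctid values are placeholders):

SELECT find_bad_row('public.my_table');
-- suppose it returns: {"(123,4)","(5678,9)"}

DELETE FROM public.my_table
WHERE ctid = ANY ('{"(123,4)","(5678,9)"}'::tid[]);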
Anton Zvonovsky

  • How'd you get into the situation in the first place? Is point-in-time recovery (barman, etc) after restore from backup not an option? – Craig Ringer Mar 16 '16 at 10:26
  • Unfortunately, since the 'archive_mode' setting in postgresql.conf is commented out, I guess point-in-time recovery is not available. That's why I'm trying to fix the contents. – Anton Zvonovsky Mar 16 '16 at 12:28
  • I've modified the question and added the function I ended up with. It returns an array of tids of the corrupted rows, which can be used in the WHERE clause of a DELETE statement. I've already removed multiple broken rows using it. – Anton Zvonovsky Mar 16 '16 at 13:27
  • As soon as you get the DB into a dump-able state, dump it, shut it down, archive the old one and re-initdb. Do NOT continue using it. – Craig Ringer Mar 17 '16 at 04:49
  • Thanks for your reply! Do I understand right that I need to make a new DB from that dump? – Anton Zvonovsky Mar 17 '16 at 09:07
  • Not just a new DB. A new PostgreSQL data directory. Any corruption like this means the whole PostgreSQL instance (collection of databases) cannot be trusted. Dump _all_ databases, `pg_dumpall --globals-only` your users, roles, etc. `initdb` again. You should keep the old one to investigate how this happened - preferably the original copy before you tried modifying it, like you hopefully took based on the advice at https://wiki.postgresql.org/wiki/Corruption . Unless you did something known to be unsafe like setting fsync=off, in which case you should've had better backups. – Craig Ringer Mar 17 '16 at 14:50

1 Answer


This is supplemental to the function given in the question, and covers the next steps once the db is dumpable.

Your next steps should be:

  1. Dump all databases and restore them on a physically different system. The reason being that at this point we don't know what caused this, and the chances are not too bad that it might be hardware.

  2. You need to take the old system down and run hardware diagnostics on it, looking for problems. You really want to find out what happened so you don't run into it again. Of particular interest:

    • Double check ECC RAM and MCE logs
    • Look at all RAID arrays and their battery backups
    • CPUs and PSUs
    • If it were me, I would also look at environmental factors such as AC in and datacenter temperature.
  3. Go over your backup strategy. In particular, look at PITR (and the related utility pgbarman). Make sure you can recover from a similar situation if you run into it in the future; a minimal archiving setup is sketched below this list.
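
For reference, a minimal WAL-archiving setup (the prerequisite for PITR) might look like this in postgresql.conf; the archive path is a placeholder and the details vary by version and tooling:

# keep enough information in WAL for archiving / PITR
wal_level = archive            # 'replica' on PostgreSQL 9.6 and later
archive_mode = on              # changing this requires a server restart
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'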

Data corruption doesn't just happen. In rare cases it can be caused by bugs in PostgreSQL, but in most cases it is due to your hardware or due to custom code you have running in the back-end. Narrowing down the cause and ensuring recoverability are critical going forward.

Assuming you aren't running custom C code in your database, your data corruption is most likely due to something on the hardware side.

Chris Travers