
I'm getting an error:

ERROR:  invalid page header in block 411 of relation "t_value_time"

in my PostgreSQL database. This keeps happening on different machines. Is there a way to prevent it from happening, or at least to tell PostgreSQL to ignore the data in the invalid block and move on?

I'd rather lose the data from the block and have PostgreSQL skip over it, reading the rest of the data. Is there a way to tell it to skip this block?

pb2q
MrBubbles

5 Answers


WARNING: You will lose some data!

We managed to get over it (crashed DEV VM) by issuing:

    database=# SET zero_damaged_pages = on;
    SET
    database=# VACUUM FULL damaged_table;
    WARNING: invalid page header in block xxx of relation base/yyy/zzz; zeroing out page
    [..]
    database=# REINDEX TABLE damaged_table;

Fix via pwkg.ork.

Vlad
  • This is the right solution. The `VACUUM FULL` and `REINDEX` aren't strictly necessary to clear the warning. Any operation that reads the page from disk and triggers the "invalid page header" error will fix the damaged page (by re-writing it with all zeroes) if `zero_damaged_pages` is on, so you could just do a query that hits the bad page. – simonp Jun 25 '13 at 22:06
  • @simonp this is not quite true. You can read in the [PostgreSQL docs](https://www.postgresql.org/docs/14/runtime-config-developer.html#GUC-ZERO-DAMAGED-PAGES) that "Zeroed-out pages are not forced to disk so it is recommended to recreate the table or the index before turning this parameter off again". I believe a `VACUUM FULL` would help though, and `REINDEX` _may_ be required. – girgen Jan 19 '22 at 21:39
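
Putting the answer and the comments together, a minimal sketch of the whole sequence (the table name is a placeholder, and this still throws away whatever was in the damaged page):

    -- Session-local; don't leave this enabled.
    SET zero_damaged_pages = on;

    -- Any read that hits the bad page will zero it in memory, e.g. a full scan:
    SELECT count(*) FROM damaged_table;

    -- Zeroed-out pages are not forced to disk, so rewrite the table and its
    -- indexes before turning the parameter off again:
    VACUUM FULL damaged_table;
    REINDEX TABLE damaged_table;
    SET zero_damaged_pages = off;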

Same block every time?

From what I've read, the most common cause of invalid blocks is hardware failure. Red Hat has a utility, pg_filedump, that formats "PostgreSQL heap, index, and control files into a human-readable form". I don't think they support any PostgreSQL version greater than 8.4.0, but I could be wrong.

You want to prove your hardware is good by running thorough disk, RAM, and NIC diagnostics.
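
If you want to look at the damaged block before doing anything destructive, here's a rough sketch of a pg_filedump invocation (the data-file path and OIDs are placeholders; block 411 is the one from the error message):

    # Dump only the suspect block in human-readable form. The data file lives
    # under $PGDATA/base/<database OID>/<relfilenode>.
    pg_filedump -R 411 411 /var/lib/pgsql/data/base/16384/16385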

Mike Sherrill 'Cat Recall'

There's no simple way to do it, but it's reasonably easy to do by editing the data file directly (the relfilenode column of the table's pg_class entry gives the filename).

Just copy a block from elsewhere in the file over the bad block. Ideally, synthesise an empty block or update the one you're overwriting to have no valid tuples in it.

Once you've got something that doesn't produce that error, dump the table and reload it for safety.
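
For example, here's a rough sketch of one way to do that, zeroing the bad block with dd (PostgreSQL accepts an all-zeroes page as an empty page). The database name, paths, and OIDs are placeholders, block 411 is the one from the question, and you should stop the server and back up the file before touching anything:

    # Find which file backs the table; the result is relative to the data directory.
    psql mydb -c "SELECT pg_relation_filepath('t_value_time');"
    #   e.g. base/16384/16385

    # Stop the server and keep an untouched copy of the file.
    pg_ctl -D /var/lib/pgsql/data stop
    cp /var/lib/pgsql/data/base/16384/16385 /tmp/16385.bak

    # Overwrite block 411 with zeroes (8 kB is the default page size).
    dd if=/dev/zero of=/var/lib/pgsql/data/base/16384/16385 \
       bs=8192 seek=411 count=1 conv=notrunc

    pg_ctl -D /var/lib/pgsql/data start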

araqnid

These are almost always hardware problems, by the way. Verify and test your RAM, disk, and CPU, and make sure your environment is good (bad power input can cause problems, as can overheating). That's the best way to prevent it. The best way to address it is point-in-time recovery from a base backup.
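
As a rough sketch of what that restore can look like (everything here is an assumption rather than part of the answer: a base backup tarball, WAL archived to /mnt/wal_archive, an example target timestamp, and the recovery.conf style used by PostgreSQL 11 and earlier; on 12+ the same settings go in postgresql.conf plus an empty recovery.signal file):

    # Stop the server and move the damaged cluster out of the way.
    pg_ctl -D /var/lib/pgsql/data stop
    mv /var/lib/pgsql/data /var/lib/pgsql/data.damaged

    # Restore the base backup into a fresh data directory.
    mkdir -m 700 /var/lib/pgsql/data
    tar -xf /backups/base.tar -C /var/lib/pgsql/data

    # Replay archived WAL up to a point before the corruption appeared.
    cat > /var/lib/pgsql/data/recovery.conf <<'EOF'
    restore_command = 'cp /mnt/wal_archive/%f %p'
    recovery_target_time = '2013-06-25 21:00:00'
    EOF

    pg_ctl -D /var/lib/pgsql/data start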

Chris Travers

If you have a slave, set hot_standby_feedback to 'on' on it, if it isn't already. Then run pg_dump against the slave and write the output to /dev/null so that you don't consume any space: `nohup pg_dump db_name -v -Fc -f /dev/null &`. If the dump succeeds, your slave is fine. Do a failover; there will be no data loss.

Another way to validate your slave is to run `explain select count(*) from table_name;`. If it succeeds and the plan uses a sequential scan, then your slave is good. This option doesn't apply if the plan uses an index scan.

Note: this works only if your master is affected by storage-level corruption.

I happened to face the same issue just today and was able to fix it.
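
For reference, here's a sketch of those checks run against the standby (db_name, table_name, and $PGDATA are placeholders):

    # Enable hot_standby_feedback on the standby if it isn't already;
    # a configuration reload is enough.
    echo "hot_standby_feedback = on" >> $PGDATA/postgresql.conf
    pg_ctl -D $PGDATA reload

    # Dump the whole database to /dev/null; if this completes, every table
    # page on the standby could be read.
    nohup pg_dump db_name -v -Fc -f /dev/null &

    # Or force a full read of just the suspect table.
    psql db_name -c "EXPLAIN SELECT count(*) FROM table_name;"
    psql db_name -c "SELECT count(*) FROM table_name;"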