1

FYI only; this does not need an answer.

I was working on a Postgres server under heavy load, and issued a GRANT command that hung. It was not blocked by any other commands. I had a few open connections and was able to kill several of the processes with a normal pg_cancel_backend (SIGTERM) command, but my GRANT command didn't respond to either that or pg_terminate_backend (SIGINT). I finally tried "kill -9 (pid)" (SIGKILL) and the server crashed.

Issuing SIGKILL to the database server process or the postmaster can cause crashes--that's well documented. Running SIGKILL against a child process can also crash the database.

Vi.
  • 37,014
  • 18
  • 93
  • 148
David Willis
  • 183
  • 3
  • 8
  • 1
    `GRANT` without any blocking locks should certainly not "hang". How did you determine that it wasn't waiting for locks? What query against `pg_stat_activity` / `pg_locks`? – Craig Ringer Jun 07 '14 at 05:43
  • `kill -KILL` or `kill -9` is a bad habit, only used by beginners. BTW: maybe you forgot to add a semicolon after the `GRANT ... TO ... ` ? – wildplasser Jun 07 '14 at 12:18

1 Answers1

3

Running SIGKILL against a child process can also crash the database

Any fatal signal that terminates any backend without a chance to clean up, such as SIGSEGV, SIGABRT, SIGKILL, etc, will cause the postmaster to assume that shared memory may be corrupt. It will roll back all transactions, terminate all running backends, and restart.

PostgreSQL does that to protect your data. If something went wrong before a backend crashed that caused it to scribble on shared memory, then shared_buffers could contain invalid data that'd get flushed to disk and replace good pages.

I was pretty sure that was in the docs, but all I can find is what I think you were referring to in shutting down the server.

Anyway, if you SIGKILL a backend you'll see something like:

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

This also happens if the OOM killer kills a backend, which is why you should turn off memory overcommit on Linux.

I wrote some guidance on things to do and not to do with PostgreSQL on my blog. Worth a look.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778