2

I've built a script to perform a physical backup of a Postgresql 9.0.8 database by following the "Standalone hot physical database backup" recipe in the PostgreSQL 9 Administration Cookbook (Riggs/Krosing), but I adapted it for Windows Server 2008 R2.

For the recipe's step #4 which uses rsync to copy all of the data files (excluding pg_xlog directory) I am using robocopy.exe (since rsync is a *nix utility and I am using Windows). The problem is that often, one of the files cannot be copied and results in "Access is denied". Copying the file by hand fails with "Access is denied" long after the backup script has failed - so this is not some intermittent issue that can be re-tried. The file can be copied only after restarting the PostgreSQL process. It's always a different file. Last nite it was %PGDATADIR%\5432\base\24609\38122 .

I'd like to hear if you've experienced this and what you did to fix this. I'm considering:

  1. Restart the PostgreSQL server right before backup (I admit this is a hack)
  2. Using some sort of utility that can copy open files such as VSHADOW, DISKSHADOW and hobocopy (note: not robocopy)
  3. Maybe there is some way to instruct PostgreSQL to release all locks?
  4. [added] see below - looks like adding regular "vacuum"ing eliminates the symptom
sevzas
  • 233
  • 3
  • 11

1 Answers1

1

OK, first things first - put away your cookbook. Instead go read the section of the Postgres manual about backing up. Read the whole chapter - it isn't that long.
(You'll probably notice some similarities between this and the book - most Postgres books are just prettied-up versions of the manual - but you should always work from the manual as your primary reference.).

All of the terminology I will use below is from the manual (so if you thought you could skip the reading assignment you can't - if you do you are likely to be left horribly confused).


Now for your actual problem -- A Unix solution is often not directly portable to Windows, and this is one of those cases: A *nix system will happily grab a file that is being manipulated - Windows throws the error you're seeing.

How you deal with this depends on what kind of backup you're doing.

Filesystyem Level Backups

If you're doing a "filesystem level backup" you must shut the server down. Full stop, end of discussion, no other options. The database must be shut down completely in order for that type of backup to be reliable (and if the backup you're getting isn't reliable what's the point?).

Continuous Archiving / Point-in-Time Recovery & Slave Servers

If you're doing a base backup as part of setting up Point-in-Time Recovery & log-shipping you have two options:

  • Shut the server down anyway.
  • Use a tool that can copy open files (option (2) from your question)

You then proceed as per the rest of the documentation for Point-in-Time Recovery / Log Shipping, creating a slave server.
When you want to copy your database server to disk simply stop the slave, back it up, and restart it -- the world keeps turning on your master server, and the slave will catch up on what it missed when it restarts.

You can also use the base backup plus the rolled transaction logs that you would normally ship to the slave as a good reliable database backup. This would seem to be the closest thing to what you're trying to achieve in your question, but I would recommend the slave backup I described instead -- More benefits (you have a hot standby) and less work for the master server (no extra checkpointing to roll the backup's transaction log).

Something Else

If none of the above appeals to you you're pretty much stuck using SQL Dumps.
There are downsides: Postgres has to lock each table as it gets dumped (which means writes to your database will block), and SQL dumps are slower than the other options.
If your database is of any real size I would not advise this method.

voretaq7
  • 79,879
  • 17
  • 130
  • 214
  • Very helpful! I am indeed using the procedure in section 24.3.2. I'm not able to follow your comments when you start talking about the "slave server". Which step(s) create a slave server? Are you taling about setting up replication? Maybe I should expose the fact that I have streaming replication in place right now. Should I just stop the slave and back it up instead of touching the master? – sevzas Aug 24 '12 at 16:15
  • If you already have streaming replication in place then stopping the slave and backing that up is exactly what I would suggest. (If you want to get fancy and minimize slave downtime/lag: Stop the server, make a VSS snapshot, restart the server, and back up the snapshot.) – voretaq7 Aug 24 '12 at 17:55
  • Thanks for clarifying that. I want to add one thing that I noticed: I've determined that the frequency of the symptom I described in my original question is proportional to the length of time that the database has gone without being vacuumed. In my case, the database had never been vacuumed since it was created (for over 1 year). The problem began to occur sporadically after about 3 months and got worse. After 1 year, the problem occurred every time I tried to back up. Since I put in a weekly vacuum process I haven't seen this problem - it's been about 2 weeks now. – sevzas Aug 29 '12 at 16:30
  • @SevZ. This may have to do with the free space map files Postgres generates then -- it's probably holding write (or exclusive) locks on them (just a guess - I could be completely wrong). In any case if you're doing FS level backups you want to make sure the DB is shut down, otherwise you might get half-consistent backups (where the WAL is older than the transactions in the table) -- Bad News all around if that happens. – voretaq7 Aug 29 '12 at 16:34
  • I spoke too soon on the work-around involving vacuum command. The robocopy.exe step failed again. – sevzas Aug 31 '12 at 12:03