
We use pg_dump nightly to make a snapshot of our database. For a long time we did it with a simple command:

pg_dump -Fc database_name

This takes about an hour and produces a file of more than 30 GB.
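For illustration, a nightly cron entry for this might look like the following (the schedule, user, and output path are assumptions, not our actual setup):

0 2 * * * postgres pg_dump -Fc database_name > /var/backups/database_name.dump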

How can we speed things up?

Janning

1 Answer


Our biggest table contains raw image data in form of a bytea column.

When we did a simple COPY from psql on this table to stdout, it was quite fast (1 or 2 minutes), but with pg_dump it was very slow: it took ~60 minutes.
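For reference, a rough way to reproduce that comparison (the table name images is only a placeholder for the bytea table):

$ time psql database_name -c "\copy images TO STDOUT" > /dev/null
$ time pg_dump -Fc -t images database_name > /dev/null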

So investigating this I stumbled across this discussion. It seems that pg_dump's compression is rather slow when the data is already compressed, as it is with image data in a bytea column, and that it is better to compress outside of pg_dump (-Z0).

Additionally, we found that we can make use of our multi-core CPU (-j 10 for pg_dump and pigz -p 10, to use 10 cores; you might select a different number of cores).

So now we are doing it like this:

$ pg_dump -Z0 -j 10 -Fd database_name -f dumpdir
$ tar -cf - dumpdir | pigz -p 10 > dumpdir.tar.gz
$ rm -r dumpdir

The time has dropped from ~70 minutes to ~5 minutes. Quite amazing.
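For a nightly job, the three commands can be wrapped in a small script; the backup directory and date-stamped name below are assumptions, adjust them to your setup:

#!/bin/bash
# Sketch of a nightly wrapper around the commands above (paths are assumed).
set -euo pipefail
STAMP=$(date +%F)
DUMPDIR="/var/backups/dump_${STAMP}"

# Parallel, uncompressed directory dump, then compress with pigz.
pg_dump -Z0 -j 10 -Fd database_name -f "$DUMPDIR"
tar -cf - -C "$(dirname "$DUMPDIR")" "$(basename "$DUMPDIR")" | pigz -p 10 > "${DUMPDIR}.tar.gz"
rm -r "$DUMPDIR"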

You can restore it like this:

$ mkdir -p dumpdir
$ pigz -p 10 -dc dumpdir.tar.gz | tar -C dumpdir --strip-components 1 -xf -
$ pg_restore -j 10 -Fd -O -d database_name dumpdir
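If you want to verify the dump before restoring, pg_restore -l just prints the archive's table of contents without touching the database:

$ pg_restore -l dumpdir | head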
Janning
  • We had a similar problem. We spent 2 days on pg_dump and I canceled it, but I still need to take a dump. We have a 3 TB database and more than 1.5 TB of it is a bytea image table. Could the time drop like that for big tables too? – Melih Mar 22 '22 at 14:45
  • Yes, I guess it will help a lot (= hours) – Janning Mar 25 '22 at 10:59
  • How do you restore the db from this? `pg_restore -d root -j 10 -Fd dump_2022-03-24_17_47_40.tar.gz` returns `pg_restore: error: could not open input file "dump_2022-03-24_17_47_40.tar.gz/toc.dat": Not a directory` – plancys Mar 25 '22 at 20:10
  • @Janning It took 2 days before; now with -j 2 it takes 6.5 hours. I hope it will take even less time with -j 10... thanks – Melih Mar 28 '22 at 06:47
  • The problem is not about -j; what matters is -Z0 to disable compression and -Fd to dump into a directory. -j just tells the process how many CPUs it should use; if you have plenty, use them all. – Janning Mar 28 '22 at 07:12
  • @Janning The pg_dump tip helps a lot, but I'm doing a dump from a remote server, so the compression doesn't help: it only runs locally after pg_dump is done. How can I run the pigz compression on the remote server and then download only the compressed file? (a sketch follows these comments) – Gabriel Linassi Dec 23 '22 at 03:30
  • I tried this approach and it **didn't work** for me. The backed up DB was a mess so the speed saved wasn't really worth it. I wish I could remove my upvote, I lost a lot of time on this approach – Jose Paez Apr 11 '23 at 22:18
  • It has been working for me for years without any problems. I guess your backup script is wrong. Just post a new question on Stack Overflow with your script and link it here. – Janning Apr 12 '23 at 06:36
  • @JosePaez Maybe the problem was the -j option. I read in the docs: _pg_dump -j uses multiple database connections; [..]. Without the synchronized snapshot feature, the different worker jobs wouldn't be guaranteed to see the same data in each connection, which could lead to an **inconsistent backup**._ – Markus May 11 '23 at 06:39
  • Works like a charm, tested on ~30GB database – Alexandr Tovmach Jun 05 '23 at 20:48
  • Great answer, thanks. For completeness' sake, I'd add the -p10 parameter to pigz, where 10 is the number of cores you want to limit it to. By default, it uses all cores, which may or may not be what you want. – pkExec Jun 08 '23 at 09:22
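Regarding the remote-server question above, a minimal sketch (hostname and database name are placeholders): run pg_dump and pigz on the remote host over SSH and stream only the compressed bytes back. Note that this single-stream form gives up the -j parallelism:

$ ssh user@dbhost "pg_dump -Z0 -Fc database_name | pigz -p 10" > database_name.dump.gz
$ pigz -dc database_name.dump.gz | pg_restore -O -d database_name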