
I'm trying to write a parallel compress/encrypt backup script for archiving, using GNU parallel, xz and GnuPG. The core part of the script is:

tar --create --format=posix --preserve-permissions --same-owner --directory $BASE/$name --to-stdout . \
    | parallel --pipe --recend '' --keep-order --block-size 128M "xz -9 --check=sha256 | gpg --encrypt --recipient $RECIPIENT" \
    | pv > $TARGET/$FILENAME

Without GnuPG encryption it works great (decompressing and untarring work), but after adding parallel encryption, decryption fails with the error below:

[don't know]: invalid packet (ctb=0a)
gpg: WARNING: encrypted message has been manipulated!
gpg: decrypt_message failed: Unexpected error
: Truncated tar archive
tar: Error exit delayed from previous errors.

Because the uncompressed size is the same as GNU parallel's block size (around 125M), I assume this is related to GnuPG's support for partial block encryption. How can I solve this problem?
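
For what it's worth, the failure reproduces outside the backup pipeline. A minimal sketch (merged.gpg is just an illustrative name): two independently encrypted messages concatenated into one file do not decrypt in a single pass.

echo "block 1" | gpg --encrypt --recipient $RECIPIENT > merged.gpg
echo "block 2" | gpg --encrypt --recipient $RECIPIENT >> merged.gpg
gpg --decrypt merged.gpg   # typically stops after the first message; the exact error varies by gpg version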


FYI

Another parallel gpg encryption issue, about random number generation:

https://unix.stackexchange.com/questions/105059/parallel-pausing-and-resuming
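
If that entropy issue is suspected, one quick check (a Linux-specific sketch, not from the linked thread) is to watch the kernel entropy pool while the pipeline runs:

watch -n1 cat /proc/sys/kernel/random/entropy_avail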

Yongbin Yu
  • The best single thing you can do here is pass `-z 0` to gpg to stop it trying to recompress the output of xz. That will likely change your job to IO-bound and remove the need for GNU parallel. It did that for me, but I'll note I was using zstd rather than `xz -9`. – Dzamo Norton Nov 19 '20 at 07:36

3 Answers


Pack

tar --create --format=posix --preserve-permissions --same-owner --directory $BASE/$name --to-stdout . |
    parallel --pipe --recend '' --keep-order --block-size 128M "xz -9 --check=sha256 | gpg --encrypt --recipient $RECIPIENT;echo bLoCk EnD" |
    pv > $TARGET/$FILENAME

Unpack

cat $TARGET/$FILENAME |
  parallel --pipe --recend 'bLoCk EnD\n' -N1 --keep-order --rrs 'gpg --decrypt | xz -d' |
  tar tv

-N1 is needed to make sure we pass a single record at a time. GnuPG does not support decrypting multiple merged records.
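
The same unpack pipeline can restore to disk instead of just listing the archive. A sketch, where $RESTORE_DIR is an illustrative name:

cat $TARGET/$FILENAME |
  parallel --pipe --recend 'bLoCk EnD\n' -N1 --keep-order --rrs 'gpg --decrypt | xz -d' |
  tar --extract --preserve-permissions --file - --directory "$RESTORE_DIR"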

Ole Tange
  • @Ole Tange: I didn't know about `--rrs`; using a custom delimiter is a good idea. The one thing I don't get in your command is the `-N1` option. What does it do in this context? The man page says *When used with --pipe, -N is the number of records to read. This is somewhat slower than --block.* – Yongbin Yu Sep 17 '17 at 17:12
  • Thanks for the quick answer. Just out of curiosity: if I omit the `-N1` option, will parallel pass multiple records to the pipe? – Yongbin Yu Sep 17 '17 at 17:21
  • 1
    Adding an argument of `-z 0` to gpg in the packing command will save many wasted CPU cycles by stopping it from trying to recompress the output of xz. – Dzamo Norton Nov 19 '20 at 07:59

GnuPG does not support concatenating multiple encryption streams and decrypting them at once. You will have to store multiple files, and decrypt them individually. If I'm not mistaken, your command even mixes up the outputs of all parallel instances of GnuPG, so the result is more or less random garbage.

Anyway: GnuPG also takes care of compression; have a look at the --compress-algo option. If you prefer to use xz, apply --compress-algo none so GnuPG does not try to compress the already-compressed message again. Encryption has massive CPU-instruction support nowadays; xz -9 might in fact be more time-intensive than the encryption (although I did not benchmark this).
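
A minimal sketch of that store-multiple-files approach, using GNU parallel's {#} job-sequence placeholder for numbering (the file naming is illustrative):

# Pack: one encrypted file per 128M block instead of one concatenated stream.
tar --create --format=posix --preserve-permissions --same-owner --directory $BASE/$name --to-stdout . |
  parallel --pipe --recend '' --block-size 128M \
    "xz -9 --check=sha256 | gpg --compress-algo none --encrypt --recipient $RECIPIENT > $TARGET/$FILENAME.{#}.xz.gpg"

# Unpack: decrypt each part individually, in numeric order, back into one tar stream.
ls $TARGET/$FILENAME.*.xz.gpg | sort -V | while read -r part; do
  gpg --decrypt "$part" | xz -d
done | tar --extract --file -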

Jens Erat
  • Wow, great answer (and question edit), thanks. That's what I wanted to know. I didn't know gpg takes care of compression; I'll try `--compress-algo none` in my job. As you mention, xz -9 on a couple of gigabytes is a far bigger job than gpg encryption, but the gpg time is also not negligible. Anyway, I'll report the execution times of both tests. – Yongbin Yu Sep 17 '17 at 09:04
  • 1
    Maybe also look at Facebook's new zstd algorithm if you can get software new enough to be supported, some benchmarks claim superior CPU load on a competitive compression ratio. Maybe not reaching `xz -9`, but achieved with much smaller computational overhead. – Jens Erat Sep 17 '17 at 09:47
  • Yes, I already took some time to apply Facebook's brand-new zstd algorithm to that pipe process, but zstd's process-creation behavior looks different from xz's and gzip's. I did manage to create a parallel-compressed zstd archive file, but its compression ratio and time were worse than the old `xz -9` setting. It looks like it needs more research. – Yongbin Yu Sep 17 '17 at 12:45
  • At my gpg (1.4.18 / Debian) the option for setting the compression level is `--compress-level`; I couldn't find the algo option there, so I just set `--compress-level 0` in my script. – Yongbin Yu Sep 17 '17 at 15:35
  • Here are simple benchmark numbers (11.9GiB tar.gz.xz): 1. tar + parallel xz + single gpg with compression: 3022 sec; 2. tar + parallel xz + single gpg without compression: 2550 sec (15.6% faster than original); 3. tar + parallel xz and gpg without compression: 2873 sec (4.9% faster than original, but *DOES NOT WORK*). The third command shows high variance in time depending on the random_seed condition. In the end I chose the tar + parallel xz + single gpg without compression combination (sketched below this thread). – Yongbin Yu Sep 17 '17 at 15:36
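
For reference, the combination chosen in the benchmark above (parallel xz, a single gpg instance, gpg compression disabled) would look roughly like this, reusing the question's variables:

tar --create --format=posix --preserve-permissions --same-owner --directory $BASE/$name --to-stdout . |
  parallel --pipe --recend '' --keep-order --block-size 128M "xz -9 --check=sha256" |
  gpg --compress-level 0 --encrypt --recipient $RECIPIENT |
  pv > $TARGET/$FILENAME

Unpacking is then a plain gpg --decrypt | xz -d | tar --extract, since xz natively decompresses concatenated streams.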

That's mainly a gpg issue: gpg does not support multithreading, and probably never will. You can search the web for the reasons why.


It got even worse with gpg v2: you cannot even run multiple gpg v2 instances in parallel, because they all lock the gpg-agent, which now does all the work. Maybe we should look for an alternative when doing mass encryption.

https://answers.launchpad.net/duplicity/+question/296122

EDIT: No. It is possible to run multiple gpg v2 instances at the same time, without any problem with the gpg-agent.

Yongbin Yu
  • Technically, this talks about another issue: OpenPGP specifies use of a CFB mode variant, which does not support multi-threaded encryption, so you'd have to start multiple individual encryption streams (which you already did). The second quote would be something that could be solved by running multiple threads or instances, of course (which is a limitation of GnuPG, if not supported). – Jens Erat Sep 17 '17 at 09:44