I am creating archives of very large directories and splitting these archives into smaller parts as follows:

tar -cvf - target_dir | pigz > target_dir.tar.gz

md5sum target_dir.tar.gz > md5sum.txt

split -n 10 target_dir.tar.gz target_dir.tar.gz.part-

The problem with this approach is that I need roughly twice the disk space of the tar.gz file, because the full archive and its parts exist side by side. That is a problem because some of the target directories are huge (TBs).

I could pipe the tar output into split to reduce the required disk space:

tar -cvf - target_dir | pigz | split -n 10 - target_dir.tar.gz.part-

But how would I calculate the md5sum of the tar.gz stream before it goes into split?

justinian482

1 Answer

Use tee to duplicate the stream. Use bash process substitution to feed one copy to md5sum through a temporary FIFO while the other copy continues on to split.

tar -cvf - target_dir |
    pigz |
    tee >(md5sum > md5sum.txt) |
    split -n 10 - target_dir.tar.gz.part-
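
For shells without process substitution, the same idea can be sketched with an explicit named FIFO. This is only a minimal sketch under several assumptions: the sample files, the `sum.fifo` name, the gzip fallback, and the tiny part size are all made up for the demo. It also uses `split -b` with a fixed part size rather than `-n 10`, on the assumption that split may not be able to cut a pipe into equal chunks without knowing the total size in advance.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory (hypothetical demo setup).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample data standing in for the real target_dir.
mkdir target_dir
printf 'hello\n' > target_dir/a.txt
printf 'world\n' > target_dir/b.txt

# Assumption: fall back to gzip when pigz is not installed.
if command -v pigz >/dev/null 2>&1; then gz=pigz; else gz=gzip; fi

# Start md5sum in the background, reading from a named FIFO.
mkfifo sum.fifo
md5sum < sum.fifo > md5sum.txt &
sum_pid=$!

# tee copies the compressed stream into the FIFO and passes it on to split.
# -b 64 is a tiny part size so the demo yields several parts;
# a real run would use something like -b 10G.
tar -cf - target_dir | "$gz" | tee sum.fifo | split -b 64 - target_dir.tar.gz.part-

# Make sure md5sum has finished writing before the checksum is used.
wait "$sum_pid"

# Verify: the checksum file names "-" (stdin), so feed the joined parts on stdin.
cat target_dir.tar.gz.part-* | md5sum -c md5sum.txt
```

Because md5sum read from a pipe, md5sum.txt records the file name as `-`, which is why the check at the end concatenates the parts onto standard input instead of naming a file. The explicit `wait` also guarantees that md5sum.txt is complete before anything reads it, which the process substitution form does not promise by itself.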
KamilCuk