
I want to transfer a multi-terabyte directory to an NFS-mounted directory as efficiently as possible over a 1 Gbit network (probably the limiting factor).

3 options:

  1. tar and compress in place, then copy
  2. copy, then tar and compress
  3. tar | compress

It seems obvious to me that #3 should be the most efficient, since I'm only reading and writing the data once. Unfortunately, my command (tar -c dir | pigz > /mnt/nfs/dir.tgz) seems to tar for a while, then zip for a while, then tar for a while... and the network goes idle for large chunks of time, then the CPU is idle.

Did I miss some option?

P.S. My question seems related to this question, but that one has no answer and doesn't really ask the precise question about the alternation between network and CPU saturation.

1 Answer


You might be forgetting that on UNIX/Linux a process can perform only a single blocking I/O operation at a time. Neither tar nor the compressor does concurrent reads and writes, and neither one processes data while it is blocked in an I/O call. So when pigz stalls writing to NFS, the pipe fills up and tar blocks in its write; when tar stalls reading the disk, pigz blocks in its read.
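You can confirm this from another terminal by attaching strace to one of the stages and watching it sit in a blocked read() or write() (this assumes strace is installed, you have permission to trace the process, and tar is the only process by that name):

    strace -e trace=read,write -p "$(pgrep -x tar)"

If tar is stuck in a write() to the pipe, the downstream side is the bottleneck at that moment; if pigz is stuck in a read(), the disk side is.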

There are buffering filters that attempt to lessen this effect by using shared memory and two processes: one to read and the other to write.
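For example, here's a minimal sketch using mbuffer as such a filter (mbuffer must be installed, and the 1G buffer sizes are an assumption; size them to your RAM):

    tar -c dir | mbuffer -m 1G | pigz | mbuffer -m 1G > /mnt/nfs/dir.tgz

Putting a buffer on each side of pigz decouples all three stages: tar can keep reading from disk while pigz is busy, and pigz can keep compressing while the NFS write is stalled.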

Under this model, you'll have to reanalyze your options to determine where the bottleneck is and how the operations are actually being ordered.
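One way to do that analysis, as a sketch, is to insert pv gauges between the stages so you can watch each one's throughput in real time (assumes pv is installed; -c keeps the gauges from overwriting each other and -N labels them):

    tar -c dir | pv -cN tar | pigz | pv -cN pigz > /mnt/nfs/dir.tgz

Whichever gauge sits at zero while the other moves tells you which side of the pipeline is blocked at that moment.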

mdpc