
I want to transfer a multi-terabyte directory to an NFS-mounted directory as efficiently as possible over a 1 Gbit network (probably the limiting factor).

3 options:

  1. tar and compress in place, then copy
  2. copy, then tar and compress
  3. tar | compress

It seems obvious to me that #3 should be the most efficient, since I'm only reading and writing the data once. Unfortunately, my command (tar -c dir | pigz > /mnt/nfs/dir.tgz) seems to tar for a while, then zip for a while, then tar for a while... and the network goes idle for large chunks of time, then the CPU is idle.

Did I miss some option?

P.S. My question seems related to this question, but that one has no answer and doesn't really ask the precise question about the alternation between network and CPU saturation.

1 Answer


You might be forgetting that on UNIX/Linux a process can perform only a single blocking I/O operation at a time. Neither tar nor the compressor does concurrent reads and writes, and neither one processes data while it is blocked in an I/O call. So when pigz stalls writing to NFS, the pipe fills up and tar blocks in its write; when tar stalls reading the disk, pigz blocks in its read.
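You can confirm this from another terminal by attaching strace to one of the stages and watching it sit in a blocked read() or write() (this assumes strace is installed, you have permission to trace the process, and tar is the only process by that name):

    strace -e trace=read,write -p "$(pgrep -x tar)"

If tar is stuck in a write() to the pipe, the downstream side is the bottleneck at that moment; if pigz is stuck in a read(), the disk side is.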

There are buffering filters that attempt to lessen this effect by using shared memory and two processes: one to read and the other to write.
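For example, here's a minimal sketch using mbuffer as such a filter (mbuffer must be installed, and the 1G buffer sizes are an assumption; size them to your RAM):

    tar -c dir | mbuffer -m 1G | pigz | mbuffer -m 1G > /mnt/nfs/dir.tgz

Putting a buffer on each side of pigz decouples all three stages: tar can keep reading from disk while pigz is busy, and pigz can keep compressing while the NFS write is stalled.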

Under this model, you'll have to reanalyze your options to determine where the bottleneck is and how the operations are actually being ordered.
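One way to do that analysis, as a sketch, is to insert pv gauges between the stages so you can watch each one's throughput in real time (assumes pv is installed; -c keeps the gauges from overwriting each other and -N labels them):

    tar -c dir | pv -cN tar | pigz | pv -cN pigz > /mnt/nfs/dir.tgz

Whichever gauge sits at zero while the other moves tells you which side of the pipeline is blocked at that moment.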

mdpc