1

When copying the directory "home" from a remote machine to the local machine using the following command:

ssh root@remote.machine 'tar -czf - -C / home' | tar -zxvf -

The number of files on the two machines doesn't match after the transfer; some files were never copied.

Has anyone experienced similar problems?

Frankie
  • How are you counting the number of files on both sides? Do the missing files share any characteristics? If you run the source tar in verbose mode, do you see it processing the files? – larsks Jun 21 '11 at 15:47
  • @larsks `find . | wc -l`, quite strange. Did it twice and twice it missed some files. My fast solution was to `rsync` and both dirs got perfect. But as I couldn't explain it came here looking for a possible reason... – Frankie Jun 21 '11 at 15:59
  • Just a guess: Are there links inside the /home source folder structure? – desasteralex Jun 21 '11 at 16:03
  • 2
    Do find on both machines, redirect output to files, then diff the files (a sketch of this follows the comments). You will see what's different and maybe this will help you to find the reason why. – danadam Jun 21 '11 at 16:25
  • 1
    The first thing that comes to mind is hard links whose path exceeds 100 bytes (i.e. when tar encounters an inode for the second time or more and the current path is >100 characters long), which aren't supported in the usual tar format. IIRC GNU tar issues a warning but still returns 0. But really, figure out what files are missing and then describe the missing files if you can't find the pattern. – Gilles 'SO- stop being evil' Jun 21 '11 at 17:54
  • @Gilles your answer is a clear winner. Please post it as an answer so that I can accept. Thanks. – Frankie Jun 21 '11 at 22:41
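
A minimal sketch of the comparison danadam suggests above, assuming GNU findutils on both machines and that the copy landed in ./home on the local side (the /tmp file names are illustrative):

ssh root@remote.machine 'cd /home && find . | sort' > /tmp/remote.txt
(cd home && find . | sort) > /tmp/local.txt
diff /tmp/remote.txt /tmp/local.txt

Lines prefixed with < in the diff output are paths that exist on the remote machine but are missing locally.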

2 Answers

3

Use rsync instead. It's faster and safer.

rsync -avuz root@remote.machine:/home/ /srv/backups/home/

Then you can compress the data.

tar cvzf /srv/backups/home_`date +%F`.tgz -C /srv/backups home
ghm1014
  • thanks for your tip, but if you read the comments you'll see that I use rsync... afterwards! :) With the above command I'm passing all the files as one single stream of data, which turns the transfer of about 500,000 small files over a very fast network into a single transfer. In this particular case rsync adds a huge overhead, making the transfer more than 6x slower (tested). – Frankie Jun 21 '11 at 22:38
  • The above command only tars the data during transport... the tar archive is never saved on the server (good for when you don't have space) and on the client the files arrive already decompressed. – Frankie Jun 21 '11 at 22:40
1

One issue with using tar for copying files is that the old POSIX tar format (ustar) has only a 100-byte field for storing a hard link's target. It can cope with longer file names (they are split across separate name and prefix fields), so as long as each of your files has a single link, everything is fine. But when tar encounters an inode for the second time, it produces a hard link record, which has only 100 bytes for the name; if the name is too long, the second link isn't stored in the archive.
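
If this is the cause, the missing files should all be hard links (link count greater than 1) whose path, relative to the archive root, exceeds 100 bytes. A quick way to list candidates, assuming GNU find and awk and the -C / home invocation from the question (a sketch, not a definitive test):

# run on the remote machine; the archive root there is /
cd / && find home -type f -links +1 | awk 'length($0) > 100'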

I do recall tar implementations that discarded these links with a diagnostic message but still exited with a status of 0. Maybe your tar is even worse and silently discards them.

The new POSIX tar format (pax) doesn't have this limitation. Try using pax instead of tar, or tar with the right options. Current versions of GNU tar default to the pax format, and do complain properly if told to produce a ustar archive where the names don't fit.
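
For example, with GNU tar you can request the pax format explicitly (a sketch based on the command from the question; --format=pax is a standard GNU tar option):

ssh root@remote.machine 'tar --format=pax -czf - -C / home' | tar -zxvf -

The pax format stores over-long names and link targets in extended header records, so long hard link paths survive the round trip.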