
When the backup moves files from one server to the other, the df results on the destination change every few seconds in an impossible manner. The source host is running rsync. On the destination host I'm running the following command every few seconds:

  echo `date` `df|grep md0`
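
The same polling can also be written as a loop (a minimal sketch; `df -P` forces the single-line POSIX output format so the grep reliably catches the md0 line):

  while sleep 2; do
      echo "$(date) $(df -P | grep md0)"
  done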

Results are below (the columns are the date followed by df's filesystem, 1K-blocks, used, available, Use% and mount point fields):

Sat Jun 29 23:57:12 CEST 2013 /dev/md0 4326425568 579316100  3527339636 15% /MD0
Sat Jun 29 23:57:14 CEST 2013 /dev/md0 4326425568 852513700  3254142036 21% /MD0
Sat Jun 29 23:57:15 CEST 2013 /dev/md0 4326425568 969970340  3136685396 24% /MD0
Sat Jun 29 23:57:17 CEST 2013 /dev/md0 4326425568 1255222180 2851433556 31% /MD0
Sat Jun 29 23:57:20 CEST 2013 /dev/md0 4326425568 1276006720 2830649016 32% /MD0
Sat Jun 29 23:57:24 CEST 2013 /dev/md0 4326425568 1355440016 2751215720 34% /MD0
Sat Jun 29 23:57:26 CEST 2013 /dev/md0 4326425568 1425090960 2681564776 35% /MD0
Sat Jun 29 23:57:27 CEST 2013 /dev/md0 4326425568 1474601872 2632053864 36% /MD0
Sat Jun 29 23:57:28 CEST 2013 /dev/md0 4326425568 1493627384 2613028352 37% /MD0
Sat Jun 29 23:57:32 CEST 2013 /dev/md0 4326425568 615934400  3490721336 15% /MD0
Sat Jun 29 23:57:33 CEST 2013 /dev/md0 4326425568 636071360  3470584376 16% /MD0

As you can see, I start at a Use% of 15% and 16 seconds later I'm at 37% (needless to say, the backup cannot copy such a huge amount of data in so short a time). After ~20 seconds the cycle closes and I'm again at roughly the same usage as before. That value is reasonable: ca. 35 MB were copied.
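
A quick sanity check of the figures above (the "used" column is in 1K blocks, so this arithmetic is grounded in the samples shown):

  # Apparent growth of the "used" column between the 23:57:12 and
  # 23:57:28 samples, converted from 1K blocks to GiB:
  echo $(( (1493627384 - 579316100) / 1024 / 1024 ))   # prints 871, i.e. ~872 GiB

Roughly 872 GiB of apparent growth in 16 seconds would mean a sustained write rate of ~55 GiB/s, far beyond anything this hardware could deliver, so the numbers cannot reflect real writes.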

Can somebody explain to me what is going on? Does df only estimate the usage rather than report the actual used value?

tatus2
  • What's your rsync command look like? – dialt0ne Jun 30 '13 at 02:47
  • Did you try to check the used space with other tools (du)? – Kris_R Jun 30 '13 at 06:05
  • @dialt0ne `rsync --progress --delete-excluded --exclude-from="exculde-file.txt" -avhe ssh /MD0 192.168.178.81:/MD0/`; the exclude file lists the usual temporary files (~* ._* *.tmp) – tatus2 Jun 30 '13 at 07:00
  • I think rsync creates temporary files while a file is being transferred. If you have some large individual files, the temp files will take up space during the transfer; they are removed once the transfer of that file is complete, so lots of space is suddenly free again. – Mörre Jun 30 '13 at 07:27
  • @Kris_R `du -s /MD0` gave 948474832. At the same time `df` reported used=1571381664 (see the comparison sketch after these comments). – tatus2 Jun 30 '13 at 09:37
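
For a side-by-side check like the one in the comment above, something along these lines works (a minimal sketch; on GNU coreutils both `df` and `du -s` report 1K blocks by default, so the two figures are directly comparable):

  # File-level usage vs. filesystem-level usage, both in 1K blocks.
  # A large gap can point at deleted-but-still-open files or at
  # in-flight temporary files that df counts but du does not see.
  du -sx /MD0 | awk '{print "du used: ", $1, "KB"}'
  df -P /MD0 | awk 'NR==2 {print "df used: ", $3, "KB"}'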

1 Answer


Because rsync copies each file to a temporary file and then renames it over the old one. Also, rsync transfers only the changed parts of a file, not the whole file.

Therefore, if you have a 20G file of which 10M has changed, rsync will first build a new 20G copy on the target system as a hidden temporary file, reusing the unchanged blocks of the existing file and applying the 10M of changed data. Finally it renames the temporary file over the old one.

This is done to prevent the file from being corrupted if a transfer fails partway through.

To avoid creating temporary files and to update files in place, use the --inplace flag.
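
Adapted to the command from the comments on the question, that would look like this (a sketch, not a tested invocation; the exclude file name is copied verbatim from the question):

  rsync --inplace --progress --delete-excluded \
        --exclude-from="exculde-file.txt" -avhe ssh /MD0 192.168.178.81:/MD0/

You can also watch for the hidden temporary files rsync creates on the destination; by default they are placed next to the target file with names like `.filename.XXXXXX`:

  # List recently modified hidden files larger than 100 MB under /MD0
  # (the size and age thresholds are arbitrary examples).
  find /MD0 -type f -name '.*' -size +100M -mmin -10 -ls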

bahamat
  • Sorry, but that can't be it. First of all, the target was empty before I started, so nothing could be replaced. Second, no file is bigger than ~1-2 GB. Third, there is no way I could transfer 20% of my drive capacity in 20 s: it's a 4 TB drive, so we are talking about 800 GB in 20 s! I wish it were true that my hardware is that fast. – tatus2 Jun 30 '13 at 09:34
  • Try an experiment. Use `--partial` and kill the transfer when disk usage is high. Then use jdiskreport to find large files. See what it is for yourself. – bahamat Jun 30 '13 at 10:17
  • OK, I'm not happy about interrupting the transfer, but I'm too curious. Anyway, `rsync` will resume where it left off. – tatus2 Jun 30 '13 at 11:01
  • OK, I did it. I tried both things. First `rsync` with `--inplace`: I observed exactly the same behavior as before (i.e. the value cycling up to +20%). Then I tried with `--partial`. When I was at ca. +18% I hit control-break. The incompletely transferred file was only ~0.5 GB, which is about 0.01% of the disk capacity. As I stated before, I can't believe that a temporary file could allocate 20% of 4 TB, i.e. 800 GB! – tatus2 Jun 30 '13 at 14:05
  • Well, what *was* taking up all that space? – bahamat Jul 01 '13 at 00:23
  • Nothing? Some strange `df` algorithm? That's my question/problem. If I try to find the usage with `du`, the result is quite reasonable. But, of course, `du` needs quite some time to list 2 TB.