Source file size increase during rsync

Question

I back up a directory with rsync. Before starting the rsync, I checked the directory size with du -s, which reported ~1TB.

I then started the rsync and, during the sync, checked the size of the backup directory to estimate the end time. When the backup grew much larger than 1TB, I got curious. It seems that many files in the source directory grow in size. I ran du -s on a file in the source before and after rsync copied that file:

```
## du on source file BEFORE it was rsynced
# du -s file.dat
2 file.dat

## du on source file AFTER it was rsynced
# du -s file.dat
4096 file.dat
```
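Plain du -s counts the blocks allocated on disk, not the logical length of the files, so it can understate how much data rsync has to read and write. The sketch below compares the two totals for a directory. It assumes GNU coreutils du (for --block-size and --apparent-size). The helper name compare_sizes is hypothetical, and the demo runs on a throwaway directory rather than sourceDir.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU coreutils du on Linux.
# allocated = bytes in blocks actually allocated on disk (what plain `du -s` counts)
# apparent  = sum of logical file sizes (`--apparent-size`)
set -euo pipefail

# Hypothetical helper: print both totals, in bytes, for one directory.
compare_sizes() {
  local dir=$1
  local allocated apparent
  allocated=$(du -s --block-size=1 -- "$dir" | cut -f1)
  apparent=$(du -s --block-size=1 --apparent-size -- "$dir" | cut -f1)
  echo "allocated=$allocated apparent=$apparent"
}

# Demo: a temporary directory holding one 1 MiB file that is all hole.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
dd if=/dev/zero of="$tmp/file.dat" bs=1 count=0 seek=1M 2>/dev/null
compare_sizes "$tmp"
```

On a real tree you would run something like `compare_sizes sourceDir`. A large gap between the two numbers suggests that the files contain unallocated holes. In that case the apparent size is probably the better basis for an end-time estimate.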

The rsync command:

```
rsync -av -s --relative --stats --human-readable --delete --log-file someDir/rsync.log sourceDir destinationDir/
```

The file system on both sides (source and destination) is BeeGFS 6.16, running on RHEL 7.4 with kernel 3.10.0-693.

Any ideas what is happening here?

Could you clarify which file is which? Is the first one the source file and the second the rsynced one? — DevilaN, Nov 01 '18 at 19:23
I clarified it in the code section and also added the rsync command. — Fex, Nov 01 '18 at 19:27
I cannot reproduce this behaviour on my system. It could depend on the filesystem or kernel, since 4096 is probably the block size of your filesystem. Could you also add information about your system and the filesystems on the partitions where the source and destination directories reside? — DevilaN, Nov 01 '18 at 19:36
Added information about the file system, distribution and kernel. — Fex, Nov 02 '18 at 09:26

score 1 · Answer 1 · answered Nov 01 '18 at 20:01

file.dat may be a sparse file. Use the --sparse option:

   -S, --sparse
          Try  to  handle  sparse  files  efficiently so they take up less
          space on the destination.  Conflicts with --inplace because it’s
          not possible to overwrite data in a sparse fashion.

From Wikipedia on sparse files:

A sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual "empty" space which makes up the block, using less disk space.
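To see whether a particular file has such unallocated blocks, compare its logical size with the bytes actually allocated to it. The sketch below does this with GNU stat format sequences: %s is the size in bytes, %b is the allocated block count, and %B is the size of each of those blocks. It assumes bash and GNU coreutils. The helper name report_allocation is hypothetical, and the demo file is a temporary one.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash and GNU coreutils stat on Linux.
set -euo pipefail

# Hypothetical helper: compare logical size with allocated bytes for one file.
report_allocation() {
  local f=$1
  local size blocks unit
  # %s = logical size in bytes, %b = allocated blocks, %B = bytes per block in %b
  read -r size blocks unit < <(stat -c '%s %b %B' -- "$f")
  local allocated=$(( blocks * unit ))
  # Ordinary files round allocation up to whole blocks, so allocated >= size.
  # Allocated < size usually means holes (or, on some filesystems, compression).
  if (( allocated < size )); then
    echo "$f: size=$size allocated=$allocated (sparse: holes not allocated)"
  else
    echo "$f: size=$size allocated=$allocated (fully allocated)"
  fi
}

# Demo on a throwaway 1 MiB file that consists only of a hole.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
dd if=/dev/zero of="$tmp/file.dat" bs=1 count=0 seek=1M 2>/dev/null
report_allocation "$tmp/file.dat"
```

You could run `report_allocation file.dat` before and after rsync touches the file. This would show whether the logical size or only the allocation changes.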

A sparse file can be created as follows:

```
$ dd if=/dev/zero of=file.dat bs=1 count=0 seek=1M
```

Now let's examine and copy it:

```
$ ls -l file.dat
.... 1048576 Nov  1 20:59 file.dat
$ rsync file.dat file.dat.rs1
$ rsync --sparse file.dat file.dat.rs2
$ du -sh  file.dat*
0       file.dat
1.0M    file.dat.rs1
0       file.dat.rs2
```
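To find out how many files in a tree like sourceDir are affected, GNU find can print each file's sparseness with the -printf directive %S, computed as (BLOCKSIZE * st_blocks / st_size). Values below 1.0 usually indicate holes. The sketch assumes GNU findutils and awk. The helper name list_sparse_files is hypothetical. Zero-length files are skipped because find leaves their sparseness undefined.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU findutils (-printf '%S') and awk on Linux.
set -euo pipefail

# Hypothetical helper: print paths of non-empty regular files whose
# sparseness is below 1.0, i.e. fewer bytes allocated than their size.
# Note: file names containing tabs or newlines are not handled.
list_sparse_files() {
  local dir=$1
  find "$dir" -type f -size +0 -printf '%S\t%p\n' |
    awk -F'\t' '$1 < 1.0 { print $2 }'
}

# Demo: one file that is all hole and one file with real zeros written.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
dd if=/dev/zero of="$tmp/sparse.dat" bs=1 count=0 seek=1M 2>/dev/null
dd if=/dev/zero of="$tmp/dense.dat" bs=1M count=1 2>/dev/null
list_sparse_files "$tmp"
```

Only sparse.dat should be listed here. Run against the source tree, the list shows which files would grow on the destination when they are copied without --sparse.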

I read up on sparse files. In your (good) explanation, the sparse file is bigger on the destination. In my case, it is the file in the source directory that grows. — Fex, Nov 02 '18 at 09:29
