
I'm backing up a Linux box over SMB to a NAS. I mount the NAS locally and then rsync a lot of data (100GB or so). I believe it's taking an awfully long time: more than 12 hours. I would expect it to be much faster once everything is copied, since almost nothing changes from day to day.

Is there a way to speed this up?

I was thinking that maybe rsync thinks it's working with local hard disks and uses checksum instead of time/size comparisons? But I didn't find a way to force time and date comparisons. Anything else I could check?

Pablo Fernandez
  • I'd also suggest looking at NFS instead of SMB - I've noticed (and maybe it's just me) that it's faster than Samba – warren Sep 24 '09 at 07:35
  • Unfortunately, this NAS doesn't have NFS and for now, I'm stuck with it. – Pablo Fernandez Sep 24 '09 at 08:42
  • Check the NAS's capabilities using a port mapper, like nmap. I've run into several NAS units that ran a native rsync service, even though there was no mention in the documentation, and no mention in the config. – Kyle__ Aug 16 '11 at 17:36
  • Please also check this thread: "rsync to NAS copies everything every time" http://serverfault.com/questions/262411/rsync-to-nas-copies-everything-every-time/262424#262424 – dtoubelis Aug 16 '11 at 18:17
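Picking up Kyle's suggestion of probing the NAS, a quick way to check for a native rsync daemon without installing nmap is bash's built-in /dev/tcp redirection. This is only a sketch: `nas.local` is a placeholder for your NAS's hostname or IP.

```shell
# Probe TCP port 873, the default rsync daemon port, on the NAS.
# "nas.local" is a placeholder: substitute your NAS's hostname or IP.
host="nas.local"
if timeout 2 bash -c "cat < /dev/null > /dev/tcp/$host/873" 2>/dev/null; then
    echo "rsync daemon port 873 is open on $host"
else
    echo "no rsync daemon reachable on $host:873"
fi
```

If the port is open, you can list the daemon's modules with `rsync rsync://nas.local/` even when the vendor documentation never mentions rsync.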

7 Answers


I think you're misunderstanding the rsync algorithm and how the tool should be applied.

Rsync's performance advantage comes from doing delta transfers, that is, moving only the changed bits of a file. To determine the changed bits, the file has to be read on both the source and destination hosts, and block checksums compared to find which bits changed. This is the "magic" part of rsync: the rsync algorithm itself.

When you're mounting the destination volume with SMB and using rsync to copy files from what Linux "sees" as a local source and a local destination (both mounted on that machine), most modern rsync versions switch to 'whole file' copy mode, and switch off the delta copy algorithm. This is a "win" because, with the delta-copy algorithm on, rsync would read the entire destination file (over the wire from the NAS) in order to determine what bits of the file have changed.

The "right way" to use rsync is to run the rsync server on one machine and the rsync client on the other. Each machine reads files from its own local storage (which should be very fast), they agree on which parts of the files have changed, and only those parts are transferred. The way you're using rsync amounts to a trumped-up 'cp'. You could accomplish the same thing with 'cp', and it would probably be faster.

If your NAS device supports running an rsync server (or client) then you're in business. If you're just going to mount it on the source machine via SMB then you might as well just use 'cp' to copy the files.

Evan Anderson
  • Ooo! Downvotes! I'd be curious to hear why you downvoted the answer, considering it's technically accurate. – Evan Anderson Sep 24 '09 at 08:44
  • I can't run rsync server on the NAS, otherwise I would be doing so. When not using an rsync server, rsync can use the checksum or the size and datetime to find out whether a file changed or not. According to the man page, it'll use the size and datetime by default, but my experience is that it is not doing that and I don't see a way to force it. I only see a way to force checksumming. --checksum: Without this option, rsync uses a "quick check" that (by default) checks if each file's size and time of last modification match between the sender and receiver. – Pablo Fernandez Sep 24 '09 at 08:46
  • Evan, give me a couple of minutes to write my comment. – Pablo Fernandez Sep 24 '09 at 08:47
  • What behaviour are you seeing that's telling you that it's checksumming the files? The "quick check" behaviour is the default behaviour, so there's no way to "force" it. If you can't run rsync on the NAS just use 'cp'. It'll be as fast or faster. – Evan Anderson Sep 24 '09 at 08:52
  • As I understand how rsync works, it should check the local date and time and the remote date and time, and if they match, not copy the file. That means it shouldn't copy 99% of the files, but the fact that it takes more than 12 hours for 60GB or so tells me it is either copying everything (which seems to be what you are implying by saying that cp will be faster) or actually checksumming, which means it's not copying everything, but it is downloading everything. – Pablo Fernandez Sep 24 '09 at 08:59
  • I'd run it with the "--dry-run" and "--verbose" arguments to see what it thinks it's doing. I wonder if your NAS device isn't representing the modification times exactly the same as the source. You could add a "--size-only" argument and see if that changes things. What filesystem are you running on the NAS device? – Evan Anderson Sep 24 '09 at 09:07
  • Thanks Evan, I'll try those recommendations. Regarding NAS' FS, I'm not sure, but I would guess it's ext3. – Pablo Fernandez Sep 24 '09 at 09:09
  • @Evan Anderson: He's locally mounting the SMB share. According to the rsync docs, copies to and from a local path doesn't use the delta transfers but instead copies the whole file. That coupled with the fact that rsync is less efficient than cp results in slow transfers. – Starfish Aug 16 '11 at 17:34
  • @Starfish: That's what I say in my third paragraph. It switches to whole copy mode and doesn't do delta transfers in that situation. – Evan Anderson Aug 16 '11 at 18:54

It sounds like timestamps are your problem, as this page relates:

http://www.goodjobsucking.com/?p=16

The proposed solution is to add

--modify-window=1

to the rsync parameters.

Bob

Yes, you can speed it up. You need to make either the source or destination look like a remote machine, say by addressing it as "localhost:".

You stated that you are mounting the SMB share locally. This makes the source or destination look like a local path to rsync. The rsync man page states that copies where the source and destination are local paths will copy the whole file. This is stated in the paragraph for the "--whole-file" option in the man page. Therefore, the delta algorithm isn't used. Using the "localhost:" workaround will restore the delta algorithm functionality and will speed up transfers.

Starfish
  • I wonder what sense that makes... `rsync` uses the time and date stamp to check whether or not a file needs to be updated. If it needs an update, rsync divides the file into chunks and compares the checksums. That means it reads the whole file to be able to do that. So if you do not have an rsync daemon running remotely, you need to transfer the whole file anyway to do the chunking and checksumming, so you might as well transfer it straight away. The "workaround" outlined here in fact buys you nothing in this scenario. – TylerDurden Oct 01 '19 at 12:51
  • `-W // --whole-file` was the key in my case (over local network). I went from ~3.5 MB/s to ~35 MB/s. a 10x factor!! – logoff Jan 09 '21 at 12:21

Thought I would throw my 2p in here.

My brother has just installed a Buffalo NAS on his office network. He's now looking at off-site backups, so that should the office burn down, at least he still has all his business documents elsewhere (many hundreds of miles away).

My first hurdle was to get the VPS he has (a small Linux virtual private server, nothing too beefy) to dial in as a VPN user to his broadband router (he's using a DrayTek for this), so that it can be part of his VPN and access the NAS directly, in a secure fashion. Got that sorted and working brilliantly.

The next problem was transferring the files from the NAS to the VPS server. I started off with a Samba mount and ran into exactly the same (or even worse) issue you've described. A dry-run rsync took over 1 hour 30 minutes just to work out what files it was going to transfer because, as Evan says, with this method the other end isn't rsync, so it has to make many filesystem calls/reads over the Samba mount (across a PPTP/tunnelled connection, with a round-trip time of about 40ms). Completely unworkable.

Little did I know that the Buffalo actually runs an rsync daemon, so, using that instead, the entire dry run takes only 1 minute 30 seconds for 87k files totalling 50GB. Obviously, transferring 50GB of files (from a NAS on a broadband link with only 100k/sec outbound bandwidth) is another matter entirely (it will take several days), but once the initial rsync is complete, any incremental backups should be greased lightning (his data is not going to change much on a daily basis).

My suggestion is to use a decent NAS that supports rsync, for the reasons Evan gives above. It will solve all your problems.

parkamark

There are two potential sources of the problem: either you are using incorrect command-line options, or your NAS has issues with timestamping (or both :-). Please check this thread "rsync to NAS copies everything every time" for more info.

dtoubelis

Smells like you have a cheap NAS. It could also be your network bandwidth...

"Standard" consumer NAS units are really weak when it comes to heavy I/O, which is what you are asking of one here. It could also be a cheap switch connecting your PC and your NAS that cannot handle all the packets correctly.
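One way to separate a slow NAS from a slow network is to measure raw sequential write speed into the mount with dd. The sketch below uses a temp directory as a stand-in so it can run anywhere; point `target` at your actual SMB mount point (e.g. /mnt/nas) to get a real number.

```shell
# Raw sequential write test. Set "target" to your SMB mount point
# (e.g. /mnt/nas) to measure NAS+network throughput; a temp dir is
# used here as a stand-in so the sketch is self-contained.
target=$(mktemp -d)
dd if=/dev/zero of="$target/throughput-test" bs=1M count=100 conv=fsync 2>&1
rm -f "$target/throughput-test"
```

If dd reports a rate far below what your link should manage, the bottleneck is the NAS or the network path, not rsync.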

Antoine Benkemoun

Try this; I think it will get you at least 10% more than the speed you're getting now: http://www.thegeekstuff.com/2009/09/linux-remote-backup-using-rsnapshot-rsync-utility/
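For reference, rsnapshot (the tool that tutorial covers) drives rsync from a single config file. A minimal sketch might look like the following; all paths and retention counts are made up for illustration, and note that rsnapshot requires the fields to be separated by actual TAB characters.

```
# Minimal rsnapshot.conf sketch (fields MUST be TAB-separated).
# Paths and retention counts are illustrative only.
snapshot_root	/mnt/nas/snapshots/
retain	daily	7
retain	weekly	4
backup	/home/	localhost/
backup	/etc/	localhost/
```

Run `rsnapshot configtest` after editing to catch tab/syntax mistakes before the first backup.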

Rajat