We have a scheduled task that runs every night and copies a file of around 70-80 GB from one server to another on our network. For some reason it has been taking ~8 hours, which is a problem because it doesn't finish before our nightly tape backup runs, so the file doesn't make it onto the tape.

Any suggestions to make this run quicker?

Here's the batch file

if not exist g:\corp-prod-02\ihub\ihub.bkp goto backup
del /Q g:\Corp-prod-02\ihub\old\ihub.bkp
move g:\Corp-prod-02\ihub\ihub.bkp g:\corp-prod-02\ihub\old
:backup
call probkup online D:\ihubdb\live-new\ihub D:\ihubdb\ihub.bkp
robocopy D:\ihubdb G:\corp-prod-02\ihub ihub.bkp /Z /MOV /LOG:c:\scripts\logs\ihub.log
copy c:\scripts\logs\ihub.log g:\corp-prod-02\ihub
Zoredache
Mike
  • batch file formatting got messed up, sorry about that – Mike Feb 24 '12 at 20:55
  • I suggest you use a tool like iperf to see what your maximum transfer rate is between the two systems. If you only had 10 Mb connectivity, it wouldn't be surprising that the copy isn't completing. Then double-check the speed at which you can read from and write to the drives on each side. Once you know your transport speed and the actual drive speeds, figure out whether robocopy is being unreasonable. – Zoredache Feb 24 '12 at 20:58
  • Ahh, you reminded me of something I forgot to mention: we have another scheduled task that runs 5 minutes earlier and copies a different file between the two servers. That file is only about 1 GB, yet it transfers at 40 MB/sec, while the one in question transfers at 2.75 MB/sec. – Mike Feb 24 '12 at 21:01
  • Out of curiosity, what OS is running on each end? Also, does your robocopy log file indicate any failures followed by retries? Default robocopy behaviour is to wait 30 seconds b/w retries, which certainly wouldn't help your duration. – JamesCW Feb 24 '12 at 21:08
  • That is concerning. Is there some reason why you are using the `/z` switch? There seem to be many hits on the Interwebs about it drastically slowing things down. – Zoredache Feb 24 '12 at 21:09
  • The files are going from Server 2003 x64 over to Server 2003 x86, the robocopy log shows no errors, just the percentages increasing very slowly (for example, many 0.0% entries until you get a 0.1% and so on and so on until 100%) – Mike Feb 24 '12 at 21:10
  • We use the /z switch because, from my understanding, it ensures the copy will work in the event of network interference, and this is a pretty important file that we would like transferred successfully each night. I suppose it wouldn't be impossible for us to try a few nights without that switch, though. – Mike Feb 24 '12 at 21:14
  • A quick google suggests /z will slow it down by around 6.5x. – Robin Gill Feb 24 '12 at 21:18
  • I suspect it does the restartable stuff by performing lots of checksums while the transfer is taking place. This almost certainly costs you in performance. While I can understand the desire to get a good, accurate copy, I would be tempted to try it without that switch. I might also be tempted to use something like rsync instead, with a daemon running on the server so that the server can do the checksum calculations. – Zoredache Feb 24 '12 at 21:25
  • Thanks for all the suggestions, I will remove /Z for tonight and see if that makes an impact – Mike Feb 24 '12 at 21:37
  • It turns out that /Z was causing the problems. Without that switch the file copied over in slightly under an hour, compared to about 8 hrs with the switch. Now the question is, is there a similar feature of robocopy that I could use, or another copy program perhaps? Because we would like that restartable feature if possible. – Mike Feb 27 '12 at 20:33
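
As a sanity check on the numbers in this thread, here is a minimal Python sketch of the throughput arithmetic. It assumes 1 GB = 1024 MB and that "mb/sec" in the comments means megabytes per second; the function names are made up for illustration. Under those assumptions, 70-80 GB in 8 hours works out to roughly 2.5 to 2.8 MB/sec, which matches the 2.75 figure Mike reported, and the same file finishing in about an hour corresponds to roughly 20 MB/sec.

```python
# Assumption: 1 GB = 1024 MB, and rates are megabytes (not megabits) per second.
MB_PER_GB = 1024
SECONDS_PER_HOUR = 3600


def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average rate in MB/s needed to copy size_gb gigabytes in the given hours."""
    return size_gb * MB_PER_GB / (hours * SECONDS_PER_HOUR)


def copy_time_hours(size_gb: float, mb_per_s: float) -> float:
    """Hours needed to copy size_gb gigabytes at a steady rate of mb_per_s."""
    return size_gb * MB_PER_GB / mb_per_s / SECONDS_PER_HOUR


if __name__ == "__main__":
    # The file is described as 70-80 GB, so check both ends of that range.
    for size_gb in (70, 80):
        with_z = throughput_mb_per_s(size_gb, 8)      # ~8 hours with /Z
        without_z = throughput_mb_per_s(size_gb, 1)   # ~1 hour without /Z
        at_40 = copy_time_hours(size_gb, 40)          # rate seen on the 1 GB file
        print(f"{size_gb} GB: {with_z:.2f} MB/s over 8 h, "
              f"{without_z:.2f} MB/s over 1 h, "
              f"{at_40:.2f} h at 40 MB/s")
```

If the link could sustain the 40 MB/sec seen on the smaller file, the large file would need well under an hour, so the one-hour result without /Z is in the range the network appears able to deliver.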

1 Answer

My first thought was that it was a network issue, but your comment explaining that you don't have any issues with smaller files reminded me of a problem I've seen in the past when transferring large files around. It took me a while to work out what was going on, but I ended up tracing it to exhaustion of the kernel's nonpaged memory pool.

It might be worth reading these articles and using poolmon.exe (specifically the MmSt pool tag) to see if you are experiencing the same issue.

Edit:

This article is aimed at NT4 and Windows 2000, but probably still relevant.

Bryan