3

On my SQL Server instance I use SQL Agent to run a daily backup job in two steps.

One of the steps uses Robocopy to move local backup files to network storage.

The command to move the files looks like this:

robocopy M:\backups \\NAS\backups$\Database /MOV /MIR /XJ /NP /COPY:DT

The output of the command in the job step history looks like this:

-------------------------------------------------------------------------------     
ROBOCOPY     ::     Robust File Copy for Windows                                
-------------------------------------------------------------------------------      
Started : Fri Jul 20 00:55:42 2012     
Source : M:\backups\       
Dest : \\NAS\backups$\Database        
Files : *.*            
Options : *.* /S /E /COPY:DT /MOV /PURGE /MIR /NP /XJ /R:1000000 /W:30     
------------------------------------------------------------------------------  
3  M:\backups\
      *EXTRA File       15.5 m  GeoDisplay_Full_2012-07-19-000004.bak
      *EXTRA File       41.3 m  GeoDisplay2_Full_2012-07-19-000004.bak
      *EXTRA File      264.1 g  Webstore_Full_2012-07-19-000004.bak
      New File          15.5 m  GeoDisplay_Full_2012-07-20-000002.bak
      New File          41.4 m  GeoStore_Full_2012-07-20-000002.bak
      New File         302.1 g  Webstore_Full_2012-07-20-000002.bak

      2012/07/20 04:34:50 ERROR 32 (0x00000020) Deleting Source File M:\backups\Webstore_Full_2012-07-20-000002.bak  The process cannot access the file because it is being used by another process.       
------------------------------------------------------------------------------
      Total    Copied   Skipped  Mismatch    FAILED    Extras
Dirs :         1         0         1         0         0         0     
Files :         3         3         0         0         0         3     
Bytes : 302.187 g 302.187 g         0         0         0 264.181 g     
Times :   3:38:57   3:38:45                       0:00:00   0:00:11         
Speed :            24720063 Bytes/sec.     
Speed :            1414.493 MegaBytes/min.       
Ended : Fri Jul 20 04:34:50 2012.  
Process exit code 3.

The text output clearly shows ERROR 32. Robocopy failed to honor the /MOV switch (delete from source after copy) because another process had a handle on one file when robocopy tried to delete it.

Robocopy returned exit code 3 (new files copied to destination and extra files deleted from destination). This is correct but incomplete because there is no way to tell from the error code that any operation failed.

SQL Agent considers only the return code, not the command output, when determining the success of a command shell operation. I could modify the Robocopy command to save the output to disk, and parse the output in an extra job step, but this requires extra programming and would add another dependency to the backup job.

Is there any way to detect this kind of failure without searching Robocopy's text output for the string ERROR 32?

Iain Samuel McLean Elder
  • 1,232
  • 4
  • 15
  • 27
  • This evil kind of error output makes robocopy's claim to robustness a little shaky. Are there any more-robust options available, preferably a utility that ships with Windows? – Iain Samuel McLean Elder Jul 20 '12 at 14:35
  • What method other than searching the text output (log) are you looking for to detect an error? My brain keeps going back to how else you'd detect an error if you didn't check the log somehow... but I've been off my game this week, so that could be it too. – HopelessN00b Jul 20 '12 at 15:09
  • 1
    I would think twice using something like Robocopy alone to rely on for backups shipping. – Sergei Jul 20 '12 at 16:09
  • @Sergei What tool would you use instead? Would you use Robocopy and something else? – Iain Samuel McLean Elder Jul 21 '12 at 19:45
  • @HopelessN00b SQL Agent looks at the exit code to work out whether the command was successful. Robocopy is capable of setting a failure bit on the failure of some operations, but doesn't set a failure bit when it fails to delete from the source. – Iain Samuel McLean Elder Jul 21 '12 at 20:05
  • @isme Gotcha, I understand now. Might be a good idea to point that out in the question too, but I'm not sure if it'll do any good... if Robocopy fails to delivery proper/expected output, the only solutions I can think of are not using it (like Sergei suggested, probably not a good idea to rely on it for backups anyway), or writing a script that goes in after Robocpy to make sure it did things right. (Either checking against the file system or parsing the log for `ERROR 32`.) Doesn't seem too appealing to me, so I'd be tempted to tell the bossman he needs to buy backup software for backups. – HopelessN00b Jul 21 '12 at 20:40
  • What is the version of robocopy.exe? – Greg Askew Jul 22 '12 at 14:13
  • @GregAskew It's version XP027. – Iain Samuel McLean Elder Jul 25 '12 at 13:54

2 Answers2

5

Whether it's RoboCopy or some other method, I prefer to log all output and then post-process the log. I personally prefer Perl for that task but use whatever you're comfortable with. I aslo like to have the test checking script email me the results, showing just whether it succeeded or in the case of a failure the relevant lines from the log.

In my opinion, any backup operation that does not include reporting the results is a disaster waiting to happen, simply because you cannot possibly have confidence in an operation that is not checked. Humans do an appallingly poor job of checking logs, so make the extra effort to write the script. Without it you can just about be guaranteed that some critical operation will fail one day and you'll be none the wiser for it, possibly risking far more than just the data.

John Gardeniers
  • 27,458
  • 12
  • 55
  • 109
  • Thanks, John. If we were to continue using Robocopy, we would process the logs. We thought about the effort involved in implementing a log-processing solution versus simplifying our system architecture. In the end we mapped the network storage as a local drive so that we use only SQL Server's built-in functionality. We already have log-processing solutions for SQL Server, so we have reduced the risk of missing something. – Iain Samuel McLean Elder Sep 30 '12 at 13:38
  • I consider this a bug in Robocopy and have reported it here: https://github.com/MicrosoftDocs/windowsserverdocs.de-de/issues/49 – Marco Eckstein Apr 20 '21 at 23:21
4

The exit code description "new files copied to destination and extra files deleted from destination" is not technically accurate. You should not rely only on the description.

The exit code 0x3 is a bit flag. The friendly message should be interpreted as follows:

"One of more files were copied successfully"

PLUS

"Extra files or directories were detected. Examine the log file for more information."

Code    Meaning
0   No errors occurred and no files were copied.
1   One of more files were copied successfully.
2   Extra files or directories were detected.  Examine the log file for more information.
4   Mismatched files or directories were detected.  Examine the log file for more information.
8   Some files or directories could not be copied and the retry limit was exceeded.
16  Robocopy did not copy any files.  Check the command line parameters and verify that Robocopy has enough rights to write to the destination folder.

Robocopy exit codes
https://blogs.technet.com/b/deploymentguys/archive/2008/06/16/robocopy-exit-codes.aspx

The upshot is John is correct. There's no way around examining the logs if you want to have a robust backup operation. Additionally, you may want to have a script that runs after the Robocopy operation that removes the handle to the dangling files, then removes the files.

Greg Askew
  • 35,880
  • 5
  • 54
  • 82