8

CentOS 5.x

My question seemed similar to this one but I wasn't sure...

I have two servers (completely isolated from each other), each with a directory and sub-directories that should have the same exact contents.

For example the directory layout could be something like:

SERVER A -

/opt/foo/foob/1092380298309128301283/123.txt
/opt/foo/foob/5094380298309128301283/456.txt
/opt/foo/foob/5092380298309128301283/789.txt
/opt/foo/foob/1592380298309128301283/abc.txt

SERVER B -

/opt/foo/foob/1092380298309128301283/123.txt
/opt/foo/foob/5094380298309128301283/456.txt
/opt/foo/foob/5092380298309128301283/789.txt
/opt/foo/foob/1592380298309128301283/abc.txt

Ideally I'd like a way to do a recursive check and have something confirm that everything matches.

I also want to avoid using any third-party tools.

Any ideas?

Mike B
  • Are you just wanting to compare the two directories, or actually make one a duplicate of the other? – Scott Pack May 18 '12 at 20:45
  • @ScottPack Great question. I want to compare but NOT make any changes. Something else is handling the replication of the directories. I just want to make sure it's doing its job. – Mike B May 18 '12 at 20:46
  • 1
    You already tagged this question `rsync`? So... uhm, use `rsync` (with `-n` option)? – faker May 18 '12 at 20:48
  • @faker I thought rsync might be the option but wasn't sure if there was something better, easier, or more specific to this use case. I need to know that the filenames, date, size, and relative location match. – Mike B May 18 '12 at 20:49
  • note that rsync doesn't check file contents if time and size match, see the --checksum option if this worries you – stew May 18 '12 at 20:52

4 Answers

9

One good way is to use md5sums on every file in the tree:

Run this on server1:

find /opt/foo/foob/ -type f -print0 | xargs -0 md5sum > report_from_server1.tx

Run this on server2:

find /opt/foo/foob/ -type f -print0 | xargs -0 md5sum > report_from_server2.tx

Then just compare the two files (using diff or whatever you like).

Is that along the lines of what you're looking for?

Of course, you can use SSH to just execute the command remotely if you want.
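
One caveat with the commands above: find does not guarantee the same output order on both machines, so diff can report differences that are only a matter of ordering. Here is a minimal sketch that sorts the report by path and records relative paths. The function name hash_tree and the report file names in the comments are made up for illustration, and it assumes GNU find, xargs, md5sum and sort.

```shell
# Hypothetical helper: print "md5sum  ./relative/path" for every regular file
# under the directory given as $1, sorted by path so that reports produced on
# two different hosts line up when passed to diff.
hash_tree() {
    (
        # Work in a subshell so the caller's working directory is untouched.
        cd "$1" || exit 1
        # LC_ALL=C keeps the sort order identical regardless of host locale.
        find . -type f -print0 | xargs -0 md5sum | LC_ALL=C sort -k 2
    )
}

# Example use (report file names are placeholders):
#   on server A:  hash_tree /opt/foo/foob > report_a.txt
#   on server B:  hash_tree /opt/foo/foob > report_b.txt
#   after copying both reports to one machine:  diff report_a.txt report_b.txt
```

An empty diff means every file has the same relative path and the same MD5 on both sides. Timestamps are not part of this report.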

Camden S.
  • Thanks Camden. Yea, I think this is what I was looking for. I'll test it and see if it works out. – Mike B May 19 '12 at 03:05
  • 2
    Or md5sum the md5sums – dmourati May 19 '12 at 03:21
  • Excellent - right, if you don't care to know which files changed, you could just md5sum the resulting files and compare those two sums. – Camden S. May 19 '12 at 05:04
  • Also, you probably already know this, but to execute that command remotely over SSH, you'd just do `ssh user@servera 'find /opt/foo/foob/ -type f -print0 | xargs -0 md5sum'` – Camden S. May 19 '12 at 05:04
  • I like @dmourati's idea. In theory, could I do something like `ls -lR ./opt/foo/foob/ | md5sum` to quickly compare the directories? – Mike B May 21 '12 at 19:13
  • 1
    MikeB, by executing recursive long listing and passing that to md5sum, you'll be getting an md5sum of the directory listing, which'll exclude the content of files. If inode sizes on filesystems on either end were different then it could very well create a difference in file sizes too. Doing an md5sum on the content like originally suggested by @CamdenS. is better. – nearora May 31 '12 at 22:31
7

If you don't necessarily care about what changed, just that something has changed, rsync is still really good for that. Try running this command and take a gander at the output, assuming this is run from 'servera'.

rsync -avcn /opt/foo/ serverb:/opt/foo

The resulting list will be those files that would have been modified if you actually ran the sync process. Keep in mind that files will show up in the list even if only the timestamp changed while the contents remained the same. Since we added the -n flag, no actions will actually be performed, only reported.
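
If the timestamp noise is a problem, one variation is to drop -a and compare on content only, with itemized output that shows why each file is listed. This is a sketch under assumptions, not a tested recipe for your setup: the function name compare_tree is made up, and it assumes an rsync new enough to support --itemize-changes.

```shell
# Hypothetical helper: dry-run comparison of the tree in $1 against the tree
# in $2. Either argument may be a local path or an rsync remote such as
# serverb:/opt/foo/ (keep the trailing slash on the source).
#   -r        recurse into directories
#   -c        decide by checksum rather than by size and mtime
#   -n        dry run, nothing is copied or deleted
#   --delete  also report files that exist only on the destination side
compare_tree() {
    rsync -rcn --delete --itemize-changes "$1" "$2"
}

# Example use, run from servera:
#   compare_tree /opt/foo/ serverb:/opt/foo/
```

No output means the file names and contents match. Lines starting with "*deleting" are files that exist only on the destination, and the other lines are files that are missing there or differ in content. Adding -t would bring modification times back into the comparison.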

Scott Pack
  • Thanks. What if the two boxes are completely isolated from one another? How can I use the output to compare? – Mike B May 18 '12 at 21:07
  • rsync doesn't support both source and destination to be remote, so he'll need to run it off one of his servers – faker May 18 '12 at 21:10
  • @faker: Have to admit, never tried that before, good to know. As you say, though, it is easy enough to account for. – Scott Pack May 19 '12 at 01:51
  • +1. Clever use of `rsync`. To be completely correct, though, you need to run the `rsync` in both directions. That is, you need to add this: `rsync -avcn serverb:/opt/foo/ /opt/foo` – Steven Monday May 19 '12 at 03:09
5

While you could hack together a quick script that will calculate individual MD5 hashes for individual files in a directory, the better way to do it would be to use a tool called md5deep which will recursively calculate the hashes of all files in a directory, and then output them to a file. It can then be used on another directory, taking the first hash file as an input, and providing you with a list of files that are different between the two directories.

So, taking your example, you would follow this process:

  1. Calculate hashes of the required directory on Server A:

    md5deep -r /opt/foo/ > file_hashes.txt

  2. Copy the file_hashes.txt file onto Server B for comparison.

  3. Calculate hashes of the required directory on Server B, but taking the file hashes from Server A as an input file by using the -x flag to only show files that are different:

    md5deep -x file_hashes.txt -r /opt/foo/

The md5deep set of tools is available through the package manager of most distros, and the great thing is that it supports a number of different hashing algorithms, not just MD5. So if you're paranoid about collisions, you have a number of alternatives available. The following tools form part of md5deep, each providing an alternative hashing algorithm:

   md5deep - Compute and compare MD5 message digests
   sha1deep - Compute and compare SHA-1 message digests
   sha256deep - Compute and compare SHA-256 message digests
   tigerdeep - Compute and compare Tiger message digests
   whirlpooldeep - Compute and compare Whirlpool message digests
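
If installing md5deep is not an option, the same three-step workflow can be roughly approximated with the stock md5sum -c. This is only a sketch: the function name check_tree is made up, and it assumes the report from Server A was generated with relative paths (for example by running find . from inside the directory) and is passed as an absolute path. Unlike md5deep -x, it will not notice files that exist only on Server B.

```shell
# Hypothetical helper: verify the files under directory $1 against the md5sum
# report $2 (absolute path to a report created with relative file names).
# Prints every entry that is missing or whose hash differs.
# Exit status: 0 if everything matched, 1 if anything differed, 2 if $1 is
# not accessible.
check_tree() {
    (
        cd "$1" || exit 2
        # LC_ALL=C keeps the "OK" marker untranslated so grep can filter it.
        if LC_ALL=C md5sum -c "$2" 2>/dev/null | grep -v ': OK$'; then
            exit 1   # grep printed at least one non-OK line
        fi
        exit 0
    )
}

# Example use on Server B, after copying the report over:
#   check_tree /opt/foo /tmp/file_hashes.txt
```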
Richard Keller
0

I used a technique similar to @scott-pack's. This will give you two-way diffing. Everything that starts with "deleting" is a file that is on the remote server but not the local server. Every directory listed without any file contents is one that has no changes. Every file that is listed is a file that either doesn't exist on the remote server, or whose local version is "newer".

rsync -rvnac --delete /local/directory/ user@remote:/remote/directory/
David Baucum