
I'm trying to use md5sum to compare two files in a bash script.

The goal is to use the .md5 of one file to check the md5sum of the other file. My Google searches on how to do this properly aren't showing me what I'm doing wrong. Firing off an e-mail works as you'd expect; now I'm trying to get it to fire off an e-mail on failure rather than on success.

Ideally, the e-mail would also list the hash read from the .md5 file alongside the actual md5sum of the corrupted file. I'll figure that part out eventually, but this is confusing, and I haven't been able to work out where I'm going wrong.

ShellCheck reports no problems with the code, but I'm not getting the results I expect.

A few Stack Overflow questions I checked to see if something there would work:

One

Two

Here's the content of my bash script, in its original form:

#!/bin/bash
cd /home/example/public_html/exampledomain.com/billing/system/ || exit
rm -rf GeoLiteCity.dat
curl -L https://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz | gunzip > GeoLiteCity.dat
curl -L https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz | gunzip > GeoLite2-City.dat
curl -L https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.md5
md5sum GeoLite2-City.dat > md5sum.txt

file1="md5sum.txt"
file2="GeoLite2-City.md5"

if [ "`cat $file1`" != "`cat $file2`" ]; then
mail -s "Results of GeoLite Updates" email@address.com <<< "md5sum for GeoLite2-City failed. Please check the md5sum. File may possibly be corrupted."
else
exit
fi

Edit:

Updated the code to the following:

#!/bin/bash
cd /home/example/web/exampledomain/public_html/billing/system/ || exit
rm -rf GeoLite*
rm -rf md5sum.txt
curl -L https://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz | gunzip > GeoLiteCity.dat
curl -L https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz | gunzip > GeoLite2-City.dat
wget https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.md5
md5sum GeoLite2-City.dat > md5sum.txt

file1="md5sum.txt"
file2="GeoLite2-City.md5"

if ! cmp "$file1" "$file2"; then echo "They don't match."; fi

Still working on this. Getting closer to actually making it work!

Results of the above:

root@example# cat GeoLite2-City.md5
e8c076d6ff83e9a615aedc7d5d1842d7
root@example# md5sum GeoLite2-City.dat
e8c076d6ff83e9a615aedc7d5d1842d7  GeoLite2-City.dat
root@example# cat md5sum.txt
e8c076d6ff83e9a615aedc7d5d1842d7  GeoLite2-City.dat

Edit2: The code is now as follows. Note that I remove the existing GeoLite and GeoLite2 files so that each run starts with a fresh download of the databases whenever MaxMind updates them:

#!/bin/bash

# cd to directory where the MaxMind database is to be downloaded.
if ! cd /home/example/public_html/billing/system/; then
echo "Can't find work directory" >&2
exit 1
fi

# Remove existing files so we start off with a clean set of updated data from Maxmind.

rm -f GeoLite*
rm -f md5sum.txt

# Download databases and if applicable, their md5s.

curl -L https://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz | gunzip > GeoLiteCity.dat
curl -L https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz | gunzip > GeoLite2-City.dat
curl -O https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.md5

# Create md5sum of the GeoLite2 database.
md5sum < GeoLite2-City.dat > md5sum.txt
# Strip out the spurious - seen in md5sum.txt
sed -i 's/ .*//' md5sum.txt

# Set what files are what for file comparison purposes.
file1="md5sum.txt"
file2="GeoLite2-City.md5"

# DO THE THING! ie, compare!
if ! cmp --silent "$file1" "$file2"; then
mail -s "Results of GeoLite Updates" example@domain.com <<< "md5sum for GeoLite2-City failed. Please check the md5sum. File may possibly be corrupted."
fi
Community
Keiro
  • Rather than launching all those sub-shells for your comparison, you could also just compare the files. Something like: `if ! cmp "$file1" "$file2"; then echo "md5sum mismatch on $file2" | mail -s "Results..." you@example.com; fi`. – ghoti Oct 10 '15 at 03:30
  • @ghoti So, using your suggestion, I get `cmp: EOF on GeoLite2-City.md5` followed by "They don't match." HOWEVER, I see why cmp is throwing the error. It's because md5sum.txt contains `example@example# cat md5sum.txt e8c076d6ff83e9a615aedc7d5d1842d7 GeoLite2-City.dat` How do I fix this? – Keiro Oct 10 '15 at 03:33
  • I don't know what's in the file. Is `GeoLite2-City.md5` empty? You'd get an error like that if the file was zero-length. – ghoti Oct 10 '15 at 03:33
  • @ghoti Heh, I just edited my comment, sorry. It's not empty; it just contains the MD5 sum of GeoLite2-City.dat. Edit: Also, pastebin ftw: https://pastebin.com/zjGfEejK – Keiro Oct 10 '15 at 03:35
  • Comments are a truly terrible place for multi-line code examples. Please update your question with follow-up data. For example, the exact content of BOTH files that you're comparing. – ghoti Oct 10 '15 at 03:38
  • @ghoti Noted. Updating shortly. – Keiro Oct 10 '15 at 03:38
  • @Keiro updated my answer so you will get rid of `GeoLite2-City.dat` and compare md5 only – Samuel Oct 10 '15 at 03:45
  • Keiro, you're updating your question to incorporate things that have been suggested in answers. That's bad form, and makes things very confusing for people trying to learn from this Q&A in the future. I recommend that you leave the original version of your script, so that it's obvious what issues the answers below are actually addressing. Add sections, separate them with `----` and explain what experiments you've done, and their results. – ghoti Oct 10 '15 at 03:56

3 Answers


So... the problem you're seeing appears to be that the format of the md5sum.txt file you create doesn't match the format of the .md5 file you download, which is the file you need to check your calculated value against.

The following would be closer to my version of the script. (Explanation below.)

#!/bin/bash

if ! cd /home/example/public_html/exampledomain.com/billing/system/; then
  echo "Can't find work directory" >&2
  exit 1
fi

rm -f GeoLiteCity.dat

curl -L https://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz | gunzip > GeoLiteCity.dat
curl -L https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz | gunzip > GeoLite2-City.dat
curl -O https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.md5
md5sum < GeoLite2-City.dat | cut -d\  -f1 > md5sum.txt

file1="md5sum.txt"
file2="GeoLite2-City.md5"

if ! cmp --silent "$file1" "$file2"; then
  mail -s "Results of GeoLite Updates" email@address.com <<< "md5sum for GeoLite2-City failed. Please check the md5sum. File may possibly be corrupted."
fi

The major differences here are:

  • rm -f GeoLiteCity.dat instead of rm -rf. Let's not reach farther than we need to.
  • md5sum reads standard input rather than opening the file by name. The effect is that the output shows - (the marker for standard input) instead of a filename. Unfortunately, because the Linux md5sum command has no option to leave that field out, this still doesn't match the .md5 file you download from MaxMind, so:
  • cut is used to modify the resultant output, leaving only the calculated md5.
  • using cmp instead of subshells, per comments on your question.

The second and third points are perhaps the most important ones for you.
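If GNU coreutils is available, `md5sum --check` can also do the comparison itself. This is a different technique from the one in this answer, offered only as a sketch. `--check` expects lines of the form `HASH  FILENAME` (two spaces), so the sketch builds such a line from the bare hash in the `.md5` file. The demo files are created in a temporary directory only so the sketch runs on its own; in the real script they would be the downloaded files.

```bash
#!/bin/bash
# Sketch: verify a file against a bare-hash .md5 file with GNU `md5sum --check`.
# The demo files below stand in for the real MaxMind downloads.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Stand-in for the downloaded database.
printf 'example database contents\n' > GeoLite2-City.dat
# Stand-in for GeoLite2-City.md5: only the hash, written here without a newline.
md5sum < GeoLite2-City.dat | cut -d' ' -f1 | tr -d '\n' > GeoLite2-City.md5

# Expected hash with any whitespace (newline, carriage return) removed.
expected=$(tr -d '[:space:]' < GeoLite2-City.md5)

# --check reads "HASH  FILENAME" lines from standard input when no file is given;
# --status prints nothing and reports the result only through the exit code.
if printf '%s  %s\n' "$expected" GeoLite2-City.dat | md5sum --check --status; then
  result="match"
else
  result="mismatch"
  # In the real script, send the e-mail here (mail -s "Results of GeoLite Updates" ...).
fi
echo "$result"
```

Because `--status` reports only through the exit code, the `if` replaces both the temporary md5sum.txt file and the `cmp` step.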

Another option for creating your md5sum.txt file would be to do it on the fly as you download. For example:

curl -L https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz \
| gunzip | tee GeoLite2-City.dat | md5sum | cut -d\  -f1 > md5sum.txt

This uses the tee command to send the decompressed data both to its "save" location and down another pipe, where md5sum and cut generate your .txt file.

It might save you the time a separate md5sum pass would otherwise take afterwards, and it'll take better advantage of multiple CPUs (SMP). :)
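One detail matters in this kind of pipeline: `cut` has to run on `md5sum`'s one-line output, not on the database bytes, or the data gets mangled before it is hashed. The sketch below can be tried offline. A small gzip file created locally stands in for the `curl -L` download (an assumption made only so it runs without a network), and `set -o pipefail` is an addition so that a failed download or `gunzip` makes the whole pipeline report failure instead of only reporting `cut`'s status.

```bash
#!/bin/bash
# Sketch: decompress, save, and hash a stream in one pass with tee.
# A locally created gzip file stands in for the curl -L download.
set -o pipefail   # a failure anywhere in the pipeline makes the pipeline fail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'pretend this is the GeoLite2 City database\n' | gzip > GeoLite2-City.mmdb.gz

# Real script: curl -L <url> | gunzip | tee ...
# tee writes the bytes to GeoLite2-City.dat and passes the same bytes to md5sum;
# cut then keeps only the hash from md5sum's output line.
if ! gunzip < GeoLite2-City.mmdb.gz | tee GeoLite2-City.dat | md5sum | cut -d' ' -f1 > md5sum.txt; then
  echo "download or decompression failed" >&2
fi
```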

ghoti
  • Aaaaaaah. Now I see where I've gone wrong here. Your explanation here, along with the example of the script makes things crystal clear. You're right, I did not realize that `md5sum` takes standard input, rather than processing the file by name. This is the critical error I made, as you correctly pointed out. `man md5sum` probably should've been done beforehand in writing this bash script... but what's the fun in that?! :P – Keiro Oct 10 '15 at 03:58
  • Your suggested script works. With just one issue. After I `cat md5sum.txt`, I get the following: `e8c076d6ff83e9a615aedc7d5d1842d7 -` ... is there a way to get just the md5sum, nothing else in that txt file? I decided to also use your suggestion re: `tee` – Keiro Oct 10 '15 at 04:06
  • Awesome, I'm glad I could help! One other thing I'll just mention is that `md5sum` is the name of the command in Linux, but OSX, FreeBSD, NetBSD, etc all use a similar tool called simply `md5`. If in the future you want to make your script portable, you can select a binary based on the OS you detect. `case $(uname -s) in Darwin|*BSD) md5=md5;; Linux) md5=md5sum;; *) echo "Dunno.";; esac` – ghoti Oct 10 '15 at 04:07
  • As for the spurious ` - ` in your file ... hmm. I just checked an old Ubuntu box and realized that `md5sum` does not have an option to tell it to skip the filename. The dash is a common indicator for standard input. Looks like your solution will have to be to parse the file. I've updated my answer. – ghoti Oct 10 '15 at 04:13
  • That appears to have done the trick! So far, no failed md5sum comparison e-mails. Edit: Correction, still getting comparison error e-mails. :( /edit Though I now see `e8c076d6ff83e9a615aedc7d5d1842d7 root@example#` as output once the script's done running. I'm sure I'll figure out which line's throwing that into my shell and redirect it to `/dev/null`. Thanks for your help though! Your example is very clear and concise. I've also commented the script with your explanations. – Keiro Oct 10 '15 at 04:30
  • Happy to help. As for your extra md5 output, the command generating this is probably the third (final) `curl -L` line. I think you intended to *download* the server-side md5 file, but instead you're sending it to stdout. You might want to change that `-L` to `-O` to write the file using the remote name. (I've updated my answer with this change.) – ghoti Oct 10 '15 at 04:35
  • I realized this right before I saw your comment about `curl -L` on the third (final) line. You're right, I did intend to _download_ it. Now that I've done that, it all works properly! I will update OP with my final code edit. – Keiro Oct 10 '15 at 04:39
  • Hmm. Odd, it's still sending me md5 comparison failed e-mails... but the md5sums in both files are correct... so the script clearly works. – Keiro Oct 10 '15 at 04:47
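One possible explanation for mismatch e-mails when both hashes look identical (an assumption; the thread never confirms what is in MaxMind's file) is a byte that `cat` doesn't show, such as a missing trailing newline or a CRLF line ending in the downloaded `.md5`. `cmp` compares bytes, so `hash\n` and `hash` differ. The sketch reproduces that case with demo files, shows how `od -c` makes such bytes visible, and then compares the hash strings instead, putting both values in the message as the question asked.

```bash
#!/bin/bash
# Sketch: why cmp can report a difference between hashes that look identical,
# and a comparison of the hash strings that tolerates it.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

hash=e8c076d6ff83e9a615aedc7d5d1842d7
printf '%s\n' "$hash" > md5sum.txt       # like md5sum ... | cut output (ends in \n)
printf '%s' "$hash" > GeoLite2-City.md5  # assumed: downloaded file with no trailing \n

# cmp compares bytes, so the extra newline in md5sum.txt counts as a difference.
if cmp -s md5sum.txt GeoLite2-City.md5; then cmp_result="same"; else cmp_result="different"; fi
echo "cmp says: $cmp_result"

# od -c shows invisible bytes such as \n or \r when inspecting the real files.
od -c GeoLite2-City.md5

# Compare the hash strings instead: keep the first field and drop all whitespace.
actual=$(cut -d' ' -f1 md5sum.txt | tr -d '[:space:]')
expected=$(tr -d '[:space:]' < GeoLite2-City.md5)
if [ "$actual" != "$expected" ]; then
  # In the real script, pipe this line to: mail -s "Results of GeoLite Updates" <address>
  echo "md5sum for GeoLite2-City failed. Expected: $expected Actual: $actual"
  string_result="different"
else
  string_result="same"
fi
echo "string comparison says: $string_result"
```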

For anyone coming here looking to compare a file to a specific md5 sum, you can try this function:

checkmd5() {
  # $1: expected md5 sum, $2: path of the file to check
  local md5_to_test=$1
  local md5_from_file
  md5_from_file=$(md5sum "$2" | cut -d " " -f1)
  local md5_results="Input: $md5_to_test\nFile:  $md5_from_file"
  if [[ $md5_to_test == "$md5_from_file" ]]; then
    echo -e "\n\e[92mSUCCESS\e[39m\n$md5_results"
  else
    echo -e "\n\e[91mFAILURE\e[39m\n$md5_results"
    return 1   # nonzero exit status so callers can test the result
  fi
}

And then just use it like:

$ checkmd5 <SOME_MD5_SUM> filepath/file.abc
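ghoti's comment on the other answer suggests choosing between `md5` and `md5sum` by OS. Combining that idea with the function above, a minimal sketch might look like this. It reports the result through its exit status (0 on a match, 1 on a mismatch) so it can be used directly in `if`. Assumptions: `md5 -q FILE` prints only the hash (as on macOS and FreeBSD; other BSDs may differ), and any other system is treated as having GNU `md5sum`.

```bash
#!/bin/bash
# Sketch: checkmd5 variant that picks md5 or md5sum by OS and reports the result
# through its exit status (0 = match, 1 = mismatch).
# Assumption: `md5 -q FILE` prints only the hash (macOS, FreeBSD).
checkmd5() {
  local expected=$1 file=$2 actual
  case $(uname -s) in
    Darwin|*BSD) actual=$(md5 -q "$file") ;;
    *)           actual=$(md5sum "$file" | cut -d' ' -f1) ;;  # assumed GNU md5sum
  esac
  printf 'Input: %s\nFile:  %s\n' "$expected" "$actual"
  # The exit status of this test becomes the function's return value.
  [ "$expected" = "$actual" ]
}
```

In the question's script, the e-mail could then be sent only when `! checkmd5 "$(cat GeoLite2-City.md5)" GeoLite2-City.dat` is true.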
Jsilvermist

In the line if [ $file1 != $file2 ], you're comparing only the file names, not the contents of the two files. So if [ "md5sum.txt" != "GeoLite2-City.md5" ] will always be true.

This should work:

if [ "$(awk '{print $1}' "$file1")" != "$(cat "$file2")" ]; then
  # ...do your logic here...
fi
Samuel
  • D'OH! As soon as I saw the `cat $file1` bit, I realized why I was having such trouble. I will check on this in a moment. – Keiro Oct 10 '15 at 03:07
  • This seems like unnecessary use of cat. If you're going to strip the first word from `$file1`, why not just use `$(awk '{print $1}' $file1)`? Or even `$(cut -d" " -f1 $file1)`? – ghoti Oct 10 '15 at 03:59
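Along the lines of ghoti's comment about avoiding extra processes, bash's built-in `read` can pull out the first field of each file without `awk`, `cat`, or command substitution for the comparison itself. This is a sketch with demo files created in a temporary directory; note that a CRLF line ending in the downloaded file would leave a trailing `\r` in the variable, which this sketch does not handle.

```bash
#!/bin/bash
# Sketch: compare the first field of md5sum.txt with the bare hash in the .md5 file
# using only the bash builtin `read` for the comparison.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Demo files shaped like the ones in the question.
printf 'e8c076d6ff83e9a615aedc7d5d1842d7  GeoLite2-City.dat\n' > md5sum.txt
printf 'e8c076d6ff83e9a615aedc7d5d1842d7' > GeoLite2-City.md5   # no trailing newline

file1="md5sum.txt"
file2="GeoLite2-City.md5"

# read splits on whitespace: the hash goes into the first variable, the rest into `_`.
# read returns nonzero at end of file without a newline but still sets the variable,
# hence `|| true`.
read -r actual _ < "$file1" || true
read -r expected _ < "$file2" || true

if [ "$actual" != "$expected" ]; then
  status="mismatch"   # send the e-mail here in the real script
else
  status="match"
fi
echo "$status"
```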