6

How can I compare the md5 sums for 2 files in one command?

I can compute them each individually:

my_prompt$ md5sum file_1.sql
20f750ff1aa835965ec93bf36fd8cf22  file_1.sql

my_prompt$ md5sum file_2.sql
733d53913c366ee87b6ce677971be17e  file_2.sql

But I wonder how to combine these into a single comparison. I have tried several approaches, all of which fail:

my_prompt$ md5sum file_1.sql == md5sum file_2.sql
my_prompt$ `md5sum file_1.sql` == `md5sum file_2.sql`
my_prompt$ (md5sum file_1.sql) == (md5sum file_2.sql)
my_prompt$ `md5sum file_1.sql` -eq `md5sum file_2.sql`

What am I missing here? I tried following "Compare md5 sums in bash script" and https://unix.stackexchange.com/questions/78338/a-simpler-way-of-comparing-md5-checksum without luck.

Gustav Rasmussen

4 Answers

9

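You need a program/built-in that evaluates the comparison. Usually you would use test/[/[[ to do so. With these programs -eq compares decimal numbers. Therefore use the string comparison = instead. Note that md5sum prints the file name after the hash, so the two outputs would differ whenever the file names differ. Reading each file from standard input keeps the name out of the output, so only the hashes are compared.

[[ "$(md5sum < file_1.sql)" = "$(md5sum < file_2.sql)" ]]

The exit code $? of this command tells you whether the two strings were equal.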
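As a rough sketch of how that exit status is normally consumed, here is the same comparison inside an if statement. The helper name md5_of and the demo files are made up for illustration, and GNU md5sum is assumed. The sketch uses [ rather than [[ so that it also runs in plain sh; in bash, [[ behaves the same way here.

```shell
# Hypothetical helper: print only the MD5 hash of a file, without its name.
md5_of() {
    # Reading from stdin makes md5sum print "-" instead of the file name;
    # the parameter expansion keeps everything before the first space.
    md5_out=$(md5sum < "$1") || return 1
    printf '%s\n' "${md5_out%% *}"
}

# Demo files (assumed names, created only for this example).
tmpdir=$(mktemp -d)
printf 'select 1;\n' > "$tmpdir/a.sql"
printf 'select 1;\n' > "$tmpdir/b.sql"
printf 'select 2;\n' > "$tmpdir/c.sql"

# The test command's exit status drives the branch directly; no $? needed.
if [ "$(md5_of "$tmpdir/a.sql")" = "$(md5_of "$tmpdir/b.sql")" ]; then
    result_ab="equal"
else
    result_ab="different"
fi

if [ "$(md5_of "$tmpdir/a.sql")" = "$(md5_of "$tmpdir/c.sql")" ]; then
    result_ac="equal"
else
    result_ac="different"
fi

echo "a vs b: $result_ab"
echo "a vs c: $result_ac"
rm -rf "$tmpdir"
```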

However, you may want to use cmp instead. This program compares the files directly. It is usually faster because it does not have to compute a hash and can stop at the first differing byte, and it is also safer because it cannot give a false positive, which a hash comparison can in the case of a collision.

cmp file_1.sql file_2.sql
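For use in a script, a possible sketch with cmp's -s option, which suppresses all output so that only the exit status matters (0 for identical files, 1 for different ones). The temporary files are invented for the demonstration.

```shell
# Demo files (assumed names, created only for this example).
tmpdir=$(mktemp -d)
printf 'select 1;\n' > "$tmpdir/file_1.sql"
cp "$tmpdir/file_1.sql" "$tmpdir/file_2.sql"
printf 'select 2;\n' > "$tmpdir/file_3.sql"

# -s: print nothing, report the result through the exit status only.
if cmp -s "$tmpdir/file_1.sql" "$tmpdir/file_2.sql"; then
    r12="identical"
else
    r12="different"
fi

# Capture the raw exit status for a pair of files that differ.
status13=0
cmp -s "$tmpdir/file_1.sql" "$tmpdir/file_3.sql" || status13=$?

echo "file_1 vs file_2: $r12"
echo "file_1 vs file_3: exit status $status13"
rm -rf "$tmpdir"
```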
Socowi
  • Exactly what I needed. Clear and precise explanation. – Gustav Rasmussen May 28 '21 at 13:20
  • One question: To get the exit code, you must do a 2nd call ? like `echo $?` – Gustav Rasmussen May 28 '21 at 13:22
  • Any way to get everything into one single call ? – Gustav Rasmussen May 28 '21 at 13:23
  • 2
    Yes, but you usually don't access `$?`. Instead, you write something like `if [[ a = b ]]; then echo equal; else echo different; fi`. `cmp` tells you directly if the files are different and remains quiet if they are the same. – Socowi May 28 '21 at 13:23
  • Might better add a `-n 32` to compare the 32 ascii character (16 bytes) only – Ryan Chen Jul 05 '22 at 00:17
  • @RyanChen I don't understand. Does the `sql` file format start with a plain text hexadecimal hash of the entire file? If it does, that would be a great *shortcut* to rule out equality. But I would do a full check anyways, to rule out false positive equality (for cases like checking if a copy is corrupted, this is absolutely necessary). If the first bytes differ, any reasonable implementation of `cmp` should already skip the rest. – Socowi Jul 05 '22 at 07:30
5

By passing the filenames as arguments to the md5sum command, we have something like:

$ md5sum foo.json bar.json
07a9a5c765f5d861b506eabd02f5aa4b *foo.json
07a9a5c765f5d861b506eabd02f5aa4b *bar.json

So, we have to compare the first column of the md5sum output:

if [[ $(md5sum foo.json bar.json | awk '{print $1}' | uniq | wc -l) == 1 ]]
then
    echo "Identical files"
else
    echo "There are differences"
fi

If we need the exit code, we can use the test command as follows:

test $(md5sum foo.json bar.json | awk '{print $1}' | uniq | wc -l) == 1

Let's break down the command:

$ md5sum foo.json bar.json
07a9a5c765f5d861b506eabd02f5aa4b *foo.json
07a9a5c765f5d861b506eabd02f5aa4b *bar.json

$ md5sum foo.json bar.json | awk '{print $1}'
07a9a5c765f5d861b506eabd02f5aa4b
07a9a5c765f5d861b506eabd02f5aa4b

$ md5sum foo.json bar.json | awk '{print $1}' | uniq
07a9a5c765f5d861b506eabd02f5aa4b

$ md5sum foo.json bar.json | awk '{print $1}' | uniq | wc -l
1

$ test $(md5sum foo.json bar.json | awk '{print $1}' | uniq | wc -l) == 1

$ echo $?
0
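The pipeline above could be wrapped in a small function so it can be reused for any number of files. This is only a sketch: same_md5 is a made-up name, GNU md5sum is assumed, and all files are assumed to exist. It swaps uniq for sort -u, so the count is the number of distinct hashes regardless of the order in which the files are listed.

```shell
# Hypothetical helper: succeed if all given files have the same MD5 hash.
same_md5() {
    # Column 1 of md5sum's output is the hash; sort -u keeps one line
    # per distinct hash, and wc -l counts those lines.
    hash_count=$(md5sum -- "$@" | awk '{print $1}' | sort -u | wc -l)
    [ "$hash_count" -eq 1 ]
}

# Demo files (assumed names, created only for this example).
tmpdir=$(mktemp -d)
printf '{"a": 1}\n' > "$tmpdir/foo.json"
printf '{"a": 1}\n' > "$tmpdir/bar.json"
printf '{"a": 1}\n' > "$tmpdir/baz.json"
printf '{"a": 2}\n' > "$tmpdir/other.json"

if same_md5 "$tmpdir/foo.json" "$tmpdir/bar.json" "$tmpdir/baz.json"; then
    all_same="Identical files"
else
    all_same="There are differences"
fi

if same_md5 "$tmpdir/foo.json" "$tmpdir/bar.json" "$tmpdir/other.json"; then
    with_diff="Identical files"
else
    with_diff="There are differences"
fi

echo "foo, bar, baz: $all_same"
echo "foo, bar, other: $with_diff"
rm -rf "$tmpdir"
```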
funk
2

This will work with bash

$ md5sum file1 file2 | md5sum --check

You will get OK for both files if md5 are equal. You can also use this for 3 files or more.
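To see what --check actually verifies, here is a hedged sketch assuming GNU md5sum (the --quiet option hides the per-file OK lines). The file names are invented. It suggests that --check compares each file against a previously recorded checksum, so feeding it freshly generated sums succeeds even when the files differ from each other.

```shell
# Demo files with different content (assumed names, for illustration only).
tmpdir=$(mktemp -d)
printf 'select 1;\n' > "$tmpdir/file1"
printf 'select 2;\n' > "$tmpdir/file2"

# Freshly generated sums always match the files they came from,
# so this succeeds although file1 and file2 are different.
pipe_status=0
md5sum "$tmpdir/file1" "$tmpdir/file2" | md5sum --check --quiet || pipe_status=$?

# Intended use: record a checksum, then verify the file against it later.
md5sum "$tmpdir/file1" > "$tmpdir/sums.md5"
check_before=0
md5sum --check --quiet "$tmpdir/sums.md5" || check_before=$?

# After the file changes, the recorded checksum no longer matches.
printf 'select 3;\n' > "$tmpdir/file1"
check_after=0
md5sum --check --quiet "$tmpdir/sums.md5" > /dev/null 2>&1 || check_after=$?

echo "pipeline self-check: $pipe_status"
echo "stored sums, unchanged file: $check_before"
echo "stored sums, modified file: $check_after"
rm -rf "$tmpdir"
```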

big_daddy
  • This will run, but it does not solve the problem of checking whether both files are identical. This line only checks whether the md5sums you just generated are correct. They have to be, because you just generated them. ;-) – schulle877 May 04 '22 at 09:49
1

Following @Socowi's answer, there is a way to get the answer in one line:

[[ "$(md5sum < file_1.sql)" = "$(md5sum < file_2.sql)" ]] && echo "Same content" || echo "Different content"

Reading each file through < keeps the file name out of md5sum's output, so only the hashes are compared. && and || act as logical and and or. When && is followed by ||, the combination works much like the ternary operator in other programming languages: if the md5sums are equal, echo "Same content" runs, otherwise echo "Different content" runs.
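One caveat with this pattern: A && B || C is not exactly if/else, because C also runs when A succeeds and B fails. The sketch below illustrates this with then_step, a made-up function standing in for a "then" command that fails after printing.

```shell
# Hypothetical stand-in for a "then" command that prints and then fails.
then_step() {
    echo "Same content"
    return 1
}

# A && B || C: C runs because B failed, although condition A was true.
chain_out=$(true && then_step || echo "Different content")

# if/else: the else branch depends only on the condition itself.
# The trailing "|| true" just ignores then_step's failing status.
if_out=$(if true; then then_step; else echo "Different content"; fi) || true

printf 'chain output:\n%s\n' "$chain_out"
printf 'if/else output:\n%s\n' "$if_out"
```

With a plain echo in the middle this rarely matters, but for longer commands an explicit if/else is the safer form.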