
The code below generates the MD5 / SHA-2 sum of each file under a directory and its subdirectories, recursively.

#!/usr/bin/perl -w
use strict;
use warnings;
use File::Find;
use IO::File;
use Digest::MD5;
use Digest::SHA qw(sha256_hex);

find({ wanted => \&process_file, no_chdir => 1 }, @ARGV);

sub process_file {
    #my $md5 = Digest::MD5->new;
    my $sha2 = Digest::SHA->new(256);
    if (-f $_) {
        #print "This is a file: $_\n";
        open(my $fh, '<', $_) or die "Cannot open $_: $!";
        binmode($fh);
        #my $md5sum = $md5->addfile($fh)->hexdigest;
        my $sha2sum = $sha2->addfile($fh)->hexdigest;
        close $fh;
        print "$sha2sum  $_\n";
    }
}

The output of the above code is given below.

~$ perl list.pl src
f21e1caa364eaad195d968d28187d5cf1a58c0b7b1f21a8ebcb9ca2539dde175  src/test1.pl
4b3277ec41ba0ff8ed6f9f2593c42e08c2f4e9b66df0d63de7c91559ff7e86fa  src/random.py
076231fcbe5887a163278b757f99fb05b27163775ec4706cb2365de3be0906ac  src/test.pl
8806c9f58fc91b2e1d6453a7af7e4f9f8b94e2d0f67a84a89b35bfbf517399be  src/size.pl
5a1b2080ecc53ced45ed3aa13e47118a9ca2f8505b1e89485b6b681d8e1d264c  src/test2.py
5f7c1ff9c7b3dd32f75558dd30324ec085c45a0d0c62190b9a96f211cdf216ea  src/java/test3.class
3728ee1a86443fffe9eafd84db82ce68c9640a0a984958f579b0da1a74283d7c  src/java/test4.wav
d7169ffbb231e93f47d1c54fddf2144b459bba228de48c30b4bc5a4d297be6fb  src/java/test5.java

Updated code to support sha256sum generation.

Now I want to generate a single combined MD5 / SHA-2 sum, using these per-file sums as input.

Abhinav
  • You don't need the `-w` on the shebang if you have `use warnings`. – simbabque Jan 27 '14 at 17:06
  • Yup, I will remove that. – Abhinav Jan 27 '14 at 17:24
  • So basically you want to check if a whole directory tree is identical? – simbabque Jan 27 '14 at 18:48
  • Nope, this checksum will be used to verify the integrity of a list of files, and I am trying to find a way to tell the user about the hash. With a combined hash I am trying to add a further integrity check. One way is to dump the checksums into a file and store the checksum of that file; when the program runs, it first checks the integrity of that file and then the integrity of the files in the directory. But since this is the usual approach, a user can modify the checksums and try to override it. So I am looking for an alternative. – Abhinav Jan 27 '14 at 18:58

2 Answers

  1. Digest::MD5 was first released as a Core module with perl v5.7.3 (March 2002) [1]. The oldest version of perl being widely used today is v5.8.8, so any perl you are going to encounter will have this module available.

  2. The oldest version of Digest::MD5 which I could find (v1.99.59-TRIAL from 1998) already has the add and addfile methods. So whatever version of that module you encounter, you will have the add method available.

You can therefore safely rely on that functionality, instead of having to use some ugly and unportable hack like calling a command line tool.

Make sure that you traverse each directory in a specific order so that the checksum is reproducible.

Note that MD5 is an effectively broken algorithm, which shouldn't be used except to interface with legacy systems. The SHA-2 family of hash functions is preferable for most tasks where a fast hash is required.
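As a rough sketch of the two points above (incremental `add` plus a fixed traversal order), the per-file sums can be fed into one running digest as they are produced, so the full list never has to be held in memory. The helper name `combined_digest` and the `"<sum>  <path>\n"` line format are assumptions made for illustration, and SHA-256 is used instead of MD5. Sorting relies on the `preprocess` option of File::Find, which only orders entries inside each directory; the top-level arguments are visited in the order given.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::SHA;

# Hypothetical helper: returns one SHA-256 hex digest covering every
# regular file found under the given directories.
sub combined_digest {
    my @dirs  = @_;
    my $total = Digest::SHA->new(256);

    find({
        no_chdir   => 1,
        # Sort each directory's entries so the traversal order is stable.
        preprocess => sub { sort @_ },
        wanted     => sub {
            return unless -f $_;
            # 'b' asks Digest::SHA to read the file in binary mode.
            my $sum = Digest::SHA->new(256)->addfile($_, 'b')->hexdigest;
            # Feed this file's line into the running digest immediately.
            $total->add("$sum  $_\n");
        },
    }, @dirs);

    return $total->hexdigest;
}

# Print the combined digest only when run as a script.
print combined_digest(@ARGV), "\n" unless caller;
```

Because the path is part of each line, the result changes when a file is renamed, and it also depends on how the directory is named on the command line (`src` versus `./src`).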


[1] Use the corelist command line tool from Module::CoreList to query core modules of different perl versions.
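On the command line this is `corelist Digest::MD5`. The same lookup can be done from Perl, sketched here on the assumption that Module::CoreList is installed (some distributions package it separately):

```perl
use strict;
use warnings;
use Module::CoreList;

# first_release returns the perl version in which the module first
# shipped with core, or undef if it was never a core module.
my $first = Module::CoreList->first_release('Digest::MD5');

print defined $first
    ? "Digest::MD5 first shipped with perl $first\n"
    : "Digest::MD5 is not a core module\n";
```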

amon
  • Basically, I was also thinking of using sha256sum instead of md5sum. The above code is just a sample; my problem is still how to combine all the independent sums into a final sum. How can I do that? I want to avoid appending all the sums together, since there could be a large number of files, which could consume a lot of memory. – Abhinav Jan 27 '14 at 18:42

Try:

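use strict;
use warnings;
use File::Find 'find';
use Digest::SHA 'sha256_hex';

my @allsums;

# Collect "<digest> <path>" for every regular file; 'b' reads in binary mode.
sub process_file {
  push @allsums, Digest::SHA->new(256)->addfile($_, 'b')->hexdigest . " $_" if -f $_;
}

find({ wanted => \&process_file, no_chdir => 1 }, @ARGV);

# Sort so the combined digest does not depend on traversal order.
print sha256_hex(join ':', sort @allsums), "\n";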
karel-m