
Perl's Digest module computes different SHA-1 digests for the add and addfile functions. I created the binary test data from /dev/urandom.

Running on Ubuntu:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.1 LTS
Release:        12.04
Codename:       precise

$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for i686-linux-gnu-thread-multi-64int

Output from the script:

$ perl t.pl sha1 a.tmp
doesntwork      da39a3ee5e6b4b0d3255bfef95601890afd80709
works           ee49451434cffe001a568090c86f16f076677af5
$ openssl dgst -sha1 a.tmp
SHA1(a.tmp)= ee49451434cffe001a568090c86f16f076677af5

Here is my code:

use strict;
use warnings;
use Switch;
use Digest;

sub doesntwork {
    my ($datafile, $hashfun) = @_;
    open(my $fh, "<", $datafile ) or die "error: Can't open '$datafile', $!\n";
    binmode($fh);
    read($fh, my $data, -s $datafile);
    close($fh);

    $hashfun->add($data);
    my $hashval = $hashfun->digest();

    return unpack('H*', $hashval);
}

sub works {
    my ($datafile, $hashfun) = @_;
    open(my $fh, "<", $datafile ) or die "error: Can't open '$datafile', $!\n";
    binmode($fh);

    $hashfun->addfile($fh);
    my $hashval = $hashfun->digest();

    close($fh);

    return unpack('H*', $hashval);
}

###############################################################################
(@ARGV >= 2) or die "usage: perl $0 algo datafile\n";
my ($algo, $datafile) = @ARGV;

my $hashfun;
switch($algo) {
    case "md5"    {$hashfun = Digest->new("MD5"    );}
    case "sha1"   {$hashfun = Digest->new("SHA-1"  );}
    case "sha256" {$hashfun = Digest->new("SHA-256");}
    case "sha512" {$hashfun = Digest->new("SHA-512");}
    else          {die "error: invalid algorithm '$algo'\n"}
}

print "doesntwork\t", doesntwork( $datafile, $hashfun ), "\n";
print "works     \t", works     ( $datafile, $hashfun ), "\n";

I would like the add function to work, because I want to compute the digest on buffered data, not on file data. Possibly add treats the data as text, while for addfile the binmode on the file handle makes it use binary data. If so, how can I make add treat the buffer as binary data?

Edit: updated the post to print the size of the data read:

    $ stat -c "%n %s" a.tmp
    a.tmp 671088640

    $ openssl dgst -sha1 a.tmp
    SHA1(a.tmp)= 7dfcced1b0c8864e6a20b2daa63de7ffc1cd7a26

    #### Works
    $ perl -W -MDigest -e 'open(my $fh, "<", "a.tmp") or die "cant open $!\n";
    > binmode($fh);
    > my $hf = Digest->new("SHA-1");
    > $hf->addfile($fh);
    > print unpack("H*", $hf->digest()),"\n";
    > close($fh);'
    7dfcced1b0c8864e6a20b2daa63de7ffc1cd7a26

    #### Doesnt Work
    $ perl -W -MDigest -e 'open(my $fh, "<", "a.tmp") or die "cant open $!\n";
    > binmode($fh);
    > read($fh, my $data, -s "a.tmp") or die "cant read $!\n";
    > close($fh);
    > printf("## data.length=%d,file.length=%d\n",length($data),-s "a.tmp");
    > length($data)==(-s "a.tmp") or die "couldnt read all the data";
    > my $hf = Digest->new("SHA-1");
    > $hf->add($data);
    > print unpack("H*", $hf->digest()),"\n";'
    ## data.length=671088640,file.length=671088640
    9eecafd368a50fb240e0388e3c84c0c94bd6cc2a

I also tried the approach from Fred's answer:

    $ perl -W -MDigest -e '
    > open(my $fh, "<", "a.tmp") or die "cant open $!\n";
    > binmode($fh);
    > my $size = -s "a.tmp";
    > my $got = read($fh, my $data, $size) or die "cant read $!\n";
    > print "##read $got bytes, size=$size\n";
    > my $done = $size - $got;
    > print "done=$done, size=$size, got=$got\n";
    > until(!$done) {
    >   $got   = read($fh, my $newdata, $done);
    >   $done -= $got ;
    >   $data .= $newdata;
    >   print "##read1 $got bytes, size=$size, done=$done\n";
    > }
    > close($fh);
    > printf("## data.length=%d,file.length=%d\n",length($data),-s "a.tmp");
    > length($data)==(-s "a.tmp") or die "couldnt read all the data";
    > my $hf = Digest->new("SHA-1");
    > $hf->add($data);
    > print unpack("H*", $hf->digest()),"\n";'
    ##read 671088640 bytes, size=671088640
    done=0, size=671088640, got=671088640
    ## data.length=671088640,file.length=671088640
    9eecafd368a50fb240e0388e3c84c0c94bd6cc2a

2 Answers


You have yet to provide data that reproduces the problem, and I cannot replicate it using the Perl script itself as the input file.

Here's the definition of addfile:

sub addfile {
    my ($self, $handle) = @_;

    my $n;
    my $buf = "";

    while (($n = read($handle, $buf, 4*1024))) {
        $self->add($buf);
    }
    unless (defined $n) {
        require Carp;
        Carp::croak("Read failed: $!");
    }

    $self;
}

Your claim that addfile works while add doesn't does not make much sense, since addfile simply calls add on each 4 KiB chunk it reads. I suppose there could be a bug in the module when it comes to handling long strings, but it's far more likely that you're passing different inputs to the module.
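To check whether the two paths really differ for a given buffer, the chunking that addfile performs can be reproduced on in-memory data. The following is only a sketch: `add_in_chunks` is a hypothetical helper (not part of Digest), and the sample buffer is made up for illustration. As a side note, the `da39a3ee5e6b4b0d3255bfef95601890afd80709` value printed for the add path in the first run is the SHA-1 of empty input, which suggests that no data reached the digest object in that run.

```perl
use strict;
use warnings;
use Digest;

# Hypothetical helper (not part of Digest): feed an in-memory buffer to a
# digest object in fixed-size pieces, mirroring what addfile does with a
# file handle.
sub add_in_chunks {
    my ($digest, $data_ref, $chunk_size) = @_;
    $chunk_size ||= 4 * 1024;
    my $len = length $$data_ref;
    for (my $pos = 0; $pos < $len; $pos += $chunk_size) {
        $digest->add(substr($$data_ref, $pos, $chunk_size));
    }
    return $digest;
}

# Made-up sample buffer containing every byte value.
my $data = join '', map { chr($_ % 256) } 0 .. 9_999;

# Digest of the buffer passed to add in one call.
my $whole_ctx = Digest->new("SHA-1");
$whole_ctx->add($data);
my $whole = $whole_ctx->hexdigest;

# Digest of the same buffer passed to add in 4 KiB pieces.
my $chunked = add_in_chunks(Digest->new("SHA-1"), \$data)->hexdigest;

# Digest of no input at all, for comparison with the question's first run.
my $empty = Digest->new("SHA-1")->hexdigest;

print "whole:   $whole\n";
print "chunked: $chunked\n";
print "empty:   $empty\n";
```

If the two digests differ only for very large buffers, feeding add in chunks like this may serve as a workaround for buffered data.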

ikegami

You need to test the return value from read. There is no guarantee that a single call has read the full contents of the file.

read in Perl is generally implemented in terms of the underlying fread library call. When you use low-level reads like this, you must test the return value to see whether you got as much as you asked for.

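my $size = -s $datafile;
my $got  = read($fh, my $data, $size);
defined $got or die "read failed: $!\n";
my $done = $size - $got;              # bytes still missing
until ( $done <= 0 ) {
    $got = read($fh, my $newdata, $done);
    defined $got or die "read failed: $!\n";
    last if $got == 0;                # EOF before the expected size
    $done -= $got;
    $data .= $newdata;
}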

That's just off the top of my head and probably has a glaring fencepost error. This is why I avoid using read whenever possible. See http://perltricks.com/article/21/2013/4/21/Read-an-entire-file-into-a-string for some less painful ways to do this.
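One less error-prone pattern is to keep reading fixed-size blocks until read reports end of file, appending each block at the current end of the buffer. The sketch below assumes a hypothetical helper name, `slurp_binary`, and uses a temporary file with made-up contents purely for demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into a string in binary mode,
# checking every read instead of trusting a single call.
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "error: can't open '$path': $!\n";
    my $data = '';
    while (1) {
        # The fourth argument appends the new block at the end of $data.
        my $n = read($fh, $data, 64 * 1024, length $data);
        defined $n or die "error: read failed on '$path': $!\n";
        last if $n == 0;    # 0 means end of file
    }
    close($fh);
    return $data;
}

# Demonstration on a temporary file with made-up binary contents.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
my $payload = join '', map { chr($_ % 256) } 0 .. 199_999;
print {$tmp_fh} $payload;
close($tmp_fh) or die "error: can't close '$tmp_path': $!\n";

my $data = slurp_binary($tmp_path);
printf "read %d bytes, file size %d\n", length($data), -s $tmp_path;
```

The loop ends only on end of file or on an error, so it does not depend on the size reported by -s matching what a single read call returns.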

  • Edited the post with an updated script as per your suggestion; it never enters the `until` loop because all the data is read in the first `read` call – user3307582 Feb 14 '14 at 12:35
  • That's insane code. `my $data; 1 while read($fh, $data, 64*1024, length($data));` – ikegami Feb 14 '14 at 14:25