3

Is there a package or function in Perl that gives me, in an easy way,

  • a short piece of information (e.g. a number or a short string)
  • that works like a hash value or checksum (e.g. MD5)
  • with good distinction between inputs (e.g. a cryptographic hash such as MDx or SHAx)
  • representing the content of a complex data structure (e.g. a hash of arrays of hashes)?

The best idea I have in mind is to

  1. serialize my structure to a string (e.g. with Data::Dumper)

  2. hash that string with MDx

But maybe there is a more elegant way.

chris01
  • Any hashing may be fine, but it kind of depends on the purpose. Is it to compare two structures "quickly" and assess whether they are different? (In that case, if the serialization is not done in exactly the same way, you risk getting different hashes for no real change in the structure.) – Patrick Mevzek Aug 06 '18 at 18:50
  • I think any checksum is meant for some kind of comparison :-) – chris01 Aug 06 '18 at 19:13
  • It depends. If you remain vague (no idea of constraints on size, speed, etc.), your answer is already in your question: serialize and hash. But I still think that the best answer depends on the purpose. Take, for example, physical media such as CD-ROMs: they carry a checksum to tell whether the medium was corrupted, but there is nothing to compare it to; you just know whether it is OK or not. Some more complicated checksums allow you to both detect and repair (part of) the corrupted data. – Patrick Mevzek Aug 06 '18 at 19:21
  • Take care that you are serializing consistently (e.g., you are setting `$Data::Dumper::Sortkeys`) so that identical input produces identical output. – mob Aug 06 '18 at 19:29
  • Would computing the checksum from the JSON representation be completely off? – U. Windl May 30 '22 at 08:38
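
On the JSON idea raised in the last comment: a minimal sketch, assuming the structure contains only plain hashes, arrays and scalars (no blessed objects or code references, which JSON::PP does not encode by default). It uses the core modules JSON::PP and Digest::SHA; the name `json_digest` is made up for this illustration. The `canonical` option plays the same role as `$Data::Dumper::Sortkeys`.

```perl
use strict;
use warnings;
use JSON::PP ();
use Digest::SHA qw( sha256_hex );

# canonical: emit hash keys in sorted order.
# utf8: encode() returns a byte string, which is what the digest expects.
my $json = JSON::PP->new->canonical->utf8;

# Hypothetical helper: SHA-256 hex digest of the canonical JSON text.
sub json_digest {
    my ($data) = @_;
    return sha256_hex( $json->encode($data) );
}

my $first  = json_digest({ c => 1, b => [ 1, { a => 2 } ] });
my $second = json_digest({ b => [ 1, { a => 2 } ], c => 1 });

print "same digest\n" if $first eq $second;
```

One thing to keep in mind with this variant: a value that has been used as a string may be written as `"1"` rather than `1`, so two structures that look equal can still produce different JSON text.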

2 Answers

6

In the past, I have used the Data::Dumper (with sorted keys, as pointed out by @mob) + Digest::MD5 approach to create checksums of complex data structures. In my case, the purpose was to compare two or more data structures for equality.

(Very) Simple snippet:

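use strict ;
use warnings ;
use Data::Dumper qw( Dumper ) ;
use Digest::MD5 qw( md5_hex ) ;

# Return the MD5 hex digest of the serialized data structure.
sub digest {
    my $data = shift ;
    # Sort hash keys so that equal structures serialize identically.
    local $Data::Dumper::Sortkeys = 1 ;
    return md5_hex( Dumper($data) ) ;
}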

Synopsis:

my $cplx_data_checksum = digest({ 
    c => 1 , 
    b => [ 1 , { a => 2 } ]
}) ;
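
A small check of why the sorted keys matter, as a sketch rather than a definitive test. It repeats the `digest` function so that it runs on its own. Pinning `$Data::Dumper::Indent` is an addition that is not part of the snippet above; the assumption is that some other part of the program might change that global and thereby change the dumped text.

```perl
use strict;
use warnings;
use Data::Dumper qw( Dumper );
use Digest::MD5 qw( md5_hex );

sub digest {
    my ($data) = @_;
    local $Data::Dumper::Sortkeys = 1;  # stable key order
    local $Data::Dumper::Indent   = 0;  # assumption: pin the layout as well
    return md5_hex( Dumper($data) );
}

# Same content written in a different key order, then a changed value.
my $original  = digest({ c => 1, b => [ 1, { a => 2 } ] });
my $reordered = digest({ b => [ 1, { a => 2 } ], c => 1 });
my $changed   = digest({ c => 1, b => [ 1, { a => 3 } ] });

print "reordered keys: ", ( $original eq $reordered ? "same" : "different" ), " digest\n";
print "changed value:  ", ( $original eq $changed   ? "same" : "different" ), " digest\n";
```

The first line should report the same digest and the second a different one.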

For a comparison of the speed of the different digest algorithms, take a look at the documentation of the Digest Perl module at https://metacpan.org/pod/Digest#Digest-speed
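
If you want to try another algorithm (the question also mentions SHAx), the generic Digest interface lets you pick it by name. A minimal sketch; `digest_with` is a made-up helper name, and it assumes Digest::SHA is available to back the "SHA-256" name (it ships with recent Perl versions).

```perl
use strict;
use warnings;
use Data::Dumper qw( Dumper );
use Digest ();

# Hypothetical helper: serialize with sorted keys, then hash with the
# algorithm named by the caller (e.g. "MD5", "SHA-1", "SHA-256").
sub digest_with {
    my ($algorithm, $data) = @_;
    local $Data::Dumper::Sortkeys = 1;
    # add() returns the context object, so the calls can be chained.
    return Digest->new($algorithm)->add( Dumper($data) )->hexdigest;
}

my $data = { c => 1, b => [ 1, { a => 2 } ] };

my $md5    = digest_with( 'MD5',     $data );
my $sha256 = digest_with( 'SHA-256', $data );

print "MD5:     $md5\n";
print "SHA-256: $sha256\n";
```

The MD5 result is 32 hex characters and the SHA-256 result is 64, so the choice is a trade-off between checksum length, speed and collision resistance.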

Hope this helps

Hannibal
  • Hello. Please correct your example. The switch used to enable sorting of keys is $Data::Dumper::Sortkeys, not SortKeys (lower-case 'k' vs upper-case 'K'). – antred Sep 24 '20 at 13:56
3

I would consider using Sereal::Encoder.

I've used it for a similar problem and was very happy with it: it is fast, offers all the options I could think of needing, and did not take me long to get going.

For example, it allows you to choose how to deal with objects and whether to sort hash keys, which can be very useful.
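
A minimal sketch of how that could look, assuming Sereal::Encoder is installed from CPAN (it is not a core module). The helper name `sereal_digest` and the choice of SHA-256 are only for illustration; `sort_keys` and `croak_on_bless` are two of the encoder options, the second being one possible way of dealing with objects (refuse them).

```perl
use strict;
use warnings;
use Sereal::Encoder ();
use Digest::SHA qw( sha256_hex );

# One encoder object can be reused for many structures.
my $encoder = Sereal::Encoder->new({
    sort_keys      => 1,  # emit hash keys in a stable order
    croak_on_bless => 1,  # die instead of serializing blessed objects
});

# Hypothetical helper: SHA-256 hex digest of the Sereal byte string.
sub sereal_digest {
    my ($data) = @_;
    return sha256_hex( $encoder->encode($data) );
}

my $first  = sereal_digest({ c => 1, b => [ 1, { a => 2 } ] });
my $second = sereal_digest({ b => [ 1, { a => 2 } ], c => 1 });

print "same digest\n" if $first eq $second;
```

As far as I know, the Sereal documentation itself cautions that byte-identical output for equal data is not guaranteed in every case (for instance, a number stored as a string versus as an integer), so it is worth testing with your real data.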

Have fun!

bytepusher