Good afternoon Overflowers! ;)

What I want to do:

I want to verify the integrity of transferred files.

How I approached it:

I was considering using a hash for this, but there's one problem: the files can be extremely large, so I can't load an entire file into memory. I need to be able to generate the hash iteratively, one chunk at a time.

What I've looked at so far:

I'm investigating murmur3 and skein for the hash function. I believe I understand how to make it work with skein, but the version I've built fails all the known-answer unit tests. With murmur, I'm not sure how to "chain" successive calls to get a valid result.

Any suggestions?

Jay
  • Egad! I'm calling the hash police! You'll be arrested for sure! – corsiKa Mar 10 '11 at 22:47
  • Well how about taking a look at http://en.wikipedia.org/wiki/Cyclic_redundancy_check – dexter Mar 10 '11 at 22:48
  • I considered CRCs but thought a hash function was superior. Thanks – Jay Mar 10 '11 at 22:52
  • A hash *is* superior - much smaller risk of collisions aka accidental undetected errors. – Erik Mar 10 '11 at 23:02
  • A hash is _not_ superior. If they appear to have a smaller risk of collisions, it's because they produce more bits than CRC-32. Compare same-width hash and CRC functions. – MSalters Mar 11 '11 at 09:00
  • Comparing a 32-bit CRC to a 128/256/512-bit hash isn't a valid comparison. The consensus of people who've studied it more than I have is that a hash is better for detecting intentional modifications. That was actually my main concern, though I didn't state it here. Thanks to everyone for the education, and to glowcoder for reporting me to the hash police. ;) – Jay Mar 11 '11 at 15:53

2 Answers

Most hash algorithms operate on fixed-size blocks of data. If you look up, e.g., the SHA1 or MD5 reference implementations, you'll see that they use an "init / loop { update } / finalize" construct, which lets you pass as much or as little data as you wish in each update call.

Skein, for example, uses the same concept in its reference implementation:

int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);

Why do you think you need to pass the entire data as one block? Are you looking at simplified wrapper functions?
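To make the init/update/final idea concrete, here is a minimal sketch of the pattern. It assumes 64-bit FNV-1a as a stand-in, because it fits in a few lines; FNV-1a is not a cryptographic hash and is not Skein. The `stream_hash_*` names are hypothetical and exist only for this illustration. With the Skein reference code you would call `Skein_256_Init`, `Skein_256_Update` and `Skein_256_Final` in the same places.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical context type: holds the running hash state between calls. */
typedef struct {
    uint64_t state;
} stream_hash_ctx;

/* Init: start from the FNV-1a 64-bit offset basis. */
static void stream_hash_init(stream_hash_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->state = UINT64_C(0xcbf29ce484222325);
}

/* Update: fold the next chunk into the state. Call once per chunk. */
static void stream_hash_update(stream_hash_ctx *ctx,
                               const unsigned char *msg, size_t len)
{
    assert(ctx != NULL);
    for (size_t i = 0; i < len; i++) {
        ctx->state ^= msg[i];                   /* mix in one byte */
        ctx->state *= UINT64_C(0x100000001b3);  /* FNV 64-bit prime */
    }
}

/* Final: FNV-1a has no padding step, so the state is the digest. */
static uint64_t stream_hash_final(const stream_hash_ctx *ctx)
{
    assert(ctx != NULL);
    return ctx->state;
}

/* Hash an open file in 64 KiB chunks, so memory use stays constant
 * regardless of file size. Returns 0 on success, -1 on a read error. */
static int stream_hash_file(FILE *fp, uint64_t *out)
{
    unsigned char buf[65536];
    stream_hash_ctx ctx;
    size_t n;

    stream_hash_init(&ctx);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        stream_hash_update(&ctx, buf, n);
    }
    if (ferror(fp)) {
        return -1;
    }
    *out = stream_hash_final(&ctx);
    return 0;
}
```

The property to rely on is that the digest depends only on the concatenated bytes, not on where the chunk boundaries fall: feeding "hello " and then "world" gives the same result as feeding "hello world" in one call. A real cryptographic hash buffers partial blocks inside its context to give the same guarantee.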

Erik
  • I have a version of it that compiles but I can't make it generate the outputs the author got. He provides a set of "KAT" test results for his submission to the SHA contest committee. – Jay Mar 10 '11 at 22:55
  • I was looking at murmur hash and don't see a way to use it iteratively like you can with skein. The only problem is that I don't believe my version of skein is working correctly. I can't get it to produce the correct responses according to its author's paper. I guess I will just keep trying to debug skein. – Jay Mar 11 '11 at 00:45
  • I'm accepting this answer. Even though the skein code doesn't produce what I think should be the correct answer, it is passing tests designed to see if it's a working hash. – Jay Mar 11 '11 at 15:50

You should have a look at Crypto++. It's my favorite cryptographic C++ library.

And here's how you could use it.

Morten Kristensen