24

Here's 3 example md5 hashes

$ md5 -s "1" && md5 -s "2" && md5 -s "3"
MD5 ("1") = c4ca4238a0b923820dcc509a6f75849b
MD5 ("2") = c81e728d9d4c2f636f067f89cc14862c
MD5 ("3") = eccbc87e4b5ce2fe28308fd9f2a7baf3

Say I wanted to take 8 characters from any hash. Is the beginning part of the hash particularly more "random" than the end? middle? Or are all substrings equally "random"?

maček
  • 76,434
  • 37
  • 167
  • 198
  • 1
    In my opinion, "random" is not the right word here. Hash functions are as deterministic as they get; there is no randomness involved at all. You shold probably ask if a substring of a hash has the same collusion resistance as the orginal (accounting for the different length, of course). – Jens Sep 29 '10 at 08:07
  • I was just about to ask this very question.. – insaner Apr 17 '16 at 05:15

4 Answers4

20

I was curious myself, so I went ahead and wrote a program to test this. You'll need Crypto++ to compile the code.

Disclaimer: When it comes to cryptography, or even just mathematics in general, I know just enough to shoot myself in the foot. So, take the following results with a grain of salt and keep in mind that I only have a cursory knowledge of the tools I'm using.

I only sampled three substrings: the first 8 bytes, the middle 8 bytes, and the last 8 bytes. Long story short, they're equally random.

However, when using a smaller sample space, it appears as if the last 8 bits are slightly more random. The larger the sampling space, the closer all three substrings approach complete randomness.


1000 iterations:

First:  0.995914
Middle: 0.996546
Last:   0.998104

5000 iterations:

First:  0.998387
Middle: 0.998624
Last:   0.999501

10000 iterations:

First:  0.999614
Middle: 0.999457
Last:   1

30000 iterations:

First:  1
Middle: 1
Last:   1

"Randomness" is measured by Crypto++'s MaurerRandomnessTest class. For reference, the executable compiled from the above code has a randomness value of 0.632411 and a copy of Shakespeare's Macbeth downloaded from Project Gutenburg has a randomness value of 0.566991.

kurige
  • 3,749
  • 2
  • 21
  • 24
14

Nitpick: "random" is the wrong word to use here, since hash functions are deterministic.

As for answering what you mean :), a desirable property of hash functions is achieving the Avalanche effect: basically, to have every bit of input cause drastic changes to the output. So, for a well-designed hash, every substring should be affected equally often ("be as random") as any other.

snemarch
  • 4,958
  • 26
  • 38
12

All substrings of a good hash (and md5 is reasonably good despite being cryptographically unsafe) are equally random, so yes, take any bits you like from the string, they should be equally distributed.

Gintautas Miliauskas
  • 7,744
  • 4
  • 32
  • 34
0

Measuring the randomness of the output of a hash function can be done using Statistical tests done on pseudo-random number generators. According to the Handbook of Applied Cryptography §5.4.4 (sample chapters available for free), there are five basic tests:

  1. Frequency test (monobit test)
  2. Serial test (two-bit test)
  3. Poker test
  4. Runs test
  5. Autocorrelation test

Then, of course, there's the Maurer's universal statistical test that kurige has already mentioned.