
I am investigating the collision probability of CRC checksums when they are used as hashes. I know how to calculate the collision probability for hash algorithms that are evenly distributed (meaning that, for random input data, every possible checksum is equally likely).

What I do not know (and could not find on the web):

  1. Are CRC checksums generally [not] evenly distributed?
  2. Does the distribution depend on the polynomial?
  3. Does the distribution depend on the input data size?

P.S.: I am aware of the restrictions when using CRCs as hashes, so this is not part of this question.

Silicomancer

2 Answers


Aside from malicious intent (you can force any CRC you like by changing bits in the message), CRCs are evenly distributed over all values. The polynomial does not matter, so long as it is a valid CRC polynomial, and the input only needs to be the size of the CRC or larger.
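The uniformity claim can be checked exhaustively for a small CRC. The following is a minimal sketch, not taken from the answer: it assumes a plain bitwise CRC-8 (MSB first, zero initial value, no final XOR) with the illustrative polynomials 0x07 and 0x31, and the helper names `crc8` and `crc_histogram` are hypothetical. It feeds every possible 2-byte message through the CRC and counts how often each checksum occurs.

```python
from collections import Counter


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8: MSB first, zero initial value, no final XOR (assumed variant)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def crc_histogram(num_bytes: int, poly: int = 0x07) -> Counter:
    """Count how often each CRC value occurs over ALL messages of num_bytes bytes."""
    counts = Counter()
    for value in range(256 ** num_bytes):
        message = value.to_bytes(num_bytes, "big")
        counts[crc8(message, poly)] += 1
    return counts


if __name__ == "__main__":
    hist = crc_histogram(2)
    # Number of distinct CRC values seen, and the set of per-value counts.
    print(len(hist), set(hist.values()))  # 256 {256}
```

With 65,536 inputs and 256 possible checksums, every checksum occurs exactly 256 times, and the same holds for the second polynomial. This only illustrates the statement for exhaustive (or uniformly random) input; it says nothing about structured or adversarial input.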

Mark Adler

I was also curious about this, so I did some tests using the crc32 command on Linux:

# I am printing each number several times so the data is longer than 32 bits:
$ for N in {000001..999999}; do echo -n $N$N$N$N | crc32 /dev/stdin; done >crcs


# There are no complete (8-character) collisions:
$ cat crcs | sort | uniq -d | wc -l
0

# There are no 7-character collisions:
$ for COL in 1 2; do cat crcs | awk "{print substr(\$1,$COL,7)}" | sort | uniq -d; done | wc -l
0

# There are exactly 32k 6-character collisions:
$ for COL in 1 2 3; do cat crcs | awk "{print substr(\$1,$COL,6)}" | sort | uniq -d; done | wc -l
32768


# Also, the distribution of the hex digits in each column is *extremely* uniform.
# Each column has results similar to these:
$ cat crcs | awk '{print substr($1,1,1)}' | sort | uniq -c
  62440 0
  62439 1
  62440 2
  62440 3
  62560 4
  62560 5
  62560 6
  62560 7
  62560 8
  62560 9
  62560 a
  62560 b
  62440 c
  62440 d
  62440 e
  62440 f

...So my conclusion is that CRC32 does a very good job of evenly distributing the checksums.
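For comparison, here is a rough sketch of what an ideal, uniformly random 32-bit hash would be expected to do with the same number of inputs, using the standard birthday approximation. The function names are hypothetical and this is not part of the original test; it only puts the "0 collisions" result in context.

```python
import math


def expected_colliding_pairs(num_items: int, hash_bits: int) -> float:
    """Expected number of colliding pairs for an ideal uniform hash."""
    pairs = num_items * (num_items - 1) / 2
    return pairs / 2 ** hash_bits


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate probability of at least one collision (birthday bound)."""
    return -math.expm1(-expected_colliding_pairs(num_items, hash_bits))


if __name__ == "__main__":
    n = 999_999  # number of inputs used in the shell test above
    print(f"expected colliding pairs: {expected_colliding_pairs(n, 32):.1f}")
    print(f"P(at least one collision): {collision_probability(n, 32):.6f}")
```

An ideal random 32-bit hash would be expected to produce roughly 116 colliding pairs for 999,999 inputs, so finding none suggests the test inputs (short, highly similar digit strings) are spread more evenly than random data would be. That is plausibly a consequence of the CRC's structure on similar inputs, and it should not be read as a collision rate for arbitrary data.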

likebike