5

How can I verify two CRC implementations will generate the same checksums?

I'm looking for an exhaustive evaluation methodology specific to CRC implementations.

Joseph Weissman
  • Note that testing against a lot of inputs probably shows that both implementations use the same algorithm, but not that they are implemented properly. If one of the CRC algorithms has a fencepost error that only happens for inputs divisible by some weird formula involving factors of 32, the failing input space might be very small (which also means it will mostly work). This is mainly a concern if you are implementing it yourself or using a poorly tested implementation found on a discussion board. Well-tested implementations probably don't have this kind of issue. – Brian Aug 30 '10 at 18:31

5 Answers

6

You can separate the problem into edge cases and random samples.

Edge cases. The CRC input has two variables: the number of bytes, and the value of each byte. So create arrays of length 0, 1, and MAX_BYTES, with byte values ranging from 0 to MAX_BYTE_VALUE. The edge-case suite is something you'll most likely want to keep within a JUnit suite.

Random samples. Using the ranges above, run CRC on randomly generated arrays of bytes in a loop. The longer you let the loop run, the more you exhaust the inputs. If you are low on computing power, consider deploying the test to EC2.
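The random-sample approach above can be sketched as a small differential test harness. This is only a sketch, not the answerer's code: `crc_impl_a`/`crc_impl_b` are placeholders for the two implementations under comparison (here both bound to Python's built-in CRC-32 routines purely so the snippet runs), and `MAX_BYTES` is an assumed length bound.

```python
import binascii
import random
import zlib

# Placeholders for the two implementations under test -- swap in your own.
# zlib.crc32 and binascii.crc32 are real CRC-32 routines, used here only
# so the sketch is runnable.
crc_impl_a = zlib.crc32
crc_impl_b = binascii.crc32

MAX_BYTES = 4096  # assumed upper bound on input length for this sketch

def random_sample_test(iterations=1000, seed=0):
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(iterations):
        # Mix the edge-case lengths (0, 1) in with random lengths.
        length = rng.choice([0, 1, rng.randint(2, MAX_BYTES)])
        data = bytes(rng.getrandbits(8) for _ in range(length))
        a, b = crc_impl_a(data), crc_impl_b(data)
        assert a == b, f"mismatch on {length}-byte input: {a:#010x} != {b:#010x}"

random_sample_test()
```

Raising `iterations` (or running the harness on bigger hardware) samples more of the input space, which is the answer's point about letting the loop run longer.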

Robert Christian
2

One nice property of CRCs is that for a given set of parameters (polynomial, reflection, initial state, etc.) you will get a constant value when you recompute the CRC over the original dataset + the original CRC. These constants are documented for common CRCs but you can just blindly generate them using two different random data sets and check that they are the same:

implementation 1: crc(rand_data_1 + crc(rand_data_1)) -> constant_1
implementation 2: crc(rand_data_2 + crc(rand_data_2)) -> constant_2
assert constant_1 == constant_2

You can use the same method within a single implementation to get a warm fuzzy feeling about its correctness. If your implementation works with arbitrary polynomials, you can have a unit test exhaustively check every possible polynomial using this method without needing to know what the constants are.
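As an illustration of the constant-recomputation trick above, here is a sketch using Python's `zlib.crc32`. The little-endian 32-bit packing is an assumption tied to CRC-32's parameters; a CRC with a different width or byte order would need different packing.

```python
import os
import struct
import zlib

def residue_constant(crc_func, data):
    # Recompute the CRC over the original data + its own CRC
    # (appended little-endian, per CRC-32 convention).
    c = crc_func(data)
    return crc_func(data + struct.pack("<I", c))

# Two different random datasets yield the same constant for the same parameters.
constant_1 = residue_constant(zlib.crc32, os.urandom(64))
constant_2 = residue_constant(zlib.crc32, os.urandom(1024))
assert constant_1 == constant_2
# For CRC-32/ISO-HDLC (zlib's parameters) this is the documented constant 0x2144df1c.
```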

This technique is powerful, but it would also be wise to add an independent test that verifies the result against known input, for the pathological case where both of your CRC implementations produce bad results that happen to slip past the constant-equivalence check.

Kevin Thibedeau
2

Create several unit tests with the same input that will compare the output of both implementations against each other.

Bernard
  • I've got this in place, but how can we know we've covered (at least a meaningful fragment of) all possible input? I'm looking for something related to the nature of CRCs here that could point us in the direction of HOW to write these tests in a way that effectively covers the input range. – Joseph Weissman Aug 30 '10 at 17:55
  • @Joe- A series of 20-30 random inputs of different sizes should be sufficient to prove that the CRC algorithms produce equal outputs. I have never seen two implementations that produced output that was "close" to each other; instead, even slight differences in input produced large changes in the output. That being said, if these are homemade CRC implementations that aren't known to be bug-free, then a coding bug may cause problems for your tests. – bta Aug 30 '10 at 18:08
  • I know my above comment is "proof by lack of a counterexample", but the idea still holds. If the implementations use different polynomials, then you will get radically different results. – bta Aug 30 '10 at 18:10
  • I would also test at least 256 "consecutive" inputs (i.e. same file, but pick a byte and change it to all 256 possible bytes). – Brian Aug 30 '10 at 18:32
  • You should note that you could test each step of calculating the CRC of a buffer or string, because each step produces a (partial) CRC checksum. – nategoose Aug 30 '10 at 19:11
  • Whilst random input data are useful as regression tests, they're not very useful as exhaustive proof of correctness. There are all sorts of nasty edge-case bugs that can creep into this sort of thing (e.g. if you're on a 16-bit embedded processor, carrying from LO to HI). You'll need to do more than just random input. – Oliver Charlesworth Aug 30 '10 at 21:56
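Brian's 256-"consecutive"-inputs suggestion and nategoose's partial-checksum point can both be sketched in a few lines. As before, Python's built-in CRC-32 routines stand in for the two implementations you would actually compare.

```python
import binascii
import zlib

crc_impl_a = zlib.crc32      # stand-ins; replace with the two
crc_impl_b = binascii.crc32  # implementations you want to compare

def byte_sweep_test(base: bytes, position: int):
    # Hold the message fixed, sweep one byte through all 256 possible
    # values, and require both implementations to agree on every variant.
    for value in range(256):
        variant = base[:position] + bytes([value]) + base[position + 1:]
        assert crc_impl_a(variant) == crc_impl_b(variant)

byte_sweep_test(b"The quick brown fox jumps over the lazy dog", 10)

# Partial checksums: zlib.crc32 accepts a running CRC as a second argument,
# so each intermediate step can be compared too, not just the final value.
head, tail = b"partial ", b"update"
assert zlib.crc32(head + tail) == zlib.crc32(tail, zlib.crc32(head))
```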
1

First, if it is a standard CRC implementation, you should be able to find known values somewhere on the net.

Second, you could generate some number of payloads, run each CRC implementation on them, and check that the CRC values match.
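For standard CRCs, the "known values on the net" are published check values. For example, CRC-32 (ISO-HDLC, as used by zlib, gzip, and PNG) is specified so that the CRC of the ASCII string "123456789" is 0xCBF43926, which makes a one-line sanity test:

```python
import zlib

# Published check value for CRC-32/ISO-HDLC: the CRC of "123456789".
assert zlib.crc32(b"123456789") == 0xCBF43926
```

Any implementation claiming the same parameter set must reproduce this value exactly.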

Adam Tegen
0

By writing a unit test for each implementation that takes the same input and verifies it against the expected output.

Darin Dimitrov
  • I understand -- this is what we're doing -- but how do we know we've exhausted (or even come close to exhausting) the possible input space? I'm looking for something based on the way CRCs work specifically. – Joseph Weissman Aug 30 '10 at 18:05