16 bit Checksum fuzzy analsysis - Leveraging "collisions", biases a thing?

Question

If playing around with CRC RevEng fails, what next? That is the gist of my question. I am trying to learn more how to think for myself, not just looking for an answer 1 time to 1 problem.

Assuming the following:

1.) You have full control of white box algorithm and can create as many chosen sample messages as you want with valid 16 bit / 2 byte checksums

2.) You can verify as many messages as you want to see if they are valid or not

3.) Static or dynamic analysis of the white box code is off limits (say the MCU is of a lithography that would require electron microscope to analyze for example, not impossible but off limits for our purposes).

Can you use any of these methods or lines of thinking:

1.) Inspect "collisions", i.e. different messages with same checksum. Perhaps XOR these messages together and reveal something?

2.) Leverage strong biases towards certain checksums?

3.) Leverage "Rolling over" of the checksum "keyspace", i.e. every 65535 sequentially incremented messages you will see some type of sequential patterns?

4.) AI ?

Perhaps there are other strategies I am missing?

CRC RevEng tool was not able to find the answer using numerous settings configurations

EDIT: Here are some example messages

Sample 1: FFFFFFFFFFFF /

0100100A00000000000000000000FFFFFFFFFF72BE

Sample 2: 000000000000 /

0100100A00000000000000000000FFFFFFFFFF2A3D

Sample 3: 000000000001 /

0100100A00000000000000000000FFFFFFFFFF89F7

Sample 4: 000000000002 /

0100100A00000000000000000000FFFFFFFFFF0864

Sample 5: 000000000003 /

0100100A00000000000000000000FFFFFFFFFFABAE

Sample 6: 000000000004 /

0100100A00000000000000000000FFFFFFFFFF9396

Sample 7: 000000000005 /

0100100A00000000000000000000FFFFFFFFFF305C

Sample 8: 000000000006 /

0100100A00000000000000000000FFFFFFFFFFB1CF

score 0 · Accepted Answer · answered Feb 10 '23 at 01:43

Key properties and attacks:

If you have two messages + CRCs of the same length and exclusive-or them together, the result is one message and one pure CRC on that message, where "pure" means a CRC definition with a zero initial value and zero final exclusive-or. This helps take those two parameters out of the equation, which can be solved for later. You do not need to know where the CRC is, how long it is, or which bits of the message are participating in the CRC calculation. This linear property holds.
Working with purified examples from #1, if you take any two equal-length messages + pure CRCs and exclusive-or them together, you will get another valid message + pure CRC. Using this fact, you can use Gauss-Jordan elimination (over GF(2)) to see how each bit in the message affects other generated bits in the message. With this you can find out a) which bits in the message are particiapting, b) which bits are likely to be the CRC (though it is possible that other bits could be a different function of the input, which can be resolved by the next point), and c) verify that the check bits are each indeed a linear function of the input bits over GF(2). This can also tell you that you don't have a CRC to begin with, if there don't appear to be any bits that are linear functions of the input bits. If it is a CRC, this can give you a good indication of the length, assuming you have a contiguous set of linearly dependent bits.
Assuming that you are dealing with a CRC, you can now take the input bits and the output bits for several examples and try to solve for the polynomial given different assumptions for the ordering of the input bits (reflected or not, perhaps by bytes or other units) and the direction of the CRC shifting (reflected or not). Since you're talking about an allegedly 16-bit CRC, this can be done most easily by brute force, trying all 32,768 possible polynomials for each set of bit ordering assumptions. You can even use brute force on a 32-bit CRC. For larger CRCs, e.g. 64 bits, you would need to use Berlekamp's algorithm for factoring polynomials over finite fields in order to solve the problem before the heat death of the universe. Then you would factor each message + pure CRC as a polynomial, and look for common factors over multiple examples.
Now that you have the message bits, CRC bits, bit orderings, and the polynomial, you can go back to your original non-pure messages + CRCs, and solve for the initial value and final exclusive-or. All you need are two examples with different lengths. Then it's a simple two-equations-in-two-unknowns over GF(2).

Enjoy!

Thank you, regarding #4 and the requirement for different lengths. Is this at all pertinent to the various CRC parameter guesser / brute forcer tools out there like SRP16? I'll have to do some more studying to understand this — JOULTICOA, Feb 10 '23 at 16:45
Ok thank you. Say you are iterating through / creating 65535 sequential chosen messages with valid checksums via brute force. Can you use an "exclusion" table to speed up this process by excluding checksums that were already found as statistically less likely to be valid again within this dataset of 65535 messages? I guess I don't have a good grasp on the entropy of CRCs and the likelihood for duplicates. — JOULTICOA, Feb 11 '23 at 17:53
Say that the 16 bit checksum is a "keyspace" of 65535. Say you are only modifying 2 bytes of a 27 byte message and iterating sequentially 1 - 65535. Each time you find a valid checksum, you add it to an "exclusion" list (therefore shrinking the 65535 "keyspace" to search). By the time you have reached 32767 messages, you are now only searching half of the "keyspace". Is this a valid strategy? Relying on the improbability of CRC collisions to reduce the search space? — JOULTICOA, Feb 11 '23 at 18:19
That still is making no sense. I don't see how modifying the message to get a particular CRC helps you determine the CRC parameters. I don't know what the "key" is in your "keyspace", or what your "search" is searching for. By the way, all 16-bit CRC values are possible, so there are 65536 of them, not 65535. — Mark Adler, Feb 11 '23 at 19:21
I have edited the question with some samples we are trying to work on, perhaps my XORing pairs is not taking account for MSB / LSB bit order, and LE / BE byte ordering? — JOULTICOA, Mar 28 '23 at 16:20

16 bit Checksum fuzzy analsysis - Leveraging "collisions", biases a thing?

1 Answers1