Computation of 64 bit CRC polynomial performance

Question

I found the following page in the web: https://users.ece.cmu.edu/~koopman/crc/crc64.html

It lists the performance of a handful of 64 bit CRC polynomials. The optimal payload for a hamming distance of 3 is listed as 18446744073709551551 bit. A polynomial providing that HD 3 payload is 0xd6c9e91aca649ad4 (Koopman notation).

On the same website there is also some basic "HDLen" C code that can compute the performance of any polynomial (https://users.ece.cmu.edu/~koopman/crc/hdlen.html). I checked that code and the HD 3 optimized loop is very simple, similar to this:

Poly_t accum = cPoly;   
Length_t len = 0;

while(accum != cTopBitSet)
{
    accum = (accum & 1) ? (accum >> 1) ^ cPoly) : (accum >> 1);
    len++;  
}

18446744073709551551 is a huge number. It is almost the full range of a 64 bit integral. Even that simple loop would run centuries on the most powerful CPU core available.

It also appears to me that this loop can not be parallelized since each iteration depends from the previous iteration.

It is claimed that payload is optimal amongst all possible 64 bit polynomials which means that all possible 64 bit polynomials would have been checked for their individual HD 3 performance. This task can be parallelized, still the huge number of candidate polynomials seems to be undoable.

I can't see a way to even compute a single (good) polynomial's (HD 3) performance. Not to mention all possible 64 bit wide polynomials.

So I wonder: How has the number been found? What kind of code or method (in contrast to the simple HDLen software) was used to find the mentioned optimal HD 3 payload?

*"How has the number been found? What kind of code or method (in contrast to the simple HDLen software) was used to find the mentioned optimal HD 3 payload?"* - You could start by contacting Prof Koopman how he did it or where he got the number from? — Stephen C, Jul 16 '22 at 02:41
Someone else asked him about alternatives for HDLen in his blog. He wrote something about memory intensive implementations and about more complex implementations for productive use. He was also asked how such implementations work and if they are available for download or if they are some kind of secret. He didn't answer so I thought it would be better to ask here. Anyway Mark Adler obviously knows the answer. Also on the page it is said that those numbers are (partial) work of other people. — Silicomancer, Jul 16 '22 at 17:29
I don't know the complete answer for all the tricks Koopman might have used in his unreleased code. — Mark Adler, Jul 17 '22 at 16:29

Mark Adler · Accepted Answer · 2022-07-17T22:11:29.277

1

It is a primitive polynomial, where it can be shown that the HD=3 length of any primitive polynomial over GF(2) is 2ⁿ-(n+1), where n is the degree of the polynomial.

It can be shown pretty quickly whether a polynomial over a finite field is primitive or not.

Also, it is possible to compute the CRC of a very sparse codeword of n bits in O(log n) time instead of O(n) time. Here is an example in C, demonstrating the case mentioned for the provided CRC:

#include <stdio.h>
#include <stdint.h>

// Jones' 64-bit primitive polynomial (the constant excludes the x^64 term):
// 1 + x^3 + x^5 + x^7 + x^8 + x^10 + x^12 + x^13 + x^16 + x^19 + x^22 + x^23 +
// x^26 + x^28 + x^31 + x^32 + x^34 + x^36 + x^37 + x^41 + x^44 + x^46 + x^47 +
// x^48 + x^49 + x^52 + x^55 + x^56 + x^58 + x^59 + x^61 + x^63 + x^64
#define POLY 0xad93d23594c935a9
#define HIGH 0x8000000000000000         // high bit set

// Return polynomial a times polynomial b modulo p (POLY). a must be non-zero.
static uint64_t multmodp(uint64_t a, uint64_t b) {
    uint64_t prod = 0;
    for (;;) {
        if (a & 1) {
            prod ^= b;
            if (a == 1)
                break;
        }
        a >>= 1;
        b = b & HIGH ? (b << 1) ^ POLY : b << 1;
    }
    return prod;
}

// x2n_table[n] is x^2^n mod p.
static uint64_t x2n_table[64];

// Initialize x2n_table[].
static void x2n_table_init(void) {
    uint64_t p = 2;                 // first entry is x^2^0 == x^1
    x2n_table[0] = p;
    for (size_t n = 1; n < 64; n++)
        x2n_table[n] = p = multmodp(p, p);
}

// Compute x^n modulo p. This takes O(log n) time.
static uint64_t xtonmodp(uintmax_t n) {
    uint64_t x = 1;
    int k = 0;
    for (;;) {
        if (n & 1)
            x = multmodp(x2n_table[k], x);
        n >>= 1;
        if (n == 0)
            break;
        k++;
    }
    return x;
}

// Feed n zero bits into the CRC, taking O(log n) time.
static uint64_t crc64zeros(uint64_t crc, uint64_t n) {
    return multmodp(xtonmodp(n), crc);
}

// Feed one one bit into the CRC.
static uint64_t crc64one(uint64_t crc) {
    return crc & HIGH ? crc << 1 : (crc << 1) ^ POLY;
}

// Return the CRC-64 of one one bit, followed by n zero bits, followed by one
// more one bit.
static uint64_t crc64_one_zeros_one(uint64_t n) {
    return crc64one(crc64zeros(crc64one(0), n));
}

int main(void) {
    x2n_table_init();
    uint64_t n = -2;    // code word with 2^64 bits: a 1, 2^64-2 0's, and a 1
    printf("%llx\n", crc64_one_zeros_one(n));   // prints 0
    return 0;
}

That calculation completes in about 7.4 µs on my machine. As opposed to the bit-at-a-time calculation, which would take about 560 years on my machine.

edited Jul 17 '22 at 22:11

answered Jul 16 '22 at 01:28

Mark Adler

101,978
13
118
158

Wow, that is important to know. Strange that HDLen didn't integrate this as an optimization. Prof. Koopman wrote on his website that primitive polynomials have optimal length for HD=3 but I can't remember he mentioned that corresponding equation. Do you know if there are more polynomial analysis bases "shortcuts" of that kind one can use to speed up performance computation? – Silicomancer Jul 16 '22 at 17:40
Yes. It is possible to compute the CRC of a very long sequence of bits that are all zeros in O(log n) time instead of O(n) time. If there are k one bits in the sequence, then the whole thing can be done in O(k + log n) time. k is very small for these Hamming distance calculations, so essentially O(log n) for the whole sequence. – Mark Adler Jul 16 '22 at 18:58
That's interesting. Do you know of any public implementation of that algorithm? Does it have a name so I can search for more information? I could not find anything about it :( – Silicomancer Jul 16 '22 at 19:16
I don't know that it has a name. I have used it in my CRC combination code [here](https://github.com/madler/zlib/blob/master/crc32.c#L1080). – Mark Adler Jul 16 '22 at 19:28
I was referring only to calculating the CRC of a single sequence of bits with sparse ones, in particular _k_ ones. As for your comment, you do not need to sort. All you need to do is calculate the CRC using just the polynomial (no initial value or final exclusive-or), and see if any of them are zero. Any zeros are undetected errors. – Mark Adler Jul 17 '22 at 15:34
The OP was wondering how in the world you could even compute the CRC of that many bits in less than centuries. I have added example code to the answer demonstrating the HD=3 case for the provided polynomial, completing the computation in microseconds. – Mark Adler Jul 17 '22 at 20:18
@MarkAdler - To clarify, checking for any 6 bit error failure in a 1024 bit message is comb(1024,6) ~= 1.578 x 10^15 combinations not feasible to test on my desktop. So I split the problem in half, creating a table of CRC versus 3 bit errors (3 one bits, 1021 zero bits), and the bit indexes, for comb(1024,3) = 178433024 combinations. Then I sorted the table and looked for duplicate CRCs, where the bit indexes did not intersect (3 bits + 3 bits = 6 bits). For 1024 bits, I got zero failures. For 1025 bits, I got some 6 bit failures and a few 4 bit failures, pattern sensitive. – rcgldr Jul 18 '22 at 17:26
@rcgldr Ah, understood. I didn't read your first comment carefully enough. – Mark Adler Jul 18 '22 at 19:39
@MarkAdler - which gets back to the O(k) complexity for k bit errors. Seems you need to test for comb(n,k) bit error patterns to check for a k bit CRC failure in an n bit message. I deleted the older comment, since my last one explains it better. – rcgldr Jul 18 '22 at 23:53
As I said, that O() was for calcuating a single CRC. Not the CRCs of all of the possible bit patterns. – Mark Adler Jul 19 '22 at 01:48

Computation of 64 bit CRC polynomial performance

1 Answers1