I want to create a 32-bit number out of an ASCII string. The CRC32 algorithm is exactly what I'm looking for, but I can't use it because the table it requires is far too large (this is for an embedded system where resources are VERY rare).

So: any suggestions for a fast and slim CRC algorithm? It does not matter if collisions are a bit more likely than with the original CRC32.

Wolf
Elmi
  • 8
    CRC32 can be implemented with no lookup table, or with a 1k-byte lookup table if you must, without a major speed penalty compared to the 256k lookup table variant. Example at http://wiki.osdev.org/CRC32. If you really must save bytes, use adler32. – dascandy Jan 14 '15 at 09:47
  • 4
    What do you mean by `resources are VERY rare`? Less than 64 MB, less than 8 KB, or less than 512 bytes? – jeb Jan 14 '15 at 09:49
  • 1
    jeb: VERY rare means I currently do not have enough space left to add the table as shown in http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/bsd/libkern/crc32.c – Elmi Jan 14 '15 at 09:52
  • 2
    Maybe just fix that code and put the table in flash. Most linkers put constant variables in flash, and these days even low-end CPUs come with a decent amount of OTP. Just define the table as const. – Luka Rahne Jan 14 '15 at 09:54
  • 2
    If you don't have any particular requirements for the quality of the hash/checksum/whatever, something very simple like [`boost::hash_combine`](http://www.boost.org/doc/libs/1_35_0/doc/html/boost/hash_combine_id241013.html), or even just XOR, might be good enough. – Mike Seymour Jan 14 '15 at 09:59
  • Link for the source code of the fast crc algorithm -> http://ideone.com/05tIaE – Irrational Person Jan 14 '15 at 10:07
  • Does it need to be a CRC? You could use a Reed Solomon like reversible polynomial with 8 bit coefficients with the pattern 1 x y x 1, where x and y are 8 bit values, each requiring a 256 byte lookup table for encoding and/or re-encoding to check for errors. If the size of the data to be encoded / re-encoded is known before encoding starts, the encode / re-encode loop can be unfolded, using switch / case or goto's for the initial jump into the loop. – rcgldr Jan 14 '15 at 17:07
  • 2
    The question is not off topic. The stackoverflow police apparently don't know the difference between an algorithm and an implementation. It is entirely on topic here to ask for what algorithms exist to do a particular task. – Mark Adler Jan 22 '17 at 17:00
  • @MarkAdler but it's asking for a "fast algorithm" instead -- which is unanswerable due to unknown requirements/hardware/software and endless possible optimizations. – ivan_pozdeev Jan 23 '17 at 01:56
  • First off, that's absurd. Of course you can talk about algorithms that are faster or slower independent of the hardware. There are several volumes by Knuth that talk about nothing but that (and memory requirements). And the requirement is simply to compute a CRC. Second, despite the title, the body of the question was actually asking for slim, not fast. You can also talk about the code size of algorithms independent of the hardware. – Mark Adler Jan 23 '17 at 03:37
  • The fastest, slimmest way to do this is to call an intrinsic. ARM and Intel have these ready to go on their more modern CPUs. If you don't have this, then resort to hard coding it. – Michael Dorgan May 15 '18 at 17:08

2 Answers

CRC implementations use tables for speed. They are not required.

Here is a short CRC32 using either the Castagnoli polynomial (same one as used by the Intel crc32 instruction), or the Ethernet polynomial (same one as used in zip, gzip, etc.).

#include <stddef.h>
#include <stdint.h>

/* CRC-32C (iSCSI) polynomial in reversed bit order. */
#define POLY 0x82f63b78

/* CRC-32 (Ethernet, ZIP, etc.) polynomial in reversed bit order. */
/* #define POLY 0xedb88320 */

uint32_t crc32c(uint32_t crc, const unsigned char *buf, size_t len)
{
    int k;

    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    }
    return ~crc;
}

The initial crc value should be zero. The routine can be called successively with chunks of the data to update the CRC. You can unroll the inner loop for speed, though your compiler might do that for you anyway.
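As a rough illustration of the chunked calling convention described above, here is a minimal sketch. It repeats the bitwise routine so that it compiles on its own; the helpers `crc32c_string` and `crc32c_self_check` are hypothetical names added for this example and are not part of the original answer. The expected value `0xe3069283` is the standard CRC-32C check value for the ASCII string "123456789".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (iSCSI) polynomial in reversed bit order, as in the answer. */
#define POLY 0x82f63b78

/* Bitwise routine from the answer, repeated so this sketch is self-contained. */
uint32_t crc32c(uint32_t crc, const unsigned char *buf, size_t len)
{
    int k;

    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    }
    return ~crc;
}

/* Hypothetical helper: hash a NUL-terminated ASCII string in one call. */
uint32_t crc32c_string(const char *s)
{
    size_t len = 0;

    while (s[len] != '\0')
        len++;
    return crc32c(0, (const unsigned char *)s, len);
}

/* Hypothetical self-check: one-shot and chunked calls must agree. */
void crc32c_self_check(void)
{
    const unsigned char msg[] = "123456789";
    uint32_t whole = crc32c(0, msg, 9);      /* all nine bytes at once */
    uint32_t part = crc32c(0, msg, 4);       /* first chunk, initial crc is 0 */

    part = crc32c(part, msg + 4, 5);         /* second chunk continues from it */

    assert(whole == part);
    assert(whole == 0xe3069283);             /* standard CRC-32C check value */
}
```

Feeding the previous return value back in works because the routine inverts the CRC on entry and again on exit, so the two inversions cancel between chunks.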

Mark Adler
  • 2
    `crc = (crc >> 1) ^ (POLY & (0 - (crc & 1)));` is ~33% faster when compiled for x86 with GCC/ICC/CLANG (104 MB/s vs. 134 MB/s). On the same system the slice-8 is 2040 MB/s and Intel CRC32C is 8 GB/s so maybe it doesn't make much difference at the end. – t0rakka Feb 13 '18 at 14:37
  • 1
    @SnappleLVR Faster still would be a byte-wise or word-wise table implementation. See [crcany](https://github.com/madler/crcany), which will generate such routines. – Mark Adler Feb 13 '18 at 18:03
  • 1
    Benchmarked on the same machine as the results above: 1430 MB/s, not bad, but slice-8 is still 30% faster and the CRC32C instructions are 450% faster. I was just pointing out that the simple bitwise version can be optimized to be 33% faster with a small surgical modification to the source. The original code compiled into "test/cmov" with CLANG and GCC using -O3. The faster variant uses straightforward NEG/XOR/AND, as can be expected from the source. The (0 - n) is just a trick to allow negation on unsigned integers. So to be clear, I didn't claim it was the fastest possible routine. xD – t0rakka Feb 14 '18 at 09:22

In general, the bigger the lookup table, the better the performance, but you can use any smaller table for 16-, 8-, or 4-bit lookups.

For CRC32, with 4-byte table entries, the table sizes are:

16-bit lookup: 4*2^16 = 256 KB  
 8-bit lookup: 4*2^8 = 1 KB  
 4-bit lookup: 4*2^4 = 64 bytes  

The 4-bit table needs four lookups for every one lookup with the 16-bit table, so it is roughly four times slower.
Which one you should use depends on your speed requirements.
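To make the 4-bit variant concrete, here is a minimal sketch of a nibble-wise CRC32 with a 64-byte table. It assumes the reflected Ethernet/ZIP polynomial `0xedb88320`; the names `crc32_nibble`, `crc32_nibble_table`, and `crc32_nibble_self_check` are made up for this example. Each input byte costs two table lookups, one per nibble.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-entry (64-byte) table for the reflected CRC-32 polynomial 0xedb88320.
   Entry i is the result of shifting the 4-bit value i through the CRC. */
static const uint32_t crc32_nibble_table[16] = {
    0x00000000, 0x1db71064, 0x3b6e20c8, 0x26d930ac,
    0x76dc4190, 0x6b6b51f4, 0x4db26158, 0x5005713c,
    0xedb88320, 0xf00f9344, 0xd6d6a3e8, 0xcb61b38c,
    0x9b64c2b0, 0x86d3d2d4, 0xa00ae278, 0xbdbdf21c
};

/* Hypothetical name: CRC-32 computed four bits at a time.
   Start with crc = 0; pass the previous result to continue over chunks. */
uint32_t crc32_nibble(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        crc = (crc >> 4) ^ crc32_nibble_table[crc & 0x0f];  /* low nibble */
        crc = (crc >> 4) ^ crc32_nibble_table[crc & 0x0f];  /* high nibble */
    }
    return ~crc;
}

/* Hypothetical self-check against the standard CRC-32 check value. */
void crc32_nibble_self_check(void)
{
    const unsigned char msg[] = "123456789";

    assert(crc32_nibble(0, msg, 9) == 0xcbf43926);
}
```

Because the table is declared `static const`, many toolchains will be able to keep it out of RAM, although where it actually ends up depends on the platform and linker setup.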

As Luka Rahne mentions, it's a good idea to put the table in flash memory, but on many platforms the const keyword alone is not enough.
Most often you also need to place the table in a section that is located in flash, by modifying your linker command file.

jeb
  • See also [*Fast CRC32*](https://create.stephan-brumme.com/crc32/), a web page by Stephan Brumme that provides a comparison with varying table sizes. – Wolf Aug 26 '21 at 09:10