2

I have an example code for CRC16-CCITT algorithm written in C++ and I need help converting it to Python.

Example C++ code:

#include<iostream>

using namespace std;

unsigned short calculateCRC(unsigned char data[], unsigned int length)
{
        unsigned int i;
        unsigned short crc = 0;

        for(i=0; i<length; i++){
                crc = (unsigned char)(crc >>8) | (crc<<8);
                crc ^= data[i];
                crc ^= (unsigned char)(crc & 0xff) >> 4;
                crc ^= crc << 12;
                crc ^= (crc & 0x00ff) << 5;
        }

        return crc;
}

int main()
{
        unsigned int length;
        length = 15;

        unsigned char data[length] = {0x01,0x08,0x00,0x93,0x50,0x2e,0x42,0x83,0x3e,0xf1,0x3f,0x48,0xb5,0x04,0xbb};
        unsigned int crc;
        crc =  calculateCRC(data, length);
        cout<< std::hex << crc << '\n';
}

This code gives 9288 as output which is correct.

I tried the following in Python:

#!/usr/bin/env python3

def calculateCRC(data):
    crc = 0

    for dat in data:
        crc = (crc >> 8) or (crc << 8)
        crc ^= dat
        crc ^= (crc and 0xff) >> 4
        crc ^= crc << 12
        crc ^= (crc and 0x00ff) << 5
    crc = hex(crc)
    return (crc)


data = [0x01,0x08,0x00,0x93,0x50,0x2e,0x42,0x83,0x3e,0xf1,0x3f,0x48,0xb5,0x04,0xbb]
print(calculateCRC(data))

This outputs 0xf988334b0799be2081.

Could you please help me understand what I am doing wrong? Thank you.

beluga
  • 53
  • 6

1 Answers1

5

Python's int type is unbounded, but C / C++ unsigned short values are represented in 2 bytes so overflow when you shifted to the left. You need to add masking in Python to achieve the same effect, where you remove any bits higher than the 16th most-significant bit. This is only needed where values are shifted to the left as right-shifting already drops the right-most rotated bits.

Next, you are confusing the | and & bitwise operators with or and and boolean logical operators. The C++ code uses bitwise operators, use the same operators in Python.

Last but not least, leave conversion to hex to the caller, don't do this in the CRC function itself:

UNSIGNED_SHORT_MASK = 0xFFFF  # 2 bytes, 16 bits.

def calculateCRC(data):
    crc = 0
    for dat in data:
        crc = (crc >> 8) | (crc << 8 & UNSIGNED_SHORT_MASK)
        crc ^= dat
        crc ^= (crc & 0xff) >> 4
        crc ^= crc << 12 & UNSIGNED_SHORT_MASK
        crc ^= (crc & 0x00ff) << 5
    return crc

Now you get the same output:

>>> print(format(calculateCRC(data), '04x'))
9288

I used the format() function rather than hex() to create hex output without a 0x prefix.

As Mark Adler rightly points out, we don't need to mask for every left-shift operation; just because the C / C++ operations naturally would result in a masked value, doesn't mean we need to do this as often here. Masking once per iteration is enough:

def calculateCRC(data):
    crc = 0
    for dat in data:
        crc = (crc >> 8) | (crc << 8)
        crc ^= dat
        crc ^= (crc & 0xFF) >> 4
        crc ^= crc << 12
        crc ^= (crc & 0x00FF) << 5
        crc &= 0xFFFF
    return crc

There may be more short-cuts we could apply to shave off operations and so speed up operations, but if speed really is an issue, I'd re-implement this in Cython or C or another natively-compiled option, anyway.

Also note you can use a bytes object, you don't have to use a list of integers:

data = b'\x01\x08\x00\x93\x50\x2e\x42\x83\x3e\xf1\x3f\x48\xb5\x04\xbb'

Looping over a bytes object still gives you integers between 0 and 255, just like the char array in C++.

Finally, you don't actually have to translate the code yourself, you could just use an existing project like crccheck, which implements this specific CRC16 variant as well as many others:

>>> from crccheck.crc import CrcXmodem
>>> print(format(CrcXmodem.calc(data), '04x'))
9288

crccheck is written in pure Python. For native implementations, there is crcmod. This library's documentation is a little lacking, but it is also very flexible and powerful, and actually includes predefined functions:

>>> from crcmod.predefined import mkPredefinedCrcFun
>>> xmodem = mkPredefinedCrcFun('xmodem')
>>> print(format(xmodem(data), '04x'))
9288
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • You don't need to do the mask so often. You could do the mask just once in the loop and after exiting the loop. The bits above the low 16 don't actually affect the calculation, so in principle you could do it just once after the end of the loop. However, you should get rid of the high bits so you're not wasting time with the giant arbitrarily-long integer that Python would make. – Mark Adler Mar 26 '19 at 00:44
  • @MarkAdler. The code indeed is masking redundantly right now. But if we never mask in the loop, then the first right-shift operation would bring in higher bits (positions 16-23 shifted to 8-15), so we’d need to mask once after each iteration or mask crc in that first right-shift. I’d have to do some perf tests to see how much of a difference the two options would make with Python’s long int in that respect. – Martijn Pieters Mar 26 '19 at 08:44
  • I was referring to the original C code, where `crc = (unsigned char)(crc >>8) | (crc<<8);` is not affected by the bits above the low 16. – Mark Adler Mar 26 '19 at 11:58
  • @MarkAdler: ah, yes, I didn't pay attention to the cast to unsigned char there. – Martijn Pieters Mar 26 '19 at 12:22