Below is a constexpr computation of the CRC32 of a string literal.

I had to reinterpret the string literal's characters from char to unsigned char. Because reinterpret_cast is not available in a constexpr function, my workaround is a small utility function that applies the two's complement conversion manually, but I am a little disappointed with it.

Is there a more elegant solution for this kind of manipulation?

#include <cstdint>
#include <iostream>

class Crc32Gen {
    uint32_t m_[256] {};

    static constexpr unsigned char reinterpret_cast_schar_to_uchar( char v ) {
        return v>=0 ? v : ~(v-1);
    }
public:
    // algorithm from http://create.stephan-brumme.com/crc32/#sarwate
    constexpr Crc32Gen() {
        constexpr uint32_t polynomial = 0xEDB88320;
        for (unsigned int i = 0; i <= 0xFF; i++) { 
            uint32_t crc = i; 
            for (unsigned int j = 0; j < 8; j++) 
                crc = (crc >> 1) ^ (-int(crc & 1) & polynomial);
            m_[i] = crc;
        }
    }

    constexpr uint32_t operator()( const char* data ) const { 
        uint32_t crc = ~0; 
        while (auto c = reinterpret_cast_schar_to_uchar(*data++))
            crc = (crc >> 8) ^ m_[(crc & 0xFF) ^ c];
        return ~crc; 
    } 
};
constexpr Crc32Gen const crc32Gen_;

int main() {
    constexpr auto const val = crc32Gen_( "The character code for É is greater than 127" );
    std::cout << std::hex << val << std::endl;
}

Edit: in this case, static_cast<unsigned char>(*data++) is enough.

galop1n

1 Answer

Two's complement is not guaranteed by the standard; in clause 3.9.1:

7 - [...] The representations of integral types shall define values by use of a pure binary numeration system. [Example: this International Standard permits 2's complement, 1's complement and signed magnitude representations for integral types. — end example ]

So any code that assumes two's complement is going to have to perform the appropriate manipulations manually.

That said, your conversion function is unnecessary (and possibly incorrect); for signed-to-unsigned conversions you can just use the standard integral conversion (4.7):

2 - If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
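
As a minimal sketch of what this rule guarantees, the compile-time checks below compare the standard conversion with the expression used by the helper in the question. The names `to_uchar` and `manual` are hypothetical, `signed char` is used so that the result does not depend on whether plain `char` is signed, and the last assertion assumes a two's complement implementation.

```cpp
#include <climits>

// Standard integral conversion: the result is the source value modulo 2^n.
constexpr unsigned char to_uchar(signed char v) {
    return static_cast<unsigned char>(v);
}

// Same expression as the question's helper, restated for signed char.
constexpr unsigned char manual(signed char v) {
    return v >= 0 ? v : ~(v - 1);
}

static_assert(to_uchar(65) == 65, "non-negative values are unchanged");
static_assert(to_uchar(-1) == UCHAR_MAX, "-1 maps to 2^n - 1 on every implementation");
// Assumes two's complement: ~(v - 1) equals -v, so the helper yields 1, not 255.
static_assert(manual(-1) == 1, "the manual helper negates instead of reinterpreting");
```

The last check suggests why the manual helper is likely not what was intended: for negative inputs it produces the magnitude rather than the value modulo 2^n.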

Corrected code, using static_cast:

constexpr uint32_t operator()( const char* data ) const { 
    uint32_t crc = ~0; 
    while (auto c = static_cast<unsigned char>(*data++))
        crc = (crc >> 8) ^ m_[(crc & 0xFF) ^ c];
    return ~crc; 
} 
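
One way to gain confidence in the corrected operator is to check it at compile time against the commonly published CRC-32 check value for the ASCII string "123456789", which is 0xCBF43926. The sketch below restates the class from the question with the `static_cast` version of the loop. It assumes C++14 or later (loops in constexpr functions) and 8-bit bytes; the names `Crc32`, `table_` and `crc32` are chosen here for illustration only.

```cpp
#include <cstdint>

class Crc32 {
    std::uint32_t table_[256] {};
public:
    // Build the 256-entry lookup table for the reflected polynomial.
    constexpr Crc32() {
        constexpr std::uint32_t polynomial = 0xEDB88320u;
        for (std::uint32_t i = 0; i < 256; ++i) {
            std::uint32_t crc = i;
            for (int bit = 0; bit < 8; ++bit)
                crc = (crc >> 1) ^ ((crc & 1u) ? polynomial : 0u);
            table_[i] = crc;
        }
    }

    // Process a null-terminated string one byte at a time.
    constexpr std::uint32_t operator()(const char* data) const {
        std::uint32_t crc = 0xFFFFFFFFu;
        while (auto c = static_cast<unsigned char>(*data++))
            crc = (crc >> 8) ^ table_[(crc & 0xFFu) ^ c];
        return ~crc;
    }
};

constexpr Crc32 crc32{};

// Both checks are evaluated by the compiler, not at run time.
static_assert(crc32("") == 0u, "empty input");
static_assert(crc32("123456789") == 0xCBF43926u, "standard CRC-32 check value");
```

If either assertion fails, the translation unit does not compile, so a mistake in the table or in the byte conversion is caught before the program runs.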
ecatmur
  • That is one more reason to look for an alternative solution, even if I am unlikely to deal with a CPU that does not use two's complement representation. – galop1n Jan 31 '14 at 15:31
  • Thinking about it, I could just add 128 to the characters and safely index into `m_`. – galop1n Jan 31 '14 at 15:36
  • @galop1n you can just cast from `char` to `unsigned char` - see above. – ecatmur Jan 31 '14 at 15:37
  • @galop1n Adding 128 assumes that `char` is a signed 8-bit two's-complement type, all properties that the standard does not guarantee. – Casey Jan 31 '14 at 15:39
  • @Casey According to 5.3.3, "sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1". But yes, I was still thinking with two's complement in mind. – galop1n Jan 31 '14 at 15:46
  • @galop1n but `CHAR_BIT` can be greater than 8; it's only required to be *at least* 8 (C, 5.2.4.2.1, referenced from 3.9.1p3). – ecatmur Jan 31 '14 at 15:53
  • @galop1n: Also, there is no telling whether `char` is signed or not. On gcc there actually is a flag to control this property. If you were to add 128 and it was unsigned to start with, you would just overflow for sufficiently large values (and wrap). – Matthieu M. Jan 31 '14 at 18:45
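
The assumptions raised in these comments can be made explicit in code. The following is a sketch only: `table_index` is a hypothetical helper, not part of the question's class. It converts through `unsigned char`, which behaves the same whether plain `char` is signed or unsigned, and it masks the result so that the index stays below 256 even on a platform where `CHAR_BIT` is greater than 8 (any bits above the low 8 are simply discarded).

```cpp
#include <climits>
#include <cstdint>

// Guaranteed by the standard; stated here only as documentation.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: compute an index into a 256-entry CRC table.
// No "+ 128" adjustment is needed, and no signedness is assumed.
constexpr std::uint32_t table_index(std::uint32_t crc, char c) {
    return (crc ^ static_cast<unsigned char>(c)) & 0xFFu;
}

static_assert(table_index(0u, '\x41') < 256u, "index stays in range");
static_assert(table_index(0xFFFFFFFFu, static_cast<char>(-1)) < 256u,
              "index stays in range for a char holding -1");
```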