
I have come across this CRC32 code and was curious why the author would choose to use

crc = crc ^ ~0U;

instead of

crc = ~crc;

As far as I can tell, they are equivalent.

I have even disassembled the two versions in Visual Studio 2010.

Not optimized build:

    crc = crc ^ ~0U;
009D13F4  mov         eax,dword ptr [crc]  
009D13F7  xor         eax,0FFFFFFFFh  
009D13FA  mov         dword ptr [crc],eax 

    crc = ~crc;
011C13F4  mov         eax,dword ptr [crc]  
011C13F7  not         eax  
011C13F9  mov         dword ptr [crc],eax  

I also cannot justify the code by thinking about the number of cycles each instruction takes, since both should take one cycle to complete. In fact, the xor might have a penalty from having to load the literal from somewhere, though I am not certain of this.

So I'm left thinking that it is possibly just a preferred way to describe the algorithm, rather than an optimization... Would that be correct?

Edit 1:

Since I just realized that the type of the crc variable is probably important to mention, I am including the whole code here (less the lookup table, which is way too big) so you don't have to follow the link.

uint32_t crc32(uint32_t crc, const void *buf, size_t size)
{
    const uint8_t *p;

    p = buf;
    crc = crc ^ ~0U;

    while (size--)
    {
        crc = crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    }

    return crc ^ ~0U;
}

Edit 2:

Since someone has brought up the fact that an optimized build would be of interest, I have made one and included it below.

Optimized build:

Do note that the whole function (shown in Edit 1 above) was inlined.

// crc = crc ^ ~0U;
    zeroCrc = 0;
    zeroCrc = crc32(zeroCrc, zeroBufferSmall, sizeof(zeroBufferSmall));
00971148  mov         ecx,14h  
0097114D  lea         edx,[ebp-40h]  
00971150  or          eax,0FFFFFFFFh  
00971153  movzx       esi,byte ptr [edx]  
00971156  xor         esi,eax  
00971158  and         esi,0FFh  
0097115E  shr         eax,8  
00971161  xor         eax,dword ptr ___defaultmatherr+4 (973018h)[esi*4]  
00971168  add         edx,ebx  
0097116A  sub         ecx,ebx  
0097116C  jne         main+153h (971153h)  
0097116E  not         eax  
00971170  mov         ebx,eax  

// crc = ~crc;
    zeroCrc = 0;
    zeroCrc = crc32(zeroCrc, zeroBufferSmall, sizeof(zeroBufferSmall));
01251148  mov         ecx,14h  
0125114D  lea         edx,[ebp-40h]  
01251150  or          eax,0FFFFFFFFh  
01251153  movzx       esi,byte ptr [edx]  
01251156  xor         esi,eax  
01251158  and         esi,0FFh  
0125115E  shr         eax,8  
01251161  xor         eax,dword ptr ___defaultmatherr+4 (1253018h)[esi*4]  
01251168  add         edx,ebx  
0125116A  sub         ecx,ebx  
0125116C  jne         main+153h (1251153h)  
0125116E  not         eax  
01251170  mov         ebx,eax  
nonsensickle
  • Would you mind explaining it, without @nonsensickle searching for compilers? – deviantfan Mar 11 '14 at 20:36
  • C is a portable language. Compiling it to one particular instruction set is not a useful way to argue about it. – Kerrek SB Mar 11 '14 at 20:40
  • Could it have something to do with the fact that some architectures don't have an exact bitwise not? (E.g. MIPS.) Maybe the author wanted to express it in terms of xor so they didn't have to rely on however the compiler decided to emulate not. xor is more universal, so they may have preferred it to keep the code performance-friendly when ported. – Tyler Mar 11 '14 at 20:41
  • Also, since I got a bit of time: they do produce different operations. While in this case I doubt it was the reason, in terms of generating shellcode one is at a huge advantage because its shellcode will be much smaller. You can play with gdb to see which one. – Tymoteusz Paul Mar 11 '14 at 20:43
  • Since your disassembled code is written for x86, it is worth pointing out that `XOR` will set/clear the Zero Flag whereas `NOT` will not (sometimes useful if you want to perform a bitwise operation without affecting jump conditions that rely on flags from previous operations). Now, considering you're not writing assembly directly, you really have no access to this flag in a meaningful way, so I doubt this is the reason for favoring one over the other. – Andon M. Coleman Mar 11 '14 at 20:47
  • Did you enable optimizations when compiling it? I do not think it should write eax back to [crc] in an optimized build. – BeniBela Mar 11 '14 at 20:57
  • @AndonM.Coleman That is interesting. I wonder what would happen in the case of writing `if (~crc)`, where it would be more optimal for the compiler to use the *xor* with `0xFF..FF` version that updates the zero flag. – nonsensickle Mar 11 '14 at 21:32
  • @BeniBela No this is not an optimized build. I will include that in the question. – nonsensickle Mar 11 '14 at 21:32
  • @AndonM.Coleman Your reply got me curious enough to try and document it in a question of its own. I agree that my question may not be well formulated but it is worth documenting the difference and compiler behavior. http://stackoverflow.com/questions/22367511/bitwise-operators-not-vs-xor-use-in-branching – nonsensickle Mar 13 '14 at 01:47

5 Answers


Something nobody's mentioned yet: if this code is compiled on a machine with a 16-bit `unsigned int`, then these two code snippets are different.

`crc` is specified as a 32-bit unsigned integral type. `~crc` will invert all bits, but if `unsigned int` is 16 bits then `crc = crc ^ ~0U` will only invert the lower 16 bits.

I don't know enough about the CRC algorithm to know whether this is intentional or a bug; perhaps hivert can clarify. Although, looking at the sample code posted by the OP, it certainly does make a difference to the loop that follows.
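
A quick sketch to make that concrete (illustration only, with values I chose; the 0x0000FFFF mask stands in for what `~0U` would be on a hypothetical platform with a 16-bit `unsigned int`):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t crc = 0x12345678u;

    /* On a platform with a 16-bit unsigned int, ~0U is 0xFFFF, so the XOR
     * only touches the low half of the 32-bit value. */
    uint32_t xor_16bit_int = crc ^ 0x0000FFFFu;
    /* ~crc always flips all 32 bits, whatever size unsigned int is. */
    uint32_t bitwise_not   = ~crc;

    printf("crc ^ ~0U (16-bit int): 0x%08" PRIX32 "\n", xor_16bit_int); /* 0x1234A987 */
    printf("~crc                  : 0x%08" PRIX32 "\n", bitwise_not);   /* 0xEDCBA987 */
    return 0;
}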

NB. Sorry for posting this as an "answer" because it isn't an answer, but it's too big to just fit in a comment :)

M.M
  • Not the correct answer, but a really good point nonetheless. +1 – Anthony Mar 12 '14 at 02:27
  • I do like your point, but `crc` is a `uint32_t`, which is not defined if `unsigned int` is not 32 bits in size and there isn't an alternative unsigned type to replace it. – nonsensickle Mar 12 '14 at 03:05
  • `unsigned long` could be `uint32_t` – M.M Mar 12 '14 at 03:12
  • You are completely correct, and as written the code has a bug, _if_ compiled on a machine with a 16-bit unsigned type, permitted by the C standard. To get the correct CRC, it _must_ be exclusive-ored with `0xffffffff`. – Mark Adler Mar 12 '14 at 04:08
  • This _is_ an answer, and in fact it is the correct answer. The `^ ~0U` is a portability bug. With the `uint32_t`, a `~` should be used and would be portable. If, say, an `unsigned long` had been used, which per the C standard must be at least 32 bits, then `~` would _also_ be a portability bug, in the event it were compiled on a platform with 64-bit longs. Then the only correct way to do it would be `^ 0xffffffff`. That is the most portable approach. – Mark Adler Mar 12 '14 at 04:12
  • What you said means that the code won't work on a 16-bit machine. Concerning the `~` version: it works on a 64-bit machine. I can tell you, I tested it: http://coliru.stacked-crooked.com/a/013fb7f0af0c5195 – hivert Mar 12 '14 at 07:32
  • @hivert, two things: 1. `crc = !crc;` should be `crc = ~crc;`; 2. `std::cout << crc32(25, s.c_str(), s.size()) << " " << crc32(25, s.c_str(), s.size()) << std::endl;` should probably call `crc321` in the second case... – nonsensickle Mar 12 '14 at 21:58
  • @MarkAdler Yes, you are correct. It seems that I misread the answer initially. Now that I understand that the `unsigned int` size refers to the `0U`, it is clear that it is not portable code. Funny that I found it on Apple's site... I'm still convinced that the answer currently marked as correct also plays a role, but I will have to change my choice of accepted answer. – nonsensickle Mar 12 '14 at 22:02
  • Where on Apple's site? – Mark Adler Mar 12 '14 at 22:48
  • There's a link in the question. http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/bsd/libkern/crc32.c – nonsensickle Mar 12 '14 at 22:57
  • I don't like having `0xffffffff` in my code; would `crc = crc ^ ((1 << (sizeof(crc) * 8)) - 1)` be a good way to guarantee that whatever size `crc` takes on we will have the correct number of 1s? (There could be an overflow problem without casting here, but something similar.) – nonsensickle Mar 13 '14 at 01:53
  • Well this is moot since you can and should use `~crc`; but `~0UL` also works since `unsigned long` is guaranteed to be at least 32 bits. Another equivalent version would be `(uint32_t)-1`, which is guaranteed to be all-bits-one. Your expression has the problem that you would shift the `1` off the end before you get up to doing the subtraction, which is undefined behaviour since `1` is a signed int. – M.M Mar 13 '14 at 02:25
  • As I said, if you are using `uint32_t`, then `~crc` is portable on all platforms (that have `uint32_t`). I don't know why you're allergic to `^ 0xffffffff`. That is the most explicit and clear way to express what needs to be done to the CRC. It also happens to be the most portable, in the event that someone takes the code and changes the type (e.g. because they don't have `uint32_t`). – Mark Adler Mar 13 '14 at 02:39
  • Thanks -- missed the link. – Mark Adler Mar 13 '14 at 02:40
  • `^ ~0UL` is also not portable if used with `unsigned long` where that is 64 bits. – Mark Adler Mar 13 '14 at 02:43
  • That's fine though as the result will be truncated to fit in the `uint32_t` it's being assigned to. (This is well-defined since we're working with unsigned types) – M.M Mar 13 '14 at 03:16

The short answer is: because it allows a uniform algorithm for all CRCs.

The reason is the following: there are many variants of CRC. Each one depends on a polynomial over Z/2Z which is used for a Euclidean division. It is usually implemented using the algorithm described in this paper by Aram Perez. Now, depending on the polynomial you are using, there is a final XOR at the end of the algorithm whose goal is to eliminate some corner cases. It happens that for CRC32 this is the same as a global NOT, but this is not true for all CRCs. As evidence, on this web page you can read (emphasis mine):

Consider a message that begins with some number of zero bits. The remainder will never contain anything other than zero until the first one in the message is shifted into it. That's a dangerous situation, since packets beginning with one or more zeros may be completely legitimate and a dropped or added zero would not be noticed by the CRC. (In some applications, even a packet of all zeros may be legitimate!) The simple way to eliminate this weakness is to start with a nonzero remainder. The parameter called initial remainder tells you what value to use for a particular CRC standard. And only one small change is required to the crcSlow() and crcFast() functions:

crc remainder = INITIAL_REMAINDER;

The final XOR value exists for a similar reason. To implement this capability, simply change the value that's returned by crcSlow() and crcFast() as follows:

return (remainder ^ FINAL_XOR_VALUE);

If the final XOR value consists of all ones (as it does in the CRC-32 standard), this extra step will have the same effect as complementing the final remainder. However, implementing it this way allows any possible value to be used in your specific application.
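
To connect this to the posted code, here is a minimal sketch (hypothetical; not from the quoted article or the original source) of the question's table-driven crc32() rewritten with the initial remainder and final XOR as explicit parameters. The standard CRC-32 is then just the special case where both are 0xFFFFFFFF.

#include <stddef.h>
#include <stdint.h>

extern const uint32_t crc32_tab[256];  /* the same 256-entry table as in the question */

/* Hypothetical generalisation of the posted crc32(): the initial remainder and
 * the final XOR value become parameters, as in the article quoted above. */
uint32_t crc_generic(const void *buf, size_t size,
                     uint32_t initial_remainder, uint32_t final_xor_value)
{
    const uint8_t *p = buf;
    uint32_t crc = initial_remainder;

    while (size--)
        crc = crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);

    /* For CRC-32 both parameters are 0xFFFFFFFF, so this XOR has the same
     * effect as complementing the remainder. */
    return crc ^ final_xor_value;
}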

hivert
  • Technically `return !digital_update_crc32(0xffffffff, buf, len);` should probably be `return ~digital_update_crc32(0xffffffff, buf, len);` but I know what you mean... – nonsensickle Mar 11 '14 at 21:18
  • @nonsensickle : Sorry I completely messed with your comment. – hivert Mar 11 '14 at 21:22
  • That explains why the algorithm is expressed that way, and hence probably why the code is written that way (as several of us suggested). But this function only implements one specific CRC, not the general form. – Alan Stokes Mar 11 '14 at 21:31
  • @hivert So to be clear, you are saying *"it is possibly just a preferred way to describe the algorithm, rather than an optimization"* is correct? – Radiodef Mar 11 '14 at 21:35
  • I agree with your conclusion. Marking as accepted since it does have the most backing evidence. Thanks, the links were very informative. – nonsensickle Mar 11 '14 at 21:51
  • This does not answer the question at all! (Yet it was accepted.) It answers an entirely different question, which is _why_ do CRC implementations usually pre and post process the CRC (typically with an inversion of the CRC bits). The question here however is about _how_ that inversion is written in this particular code. The correct answer is that the `^ ~0U` is a mistake if the code is to be portable. – Mark Adler Mar 12 '14 at 04:06
  • I think OP's question was more about why one form would be preferred to the other on the 32-bit system (not realizing that they were not equivalent), when the `^ ~0U` version is perhaps less intuitive than the bit complement version. If you fix it to be `~0UL` then my objection disappears and OP's question remains. – M.M Mar 12 '14 at 08:53
  • The OP's question is "So I'm left thinking that it is possibly just a preferred way to describe the algorithm, rather than an optimization... Would that be correct?". My answer clearly indicates that the answer to this is "Yes". – hivert Mar 12 '14 at 09:06
  • @hivert You too are correct; I did not spot the portability issue here and am now left not knowing who to attribute the correct answer to. Your answer answers my original question about why this form is preferred, but the portability issue is also of interest. It is a genuine mistake in the code. I will think on it for a few hours and come back with a choice of which answer to mark as correct. Sorry about the confusion. – nonsensickle Mar 12 '14 at 23:01

Just to add my own guess to the mix: x ^ 0x0001 flips the last bit and keeps the others; to turn off the last bit use x & 0xFFFE or x & ~0x0001; to turn on the last bit unconditionally use x | 0x0001. I.e., if you are doing lots of bit-twiddling, your fingers probably know those idioms and just roll them out without much thinking.
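
A tiny sketch of those idioms (the value of x is mine, purely for illustration):

#include <stdio.h>

int main(void)
{
    unsigned x = 0x1234u;

    unsigned toggled = x ^ 0x0001u;  /* flip the last bit   -> 0x1235 */
    unsigned cleared = x & ~0x0001u; /* clear the last bit  -> 0x1234 */
    unsigned set     = x | 0x0001u;  /* set the last bit    -> 0x1235 */

    printf("%04X %04X %04X\n", toggled, cleared, set);
    return 0;
}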

vonbrand
  • I think that this plays a part in it but I suspect that @hivert is correct in saying that it is just a specialization of a more generic algorithm. – nonsensickle Mar 11 '14 at 21:40

I doubt there's any deep reason. Maybe that's how the author thought about it ("I'll just xor with all ones"), or perhaps how it was expressed in the algorithm definition.

Alan Stokes
  • I wouldn't be so sure that there isn't a deep reason without testing, as compilers have come a very long way in the last 20 years. – Tymoteusz Paul Mar 11 '14 at 20:47
  • @Puciek I can imagine a compiler producing worse code for the xor than the not, but vice versa would be very odd. Also I'd want some evidence the original author was micro-optimizing. – Alan Stokes Mar 11 '14 at 20:52
  • Well, I didn't downvote because I do not have evidence either way, just like you. I am merely pointing out that it may be an optimization or even a workaround. – Tymoteusz Paul Mar 11 '14 at 20:55

I think it is for the same reason that some write

const int zero = 0;

and others write

const int zero = 0x00000000;

Different people think in different ways, even about a fundamental operation.

wallyk
  • Though I understand your point, I think that the example may be a little contrived. I haven't seen anyone write the latter, though there is still time for me to find an instance of its use. :) – nonsensickle Mar 11 '14 at 21:49
  • People do often write `'\0'` instead of the exactly equivalent `0`. – M.M Mar 12 '14 at 02:12
  • @nonsensickle: You will find lots of the latter example in code written by hardware engineers and their descendants. – wallyk Mar 12 '14 at 15:01