Why does the 0x55555556 divide by 3 hack work?

Question

There is a (relatively) well known hack for dividing a 32-bit number by three. Instead of using actual expensive division, the number can be multiplied by the magic number 0x55555556, and the upper 32 bits of the result are what we're looking for. For example, the following C code:

#include <stdint.h>

int32_t div3(int32_t x)
{
    return x / 3;
}

compiled with GCC and -O2, results in this:

08048460 <div3>:
 8048460:   8b 4c 24 04             mov    ecx,DWORD PTR [esp+0x4]
 8048464:   ba 56 55 55 55          mov    edx,0x55555556
 8048469:   89 c8                   mov    eax,ecx
 804846b:   c1 f9 1f                sar    ecx,0x1f
 804846e:   f7 ea                   imul   edx
 8048470:   89 d0                   mov    eax,edx
 8048472:   29 c8                   sub    eax,ecx
 8048474:   c3                      ret

I'm guessing the sub instruction is responsible for fixing negative numbers, because what it does is essentially add 1 if the argument is negative, and it's a NOP otherwise.

But why does this work? I've been trying to manually multiply smaller numbers by a 1-byte version of this mask, but I fail to see a pattern, and I can't really find any explanations anywhere. It seems to be a mystery magic number whose origin isn't clear to anyone, just like 0x5f3759df.

Can someone provide an explanation of the arithmetic behind this?

Possible duplicate of [Faster integer division when denominator is known?](http://stackoverflow.com/questions/2616072/faster-integer-division-when-denominator-is-known) — Peter O., Mar 16 '16 at 17:14
@PeterO. Please show me where in that question (or answers) the specific algorithm I outlined above is explained. — user4520, Mar 16 '16 at 18:01

Mark Ransom · Accepted Answer · 2016-03-16T15:11:51.247

15

It's because 0x55555556 is really 0x100000000 / 3, rounded up.

The rounding is important. Since 0x100000000 doesn't divide evenly by 3, there will be an error in the full 64-bit result. If that error were negative, the result after truncation of the lower 32 bits would be too low. By rounding up, the error is positive, and it's all in the lower 32 bits so the truncation wipes it out.
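As a rough sketch, the compiled sequence from the question can be written back in C. The name `div3_magic` is made up for illustration, and the code assumes that `>>` on a negative signed value is an arithmetic shift (implementation-defined in C, but what GCC does on x86).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the GCC output in the question.
   Assumes >> on negative signed values is an arithmetic shift. */
static int32_t div3_magic(int32_t x)
{
    /* imul edx: full 64-bit signed product of x and the magic number */
    int64_t product = (int64_t)x * 0x55555556LL;

    /* mov eax,edx: keep only the upper 32 bits of the product */
    int32_t high = (int32_t)(product >> 32);

    /* sar ecx,0x1f: -1 if x is negative, 0 otherwise */
    int32_t sign = x >> 31;

    /* sub eax,ecx: adds 1 for negative x, does nothing otherwise */
    return high - sign;
}
```

For x = 7 the product is 10021590362, whose upper 32 bits are 2. For x = -7 the upper 32 bits are -3 (the shift rounds toward negative infinity), and subtracting the sign mask of -1 brings that to -2, which is what `-7 / 3` gives in C.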

edited Mar 16 '16 at 15:11

answered Mar 16 '16 at 15:07

Mark Ransom


1

I don't get it. Can you explain further? – Dmytro Marchuk Mar 16 '16 at 15:10
5

@DmitryMarchuk multiplying by `0x100000000` is the same as shifting left by 32 bits. So you're effectively shifting left, then dividing, all in one operation. You then shift right (i.e. take the upper 32 bits) to get the final result. – Mark Ransom Mar 16 '16 at 15:13
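A scaled-down sketch of the shift, divide, shift idea in the comment above, using one byte instead of 32 bits. The name `div3_byte` and the constant 0x56 (0x100 / 3 = 85.33, rounded up to 86) are illustrative assumptions rather than anything from the thread, and the result is only expected to match `x / 3` for inputs below 128.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit analogue of the 0x55555556 trick.
   0x56 is 0x100 / 3 rounded up. Only intended for x < 128. */
static uint8_t div3_byte(uint8_t x)
{
    /* One multiply does "shift left by 8, then divide by 3".
       The largest product, 255 * 86 = 21930, fits in 16 bits. */
    uint16_t product = (uint16_t)(x * 0x56);

    /* Shift right by 8, i.e. keep the upper byte */
    return (uint8_t)(product >> 8);
}
```

Worked by hand: 7 * 0x56 = 602 = 0x025A, and the upper byte 0x02 is 7 / 3. At x = 128 the product is 11008 = 0x2B00, so the upper byte is 43 while 128 / 3 is 42; the byte-sized version stops matching at the same point (half the range) where the sign bit would start.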
1

Also see http://stackoverflow.com/a/2616214/404501 (Integer division by multiplication and shifting) – Markus Kull Mar 16 '16 at 15:22
Could you elaborate on the rounding up vs down problem? "If that error were negative, the result after truncation of the lower 32 bits would be too low. By rounding up, the error is positive, and it's all in the lower 32 bits so the truncation wipes it out." - and how do we know the upper 32 bits won't contain a value greater than the actual result if we round up? – user4520 Mar 16 '16 at 19:32
4

@szczurcio we know that the error in the multiplier is 2/3, because that's how much we added to round it up. The error in the multiplication result will be between `0*2/3` (i.e. 0) and `0xffffffff*2/3` (i.e. 0xaaaaaaaa). Since 0xaaaaaaaa is less than 0x100000000, we know it won't overflow into the upper bits. I should have mentioned that this only works for positive numbers; the GCC compiler writers have obviously refined what I have here. – Mark Ransom Mar 16 '16 at 20:37
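The bound in the last comment can be checked with a small sketch. The helpers `mul_high_magic` and `overshoot` are hypothetical names introduced here. `overshoot` measures how far the full 64-bit product sits above `(x / 3) << 32`, which includes both the 2x/3 rounding error and the part contributed by the remainder `x % 3`. Truncation discards it only while it stays below 0x100000000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: upper 32 bits of the unsigned product x * 0x55555556. */
static uint32_t mul_high_magic(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x55555556u) >> 32);
}

/* Hypothetical helper: amount by which the full product exceeds (x / 3) << 32.
   The upper 32 bits equal x / 3 exactly when this is below 2^32. */
static uint64_t overshoot(uint32_t x)
{
    uint64_t product = (uint64_t)x * 0x55555556u;
    uint64_t exact = (uint64_t)(x / 3) << 32;
    return product - exact;
}
```

For x = 0x7fffffff the overshoot is 0xaaaaaaaa, still inside the lower 32 bits, so the upper word is exactly x / 3. For x = 0x80000000 the overshoot reaches exactly 0x100000000 and the upper word comes out one too large. That matches the remark that the plain multiply-and-truncate only works for positive signed values, i.e. x below 2^31.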
