How are numbers greater than 2^32 handled by a 32 bit machine?

Question

I am trying to understand how calculations involving numbers greater than 2³² happen on a 32 bit machine.

C code

$ cat size.c
#include<stdio.h>
#include<math.h>

int main() {

    printf ("max unsigned long long = %llu\n",
    (unsigned long long)(pow(2, 64) - 1));
}
$

gcc output

$ gcc size.c -o size
$ ./size
max unsigned long long = 18446744073709551615
$

Corresponding assembly code

$ gcc -S size.c -O3
$ cat size.s
    .file   "size.c"
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 4
.LC0:
    .string "max unsigned long long = %llu\n"
    .text
    .p2align 4,,15
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    $-1, 8(%esp)   #1
    movl    $-1, 12(%esp)  #2
    movl    $.LC0, 4(%esp) #3
    movl    $1, (%esp)     #4
    call    __printf_chk
    leave
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
    .section    .note.GNU-stack,"",@progbits
$

What exactly happens on the lines 1 - 4?

Is this some kind of string concatenation at the assembly level?

Using `pow` is a very very nasty and error-prone way to compute integer powers of 2. Don't use it. Note that if your floating point subtraction didn't get done as an 80-bit `long double`, `pow(2,64)-1` would be equal to `pow(2,64)`, and then casting it into `unsigned long long` would not work right. — R.. GitHub STOP HELPING ICE, Oct 09 '10 at 22:25

Matthew Slattery · Accepted Answer · 2010-10-09T23:20:13.007

__printf_chk is a wrapper around printf which checks for stack overflow, and takes an additional first parameter, a flag (e.g. see here.)

pow(2, 64) - 1 has been optimised to 0xffffffffffffffff as the arguments are constants.

As per the usual calling conventions, the first argument to __printf_chk() (int flag) is a 32-bit value on the stack (at %esp at the time of the call instruction). The next argument, const char * format, is a 32-bit pointer (the next 32-bit word on the stack, i.e. at %esp+4). And the 64-bit quantity that is being printed occupies the next two 32-bit words (at %esp+8 and %esp+12):

pushl   %ebp                 ; prologue
movl    %esp, %ebp           ; prologue
andl    $-16, %esp           ; align stack pointer
subl    $16, %esp            ; reserve bytes for stack frame
movl    $-1, 8(%esp)   #1    ; store low half of 64-bit argument (a constant) to stack
movl    $-1, 12(%esp)  #2    ; store high half of 64-bit argument (a constant) to stack
movl    $.LC0, 4(%esp) #3    ; store address of format string to stack
movl    $1, (%esp)     #4    ; store "flag" argument to __printf_chk to stack
call    __printf_chk         ; call routine
leave                        ; epilogue
ret                          ; epilogue

The compiler has effectively rewritten this:

printf("max unsigned long long = %llu\n", (unsigned long long)(pow(2, 64) - 1));

...into this:

__printf_chk(1, "max unsigned long long = %llu\n", 0xffffffffffffffffULL);

...and, at runtime, the stack layout for the call looks like this (showing the stack as 32-bit words, with addresses increasing from the bottom of the diagram upwards):

        :                 :
        :     Stack       :
        :                 :
        +-----------------+
%esp+12 |      0xffffffff | \ 
        +-----------------+  } <-------------------------------------.
%esp+8  |      0xffffffff | /                                        |
        +-----------------+                                          |
%esp+4  |address of string| <---------------.                        |
        +-----------------+                 |                        |
%esp    |               1 | <--.            |                        |
        +-----------------+    |            |                        |
                  __printf_chk(1, "max unsigned long long = %llu\n", |
                                                    0xffffffffffffffffULL);

score 6 · Answer 2 · answered Oct 10 '10 at 14:57

6

similar to the way as we handle numbers greater than 9, with only digits 0 - 9. (using positional digits). presuming the question is a conceptual one.

answered Oct 10 '10 at 14:57

alvin

1,176
8
15

great insight, never thought about it that way! – Lazer Oct 10 '10 at 15:07

score 3 · Answer 3 · answered Oct 09 '10 at 21:36

In your case, the compiler knows that 2^64-1 is just 0xffffffffffffffff, so it has pushed -1 (low dword) and -1 (high dword) onto the stack as your argument to printf. It's just an optimization.

In general, 64-bit numbers (and even greater values) can be stored with multiple words, e.g. an unsigned long long uses two dwords. To add two 64-bit numbers, two additions are performed - one on the low 32 bits, and one on the high 32 bits, plus the carry:

; Add 64-bit number from esi onto edi:
mov     eax, [esi] ; get low 32 bits of source
add     [edi], eax ; add to low 32 bits of destination
; That add may have overflowed, and if it did, carry flag = 1.
mov     eax, [esi+4] ; get high 32 bits of source
adc     [edi+4], eax ; add to high 32 bits of destination, then add carry.

You can repeat this sequence of add and adcs as much as you like to add arbitrarily big numbers. The same thing can be done with subtraction - just use sub and sbb (subtract with borrow).

Multiplication and division are much more complicated, and the compiler usually produces some small helper functions to deal with these whenever you multiply 64-bit numbers together. Packages like GMP which support very, very large integers use SSE/SSE2 to speed things up. Take a look at this Wikipedia article for more information on multiplication algorithms.

plugwash · Answer 4 · 2016-07-28T11:45:30.150

As others have pointed out all 64-bit aritmetic in your example has been optimised away. This answer focuses on the question int the title.

Basically we treat each 32-bit number as a digit and work in base 4294967296. In this manner we can work on arbiterally big numbers.

Addition and subtraction are easiest. We work through the digits one at a time starting from the least significant and moving to the most significant. Generally the first digit is done with a normal add/subtract instruction and later digits are done using a specific "add with carry" or "subtract with borrow" instruction. The carry flag in the status register is used to take the carry/borrow bit from one digit to the next. Thanks to twos complement signed and unsigned addition and subtraction are the same.

Multiplication is a little trickier, multiplying two 32-bit digits can produce a 64-bit result. Most 32-bit processors will have instructions that multiply two 32-bit numbers and produces a 64-bit result in two registers. Addition will then be needed to combine the results into a final answer. Thanks to twos complement signed and unsigned multiplication are the same provided the desired result size is the same as the argument size. If the result is larger than the arguments then special care is needed.

For comparision we start from the most significant digit. If it's equal we move down to the next digit until the results are equal.

Division is too complex for me to describe in this post, but there are plenty of examples out there of algorithms. e.g. http://www.hackersdelight.org/hdcodetxt/divDouble.c.txt

Some real-world examples from gcc https://godbolt.org/g/NclqXC , the assembler is in intel syntax.

First an addition. adding two 64-bit numbers and producing a 64-bit result. The asm is the same for both signed and unsigned versions.

int64_t add64(int64_t a, int64_t b) { return a +  b; }
add64:
    mov     eax, DWORD PTR [esp+12]
    mov     edx, DWORD PTR [esp+16]
    add     eax, DWORD PTR [esp+4]
    adc     edx, DWORD PTR [esp+8]
    ret

This is pretty simple, load one argument into eax and edx, then add the other using an add followed by an add with carry. The result is left in eax and edx for return to the caller.

Now a multiplication of two 64-bit numbers to produce a 64-bit result. Again the code doesn't change from signed to unsigned. I've added some comments to make it easier to follow.

Before we look at the code lets consider the math. a and b are 64-bit numbers I will use lo() to represent the lower 32-bits of a 64-bit number and hi() to represent the upper 32 bits of a 64-bit number.

(a * b) = (lo(a) * lo(b)) + (hi(a) * lo(b) * 2^32) + (hi(b) * lo(a) * 2^32) + (hi(b) * hi(a) * 2^64)

(a * b) mod 2^64 = (lo(a) * lo(b)) + (lo(hi(a) * lo(b)) * 2^32) + (lo(hi(b) * lo(a)) * 2^32)

lo((a * b) mod 2^64) = lo(lo(a) * lo(b))

hi((a * b) mod 2^64) = hi(lo(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))

uint64_t mul64(uint64_t a, uint64_t b) { return a*b; }
mul64:
    push    ebx                      ;save ebx
    mov     eax, DWORD PTR [esp+8]   ;load lo(a) into eax
    mov     ebx, DWORD PTR [esp+16]  ;load lo(b) into ebx
    mov     ecx, DWORD PTR [esp+12]  ;load hi(a) into ecx  
    mov     edx, DWORD PTR [esp+20]  ;load hi(b) into edx
    imul    ecx, ebx                 ;ecx = lo(hi(a) * lo(b))
    imul    edx, eax                 ;edx = lo(hi(b) * lo(a))
    add     ecx, edx                 ;ecx = lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
    mul     ebx                      ;eax = lo(low(a) * lo(b))
                                     ;edx = hi(low(a) * lo(b))
    pop     ebx                      ;restore ebx.
    add     edx, ecx                 ;edx = hi(low(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
    ret

Finally when we try a division we see.

int64_t div64(int64_t a, int64_t b) { return a/b; }
div64:
    sub     esp, 12
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    call    __divdi3
    add     esp, 28
    ret

The compiler has decided that division is too complex to implement inline and instead calls a library routine.

A real example might help: [add and mul of `int64_t` compiled with -m32, on the Godbolt compiler explorer](http://gcc.godbolt.org/#compilers:!((compiler:g530,options:'-xc+-O3+-std%3Dgnu11+-Wall+-pedantic+-Wextra+-mtune%3Dhaswell+-m32',source:'%23include+%3Cstdint.h%3E%0A%0Aint64_t+add64(int64_t+a,+int64_t+b)+%7B+return+a%2Bb%3B+%7D%0A%0Aint64_t+mul64(int64_t+a,+int64_t+b)+%7B+return+a*b%3B+%7D%0A')),filterAsm:(commentOnly:!t,directives:!t,intel:!t,labels:!t),version:3). Feel free to incorporate that code, asm output, and the godbolt link into your answer. — Peter Cordes, Jul 28 '16 at 04:16
Also, "digit" is maybe a confusing term, since most people will keep thinking "decimal digit". [GMP calls them "limbs"](https://gmplib.org/). — Peter Cordes, Jul 28 '16 at 04:17
Always prefer non-shortened links when there's room, because they can't rot. http://meta.stackoverflow.com/a/319594/224132. Nice addition describing how the extended precision asm works. I usually write something like `this compiles ([on the Godbolt compiler explorer][1])...`. Use ctrl-L to create hyperlinks in SO's markdown editor. You don't need to have the URLs visible as part of the text of the answer (and often shouldn't, to reduce clutter). — Peter Cordes, Jul 28 '16 at 14:30
Unfortunately the non-shortened links generated by godbolt seem to be broken. When I followed your link a plus sign was missing from the code. Adding the plus sign and regenerating the long link resulted in the same problem. — plugwash, Jul 28 '16 at 14:57
That doesn't happen for me. My link works fine. What browser are you using? You should maybe report an issue on [the issue tracker](https://github.com/mattgodbolt/gcc-explorer/issues). — Peter Cordes, Jul 28 '16 at 15:07

score 1 · Answer 5 · answered Oct 09 '10 at 21:20

1

The compiler actually made a static optimization of your code. lines #1 #2 #3 are parameters for printf()

answered Oct 09 '10 at 21:20

Nicolas Viennot

3,921
1
21
22

Yes, I understand that. I want to know how exactly `printf` is prepared at these lines 1 - 4. – Lazer Oct 09 '10 at 21:33
`subl $16, %esp` makes some room for 4 arguments of 32bits. Line #1 and #2 moves the 64bits argument (so 2 32bits). Line #3 set the format string, and then "1" is pushed as an argument (which must be from the internal printf() call). – Nicolas Viennot Oct 09 '10 at 21:36

asveikau · Answer 6 · 2010-10-09T21:40:18.400

As @Pafy mentions, the compiler has evaluated this as a constant.

2 to the 64th minus 1 is 0xffffffffffffffff.

As 2 32-bit integers this is: 0xffffffff and 0xffffffff,
which if you take that as a pair of 32-bit signed types, ends up as: -1, and -1.

Thus for your compiler the code generated happens to be equivalent to:

printf("max unsigned long long = %llu\n", -1, -1);

In the assembly it's written like this:

movl    $-1, 8(%esp)   #Second -1 parameter
movl    $-1, 12(%esp)  #First -1 parameter
movl    $.LC0, 4(%esp) #Format string
movl    $1, (%esp)     #A one.  Kind of odd, perhaps __printf_chk
                       #in your C library expects this.
call    __printf_chk

By the way, a better way to calculate powers of 2 is to shift 1 left. Eg. (1ULL << 64) - 1.

The shift is useless. `-1ULL` will give the same result. – R.. GitHub STOP HELPING ICE Oct 09 '10 at 22:24 — R.. GitHub STOP HELPING ICE, Oct 09 '10 at 22:24

bodacydo · Answer 7 · 2014-10-19T05:32:47.980

0

No one in this thread noticed that the OP asked to explain the first 4 lines, not lines 11-14.

The first 4 lines are:

    .file   "size.c"
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 4
.LC0:

Here's what happens in first 4 lines:

.file   "size.c"

This is an assembler directive that says that we are about to start a new logical file called "size.c".

.section    .rodata.str1.4,"aMS",@progbits,1

This is also a directive for read only strings in the program.

.align 4

This directive sets the location counter to always be a multiple of 4.

.LC0:

This is a label LC0 that can be jumped to, for example.

I hope I provided the right answer to the question as I answered exactly what OP asked.

edited Oct 19 '14 at 05:32

answered Oct 19 '14 at 03:44

bodacydo

75,521
93
229
319

1

Looking back at the OPs question, even the first version had this in the **title** of the question's `How are numbers greater than 2^32 handled by a 32 bit machine?` The entirety of the question really has 2 questions, but I have a feeling the real question the OP was asking was the one in the title. Possibly the OP felt that lines 1-4 did some voodoo magic to deal with numbers greater than 2^32. I will agree the question could use some work. – Michael Petch Jul 28 '16 at 14:08

How are numbers greater than 2^32 handled by a 32 bit machine?

7 Answers7

Linked