
I am trying to learn assembly, and I am following a guide from a book, and in this book the author disassembles a very simple C program and goes line by line through the assembly. I am doing the same, but I am getting slightly different results. I am using a different debugger than the author (he is using gdb, I am using lldb), so I am sure that is causing a difference, but I am wondering if somebody could explain what the difference is in this example.

The C program we are analyzing is this:

#include <stdio.h>

int main(void)
{
    int i;
    for(i = 0; i < 10; i++)
    {
        puts("Hello, world!");
    }
    return 0;
}

Here is my disassembled code (I am using lldb):

a.out`main:
    0x100000f40 <+0>:  push   rbp
    0x100000f41 <+1>:  mov    rbp, rsp
    0x100000f44 <+4>:  sub    rsp, 0x10
    0x100000f48 <+8>:  mov    dword ptr [rbp - 0x4], 0x0
->  0x100000f4f <+15>: mov    dword ptr [rbp - 0x8], 0x0
    0x100000f56 <+22>: cmp    dword ptr [rbp - 0x8], 0xa
    0x100000f5a <+26>: jge    0x100000f7d               ; <+61> at firstprog.c
    0x100000f60 <+32>: lea    rdi, [rip + 0x3f]         ; "Hello, world!"
    0x100000f67 <+39>: call   0x100000f86               ; symbol stub for: puts
    0x100000f6c <+44>: mov    dword ptr [rbp - 0xc], eax
    0x100000f6f <+47>: mov    eax, dword ptr [rbp - 0x8]
    0x100000f72 <+50>: add    eax, 0x1
    0x100000f75 <+53>: mov    dword ptr [rbp - 0x8], eax
    0x100000f78 <+56>: jmp    0x100000f56               ; <+22> at firstprog.c:6:15
    0x100000f7d <+61>: xor    eax, eax
    0x100000f7f <+63>: add    rsp, 0x10
    0x100000f83 <+67>: pop    rbp
    0x100000f84 <+68>: ret    

Why is the offset to rbp on the line where the arrow is 0x8? In the example in the book it is only 0x4, and I know I am about to store an int, so it makes sense it would be 0x4. I see it also makes an offset for 0x4, but why go on to the 0x8? I am a bit of a beginner in assembly, so apologies if this is an obvious question.

phuclv
nickhealy
  • If you could post the rest of the disassembly of the function, it might become clear what the code eventually does with the dword at `[rbp-0x4]`. But it's probably just that you are using a different compiler (version, options, etc.) than the book, and it happens to do things differently. That's normal. – Nate Eldredge Jun 15 '21 at 05:26
  • @NateEldredge thank you -- I updated the post. That makes sense that it would be different, I'm just wondering if anybody might know why it is offsetting the register by 8? 4 makes a lot of sense for int's obviously, but why would it offset by 8? – nickhealy Jun 15 '21 at 05:30
  • While it's unclear from this code example why the compiler is doing so, it seems that it has reserved `[rbp-0x4]` for something and is placing `i` in the next available position, `[rbp-0x8]`. – Arkia Jun 15 '21 at 05:34
  • @Arkia I see -- so, we can assume it is using `[rbp-0x4]` for something else, but that our `i` int value is stored and manipulated at `[rbp-0x8]`? – nickhealy Jun 15 '21 at 05:36
  • @nickhealy That seems to be the case. – Arkia Jun 15 '21 at 05:38
  • @Arkia that's a good insight, thank you – nickhealy Jun 15 '21 at 05:39
  • The difference is due to different compilers, rather than different debuggers. Your compiler seems to be `clang`. If your book has more examples of disassembled C programs, you should consider installing `gcc`. – n. m. could be an AI Jun 15 '21 at 05:58
  • @n.1.8e9-where's-my-sharem.: or use Linux GCC on https://godbolt.org/. Agreed, looks like the clang -O0 behaviour of [Why is 0 moved to stack when using return value?](https://stackoverflow.com/q/31149806) – Peter Cordes Jun 15 '21 at 07:03
  • Also related: [GCC placing register args on the stack with a gap below local variables?](https://stackoverflow.com/q/58631698) re: the fact that stack layout is totally arbitrary. – Peter Cordes Jun 15 '21 at 07:04

1 Answer


If you want to learn assembly - do not learn from -O0 (no optimization) assembly output.
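That said, the -O0 listing in the question can still be read slot by slot. The sketch below is only a model, assuming the usual unoptimized clang behaviour mentioned in the comments: `[rbp - 0x4]` is a slot reserved for the return value of `main`, `i` lives at `[rbp - 0x8]`, and the result of `puts` is spilled to `[rbp - 0xc]`. The names `frame_model` and `hello_loop_model` are made up for illustration, and the `iterations` out-parameter exists only so the loop count can be checked.

```c
#include <stdio.h>

/* Hypothetical model of the 16 bytes reserved by `sub rsp, 0x10`.
   Fields are listed from the lowest address to the highest. */
struct frame_model {
    int unused;      /* [rbp - 0x10]: never touched, keeps the frame a multiple of 16 */
    int puts_result; /* [rbp - 0xc]:  return value of puts, stored and then ignored */
    int i;           /* [rbp - 0x8]:  the loop counter from the C source */
    int retval;      /* [rbp - 0x4]:  assumed return-value slot for main */
};

/* Mirrors the -O0 instruction sequence one statement at a time. */
int hello_loop_model(int *iterations)
{
    struct frame_model f;

    f.retval = 0;                              /* mov dword ptr [rbp - 0x4], 0x0 */
    f.i = 0;                                   /* mov dword ptr [rbp - 0x8], 0x0 */
    while (f.i < 10) {                         /* cmp [rbp - 0x8], 0xa ; jge <+61> */
        f.puts_result = puts("Hello, world!"); /* call puts ; mov [rbp - 0xc], eax */
        f.i = f.i + 1;                         /* mov eax, [rbp - 0x8] ; add eax, 0x1 ; mov [rbp - 0x8], eax */
    }                                          /* jmp <+22> */

    *iterations = f.i;                         /* bookkeeping for this sketch only */
    return 0;                                  /* xor eax, eax */
}
```

Read this way, the `0x8` offset is simply the second 4-byte slot in the frame: the first one was already taken before `i` was assigned a location.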

With optimizations enabled, the assembly output is much more logical:

.LC0:
        .string "Hello, world!"
main:
        push    rbx
        mov     ebx, 10
.L2:
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        sub     ebx, 1
        jne     .L2
        xor     eax, eax
        pop     rbx
        ret
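As a rough C rendering of that optimized listing: the counter is kept in a register (`ebx`) and counts down from 10 to 0 instead of up from 0 to 10, so no stack slot is needed for `i` at all. The function name `hello_countdown` and the `calls` out-parameter are hypothetical and exist only to make the sketch checkable.

```c
#include <stdio.h>

/* Mirrors the optimized loop; *calls reports how many times puts ran. */
int hello_countdown(int *calls)
{
    int remaining = 10;          /* mov ebx, 10 */

    *calls = 0;                  /* bookkeeping for this sketch only */
    do {
        puts("Hello, world!");   /* mov edi, OFFSET FLAT:.LC0 ; call puts */
        (*calls)++;              /* not present in the assembly */
        remaining -= 1;          /* sub ebx, 1 */
    } while (remaining != 0);    /* jne .L2 */

    return 0;                    /* xor eax, eax */
}
```

The `do`/`while` shape matches the assembly: the body runs once before the first test, which is safe here because the trip count is a known nonzero constant.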
0___________