assembly x86 'decompiling'

Question

I'm having trouble understanding this x86 assembly code (AT&T syntax). I need to be able to understand it (that is, write a C++ function that compiles to this code) and to solve similar exercises on the exam. Can you explain which part does what, and what convention it follows?

f:
    pushl %ebp ; 1
    movl %esp, %ebp; 2
    pushl %ebx ; 3
    subl $36, %esp; 4
    movl 8(%ebp), %edx ; 5
    movl 12(%ebp), %eax ; 6
    movl (%eax), %eax ; 7
    movl %edx, 8(%esp) ; 8
    leal 16(%ebp), %edx ; 9
    movl %edx, 4(%esp) ; 10
    movl %eax, (%esp) ; 11
    call f; 12
    movl %eax, -12(%ebp) ; 13
    movl 16(%ebp), %edx ; 14
    movl 12(%ebp), %eax ; 15
    movl %edx, (%eax) ; 16
    movl 12(%ebp), %eax ; 17
    movl (%eax), %edx ; 18
    movl -12(%ebp), %eax ; 19
    movl %edx, 8(%esp) ; 20
    leal 8(%ebp), %edx ; 21
    movl %edx, 4(%esp) ; 22
    movl %eax, (%esp) ; 23
    call f; 24
    movl %eax, %ebx; 25
    movl 16(%ebp), %edx ; 26
    movl -12(%ebp), %eax ; 27
    movl %edx, 8(%esp) ; 28
    movl 12(%ebp), %edx ; 29
    movl %edx, 4(%esp) ; 30
    movl %eax, (%esp) ; 31
    call f; 32
    movl %eax, %edx; 33
    movl 16(%ebp), %eax ; 34
    movl %edx, 8(%esp) ; 35
    leal 8(%ebp), %edx ; 36
    movl %edx, 4(%esp) ; 37
    movl %eax, (%esp) ; 38
    call f; 39
    movl %ebx, 8(%esp) ; 40
    leal -12(%ebp), %edx ; 41
    movl %edx, 4(%esp) ; 42
    movl %eax, (%esp) ; 43
    call f; 44
    addl $36, %esp; 45
    popl %ebx ; 46
    popl %ebp ; 47
    ret; 48

There are no jumps, but there are several 'call f' instructions. Does that mean there is an infinite loop?

This is clearly one of those cases where you really should post "your best attempt" before asking for help, as that will, if nothing else, tell us at what level you need help. — Mats Petersson, Feb 02 '14 at 10:56
How have you translated it so far? (BTW, stepping through it in a debugger will answer your 2nd question...) — Adriano Repetti, Feb 02 '14 at 10:56
@Adriano: I can think of plenty of examples where this would be pretty unfeasible. And even in this case, it may take a long time to reach a conclusion. — Mats Petersson, Feb 02 '14 at 10:59
You should only need a quick look at the first 12 instructions to determine that it's going to run out of stack space and crash! :-) — Brendan, Feb 02 '14 at 12:06
@MatsPetersson yes, of course compiled code can quickly get **really** complicated, but: 1) this is an exam text. 2) I don't see any attempt to understand what's going on... — Adriano Repetti, Feb 02 '14 at 14:14
@Brendan - To me, this looks like disassembled object code, not a disassembled executable. If so, then it might not be recursive code. — Sparky, Feb 02 '14 at 14:39
Where does the name `f` for the function come from? Is it from the symbol table, or did you enter the name yourself? When you disassemble code that has relocation information, objdump does not show that relocation information by default. This can lead to the wrong conclusion that the function contains a recursive call. — harper, Feb 02 '14 at 15:25

score 2 · Answer 1 · answered Feb 02 '14 at 14:30

Below is a little bit to help you get going.

Step 1. Divide the code up into logical chunks. Key things to look for to identify logical chunks are the stack prologue and epilogue code, function calls, branch statements and addresses identified by the branch statements.

Step 2. Make notes about what each chunk is doing.

For example ...

f:
    pushl %ebp
    movl %esp, %ebp      ; Create the stack frame
    pushl %ebx           ; and save non-volatile register EBX
    subl $36, %esp       ; Carve space for 9 32-bit words on the stack

    ; Notes: 8(%ebp) is the address for the 1st parameter
    ;       12(%ebp) is the address for the 2nd parameter
    ;       16(%ebp) is the address for the 3rd parameter
    ;
    ; Anything addressed as -#(%ebp) will be a stack variable
    ; local to this function.
    ;
    ; Anything addressed as #(%esp) will be used to pass parameters
    ; to the sub-function.  The advantage of doing it this way is that
    ; parameters passed to the sub-function do not have to be popped
    ; after every call to a sub-function.

    movl 8(%ebp), %edx         ; EDX = 1st parameter
    movl 12(%ebp), %eax        ; EAX = 2nd parameter
    movl (%eax), %eax          ;       The 2nd parameter is a pointer!
    movl %edx, 8(%esp)         ; Pass EDX as 3rd parameter to sub-function
    leal 16(%ebp), %edx        ; EDX = address of 3rd parameter to this function
    movl %edx, 4(%esp)         ;       Passing it as 2nd parameter to sub-function
    movl %eax, (%esp)          ; Pass EAX as 1st parameter to sub-function
    call f                     ; Call sub-function
    movl %eax, -12(%ebp)       ; Save return value to local stack variable

    ; More Notes:
    ; I am guessing that this bit of disassembled code came from an object file.
    ; Experience has shown me that when the sub-function addresses used by
    ; CALL are all the same (and match the address of the calling function)
    ; this is often due to disassembling an object file as opposed to an
    ; executable.  If, however, the sub-function address truly is '0xf', then
    ; this will be a recursive routine that will blow the stack as there is
    ; no exit condition.

    movl 16(%ebp), %edx    ; EDX: 3rd parameter passed to function
                           ;      likely modified by previous CALL
    movl 12(%ebp), %eax    ; EAX: 2nd parameter passed to function
    movl %edx, (%eax)      ; Save EDX to the location pointed to by the 2nd parameter
    movl 12(%ebp), %eax    ; EAX: 2nd parameter passed to function (recall it's a ptr)
    movl (%eax), %edx      ;    ... and so on ...
    movl -12(%ebp), %eax
    movl %edx, 8(%esp)
    leal 8(%ebp), %edx
    movl %edx, 4(%esp)
    movl %eax, (%esp)
    call f
    movl %eax, %ebx

    movl 16(%ebp), %edx
    movl -12(%ebp), %eax
    movl %edx, 8(%esp)
    movl 12(%ebp), %edx
    movl %edx, 4(%esp)
    movl %eax, (%esp)
    call f
    movl %eax, %edx

    movl 16(%ebp), %eax
    movl %edx, 8(%esp)
    leal 8(%ebp), %edx
    movl %edx, 4(%esp)
    movl %eax, (%esp)
    call f

    movl %ebx, 8(%esp)
    leal -12(%ebp), %edx
    movl %edx, 4(%esp)
    movl %eax, (%esp)
    call f

    addl $36, %esp             ; Reclaim that carved stack space
    popl %ebx                  ; Restore the non-volatile register EBX
    popl %ebp                  ; Restore to the caller's stack frame
    ret                        ; Return

I am leaving the rest for you. I hope this helps you along.
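
As a rough sketch of how the notes above translate back into source, instructions 5 to 16 of the question's listing could correspond to something like the following. The parameter names `i`, `j`, `k`, the local `n` and the function name `first_chunk` are made up for illustration. The callee is called `g` here purely as a stand-in (an assumption, so that the example terminates); in the listing the call target is `f` itself.

```cpp
#include <cassert>

// Hypothetical stand-in for the call target, so that the sketch terminates.
// It writes through its pointer argument, which shows why the listing
// reloads 16(%ebp) after the call instead of reusing a register.
int g(int a, int* b, int c) {
    *b += 1;
    return a + c;
}

// Assumed mapping: i is 8(%ebp), j is 12(%ebp), k is 16(%ebp), n is -12(%ebp)
int first_chunk(int i, int* j, int k) {
    assert(j != nullptr);   // 12(%ebp) is dereferenced, so it must be a valid pointer
    int n = g(*j, &k, i);   // (%esp) = *j, 4(%esp) = &k, 8(%esp) = i
    *j = k;                 // instructions 14-16: store the 3rd parameter through the 2nd
    return n;               // the value the listing keeps at -12(%ebp)
}
```

Taking the address of a parameter (`&k`) is what produces the `leal 16(%ebp), %edx` instruction: the parameter already lives on the stack, so the compiler only needs its address.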

score 0 · Answer 2 · answered Feb 02 '14 at 14:12

This function f is recursive, and the recursion never terminates. Something like:

void f(int a, int b, int c)
{
    f(a,b,c);
    //....
}

Stop evaluating the disassembly; code this bad is not worth reconstructing in any high-level language.

harper

Not necessarily. It may be disassembled object code, as opposed to a disassembled executable. There are three items that support this. First is the low numeric address of the function being disassembled. Second is the fact that all sub-functions being called from it are using the same address as the calling function. Third, the sub-functions being called do not all appear to be taking the same number of parameters. – Sparky Feb 02 '14 at 14:36
@Sparky Some annotations to your comment: 1. What low address? There is no address at all in the question. 2. That's called recursion. 3. When the function calls itself, it doesn't decide how many parameters it takes unless it is variadic. But that's impossible without branches inside the function. – harper Feb 02 '14 at 15:20
The label at which the function begins is 'f', which is ambiguous. Is that to be interpreted as a string or as an address? I am interpreting it as a hex value, that is, an address. The majority of decompiled code that I encounter does not have addresses in that range, unless it is an object file. If it is an object file, then the addresses shown in the CALL instructions are a sort of stub to be resolved at link time. If I am wrong, and it is indeed a recursive function, then it is a poorly formed one. – Sparky Feb 02 '14 at 15:42
I've never seen a function address for a 32-bit machine that is NOT 32-bit aligned. I am convinced that what you saw before the colon was not an address. When a disassembly listing contains addresses, you get one on each line, together with the object code bytes. I have never seen an address before the colon in a disassembly. Therefore `f:` is the name of a label, not an address. – harper Feb 02 '14 at 15:45

niczka · Answer 3 · 2014-02-02T22:03:39.143

I came up with this solution:

int f (int i, int* j, int k) {
    int n = f(*j, &k, i);
    *j = k;
    f( f(n, &i, *j), &n, f(k, &i, f(n, j, k)) );
    return 0;
}

When I compile my code with
g++ -m32 -S a.cpp

I get the following assembly code:
_Z1fiPii:
.LFB971:
.cfi_startproc
.cfi_personality 0,__gxx_personality_v0
.cfi_lsda 0,.LLSDA971
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
pushl %ebx
subl $36, %esp
.cfi_offset 3, -12
movl 8(%ebp), %edx
movl 12(%ebp), %eax
movl (%eax), %eax
movl %edx, 8(%esp)
leal 16(%ebp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
.LEHB0:
call _Z1fiPii
movl %eax, -12(%ebp)
movl 16(%ebp), %edx
movl 12(%ebp), %eax
movl %edx, (%eax)
movl 16(%ebp), %edx
movl -12(%ebp), %eax
movl %edx, 8(%esp)
movl 12(%ebp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call _Z1fiPii
movl 16(%ebp), %edx
movl %eax, 8(%esp)
leal 8(%ebp), %eax
movl %eax, 4(%esp)
movl %edx, (%esp)
call _Z1fiPii
movl %eax, %ebx
movl 12(%ebp), %eax
movl (%eax), %edx
movl -12(%ebp), %eax
movl %edx, 8(%esp)
leal 8(%ebp), %ecx
movl %ecx, 4(%esp)
movl %eax, (%esp)
call _Z1fiPii
movl %ebx, 8(%esp)
leal -12(%ebp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call _Z1fiPii
.LEHE0:
movl $0, %eax
jmp .L5
.L4:
movl %eax, (%esp)
.LEHB1:
call _Unwind_Resume
.LEHE1:
.L5:
addl $36, %esp
popl %ebx
.cfi_restore 3
popl %ebp
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc

Is this one equivalent to the one pasted before?
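
One way to compare the two listings is to write the calls out with explicit temporaries, because C++ leaves the evaluation order of function arguments unspecified. The sketch below follows the call order read off the listing in the question (instructions 12, 24, 32, 39 and 44). The names `g`, `call_log`, `Call` and `ordered` are hypothetical, and `g` is a logging stand-in for the call target so that the example terminates. Read this way, the question's listing appears to pass f(k, &i, f(n, j, k)) as the first argument of the last call and f(n, &i, *j) as the third, and to return the last call's result, whereas my source above has those two arguments the other way round and returns 0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trace of the 1st and 3rd argument of every call.
struct Call { int first; int third; };
std::vector<Call> call_log;

// Hypothetical stand-in for the call target: logs its arguments and
// returns 100, 200, ... so each result can be told apart later.
int g(int a, int* /*b*/, int c) {
    call_log.push_back({a, c});
    return static_cast<int>(call_log.size()) * 100;
}

// Same call order as the question's listing, pinned with temporaries.
int ordered(int i, int* j, int k) {
    assert(j != nullptr);
    int n  = g(*j, &k, i);   // instruction 12, result saved at -12(%ebp)
    *j = k;                  // instructions 14-16
    int t2 = g(n, &i, *j);   // instruction 24, result kept in %ebx
    int t3 = g(n, j, k);     // instruction 32
    int t4 = g(k, &i, t3);   // instruction 39
    return g(t4, &n, t2);    // instruction 44, result left in %eax
}
```

With the order fixed like this, the log shows which intermediate result ends up in which argument slot of the final call, independent of how a particular compiler orders argument evaluation.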
