
This question is for Intel x86 assembly experts to answer. Thanks for your effort in advance!

Problem Specification

I am analysing a binary file that is identified as a Mach-O 64-bit x86 executable. I am working on 64-bit macOS, and the disassembly comes from objdump.

When I was learning assembly, I could see variable names like "$xxx", string values in ASCII, and callee names such as "call _printf".

In this disassembly, however, I can see none of the above:

  1. no main function:

    Disassembly of section __TEXT,__text:
    __text:
    100000c90:  55  pushq   %rbp
    100000c91:  48 89 e5    movq    %rsp, %rbp
    100000c94:  48 83 ec 10     subq    $16, %rsp
    100000c98:  48 8d 3d bf 02 00 00    leaq    703(%rip), %rdi
    100000c9f:  b0 00   movb    $0, %al
    100000ca1:  e8 68 02 00 00  callq   616
    100000ca6:  89 45 fc    movl    %eax, -4(%rbp)
    100000ca9:  48 83 c4 10     addq    $16, %rsp
    100000cad:  5d  popq    %rbp
    100000cae:  c3  retq
    100000caf:  90  nop
    100000cb0:  55  pushq   %rbp  
    ...
    

    The code above will be executed, but I have no idea from where it is executed.

Also, I am new to AT&T assembly syntax. Could you tell me what these instructions mean:

    0000000100000c90    pushq   %rbp
    0000000100000c98    leaq    0x2bf(%rip), %rdi       ## literal pool for: "xxxx\n"
    ...
    0000000100000cd0    callq   0x100000c90

Is it a loop? I am not sure, but it seems to be. Also, why do they use the %rip and %rdi registers? In 32-bit x86 I know EIP as the instruction pointer, but I don't understand its meaning here.

  2. Call to an integer: whatever calling convention is used, I have never seen a code pattern like "call 616":

    "100000cd0: e8 bb ff ff ff  callq   -69 <__mh_execute_header+C90>"
    
  3. After ret: in x86, ret returns control flow to the caller, so the code before it should be a self-contained function. However, right after it we can see code like

    100000cae:  c3  retq
    100000caf:  90  nop
    /* new function call */
    100000cb0:  55  pushq   %rbp
    ...
    

    It is ridiculous!

  4. ASCII strings lost: I viewed the binary in hexadecimal and recognised some ASCII strings before disassembling it.

However, no ASCII strings appear in the disassembly!

  5. Overall section layout:

    Disassembly of section __TEXT,__text:
    __text:
    from address 100000c90 to 100000ef6 of 145 lines
    
    Disassembly of section __TEXT,__stubs:
    __stubs:
    from address 100000efc to 100000f14 of 5 lines asm codes:
    100000efc:  ff 25 16 01 00 00   jmp qword ptr [rip + 278]
    100000f02:  ff 25 18 01 00 00   jmp qword ptr [rip + 280]
    100000f08:  ff 25 1a 01 00 00   jmp qword ptr [rip + 282]
    100000f0e:  ff 25 1c 01 00 00   jmp qword ptr [rip + 284]
    100000f14:  ff 25 1e 01 00 00   jmp qword ptr [rip + 286]
    
    Disassembly of section __TEXT,__stub_helper:
    __stub_helper:
    
    ...
    
    Disassembly of section __TEXT,__cstring:
    __cstring:
    
    ...
    
    Disassembly of section __TEXT,__unwind_info:
    __unwind_info:
    
    ...
    
    Disassembly of section __DATA,__nl_symbol_ptr:
    __nl_symbol_ptr:
    
    ...
    
    Disassembly of section __DATA,__got:
    __got:
    
    ...
    
    Disassembly of section __DATA,__la_symbol_ptr:
    __la_symbol_ptr:
    
    ...
    
    Disassembly of section __DATA,__data:
    __data:
    
    ...
    

Since it might be a virus, I cannot execute it. How should I analyse it?

Update on May 21

I have already identified where the output is produced. If I fully understand the data flow in this programme, I might be able to figure out possible solutions.

I would appreciate it if someone could give me a detailed explanation. Thank you!

Update on May 22

I installed macOS in VirtualBox and, after setting the execute permission with chmod, ran the programme. Nothing special happened except for two lines of output. The result is hidden in the binary file.

Wang Yi
  • Have you tried `otool -tV ` to see if does a better job of disassembling it? – Ken Thomases May 20 '17 at 21:59
  • Yes! This is better. I can see ascii comments on the right hand side of the assembly codes! More importantly, the comments tell me the function address so that I can understand what they are calling! Thumb up! Now I am studying AT&T syntax to explore the assembly – Wang Yi May 21 '17 at 05:32
  • Also, this command only shows the "Text" part. As you can see from "objdump", there are also data parts for global variables, helper functions and so on. Further techniques are needed to explore them. – Wang Yi May 21 '17 at 08:18
  • It seems you added a new question to your question after you have already received an answer. This is discouraged here on Stack Overflow, as it invalidates the existing answers. To simply answer your question, `LEA` is not a loop: it stands for Load Effective Address, and is being used here to load the address of `0x2bf(%rip)` into the register `%rdi`. The reason `%rip` is being used here is because the code was compiled using RIP-relative addressing, an easily Googleable term. It certainly sounds like you need to pick up a book that teaches x86-64 assembly language to clear up the basics! – Cody Gray - on strike May 21 '17 at 09:05
  • Yes, I am picking up RE4B and some open course materials from universities. But someone senior can save me much time by answering such questions (that might be why we study at a university rather than teaching ourselves everything). Three things to clear up. First, thanks for telling me why the %rip register is needed in AT&T syntax. Second, the previous questions are not finished. Finally, the loop I mean is the call from **0000000100000cd0** back to **0x100000c90**. I want to make sure it is a loop because I haven't seen an index register yet. – Wang Yi May 21 '17 at 09:24
  • @CodyGray One more thing to add, you answered the question, but you didn't contribute the solutions to questions I raised. Hence no credits for you. Thank you. – Wang Yi May 21 '17 at 09:40
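
The RIP-relative addressing mentioned in the comments can be checked by hand. The following is a minimal Python sketch, not part of the original thread; the helper name `rip_relative_target` is hypothetical. It relies on the fact that `%rip` already points at the next instruction when the displacement is added, and uses the `leaq` at `100000c98` from the objdump listing (7 bytes long, displacement `0x2bf`).

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Return the address referenced by a RIP-relative operand.

    When the displacement is applied, RIP holds the address of the
    *next* instruction, i.e. insn_addr + insn_len.
    """
    return insn_addr + insn_len + disp


# leaq 0x2bf(%rip), %rdi  at 0x100000c98, encoded as 48 8d 3d bf 02 00 00 (7 bytes)
target = rip_relative_target(0x100000C98, 7, 0x2BF)
print(hex(target))  # 0x100000f5e
```

The resulting address is what ends up in `%rdi`, the first argument register in the x86-64 System V calling convention. It lies beyond the `__stubs` addresses shown in the question, which would be consistent with otool's "literal pool" annotation, although the exact section boundaries are not given in the listing.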

1 Answer

  1. You don't need a main if you are not using C. The binary header contains the entry point address.
  2. Nothing special about call 616, it's just that you don't have (all) symbols. It's somewhat strange that objdump didn't calculate the address for you, but it should be 0x100000ca6+616.
  3. Not sure what you find ridiculous there. One function ends, another starts.
  4. That's not a question. Yes, you can create strings at runtime so you won't have them in the image. Possibly they are encrypted.
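
The arithmetic in point 2 can be reproduced with a short Python sketch (an illustration added here, not something from the original answer; the function name `call_rel32_target` is hypothetical). It assumes the `e8` near-call encoding, where the four bytes after the opcode are a signed little-endian offset relative to the address of the next instruction.

```python
import struct


def call_rel32_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Decode an `e8 rel32` near call and return its absolute target."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 rel32 call")
    # The operand is a signed 32-bit little-endian displacement.
    (rel32,) = struct.unpack("<i", insn_bytes[1:])
    # The displacement is relative to the instruction that follows the call.
    return insn_addr + len(insn_bytes) + rel32


# 100000ca1: e8 68 02 00 00   callq 616
print(hex(call_rel32_target(0x100000CA1, bytes.fromhex("e868020000"))))  # 0x100000f0e
# 100000cd0: e8 bb ff ff ff   callq -69
print(hex(call_rel32_target(0x100000CD0, bytes.fromhex("e8bbffffff"))))  # 0x100000c90
```

With the bytes from the question, the first call lands on `100000f0e`, which is the fourth entry in the `__stubs` listing, and the second lands on `100000c90`, the start of the first function shown. That suggests an ordinary call into an import stub and an ordinary call to another function, rather than a loop.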
Jester
  • Thanks for your answer! I cannot upvote it because I haven't earned enough reputation. But would you tell me what the next possible steps are for exploring the binary? I would accept your answer if you hit the nail on the head. ^_^ – Wang Yi May 21 '17 at 04:54
  • As for question 1, since "_printf" is called in the assembly, I assumed that it was written in C or C++ (very rarely otherwise in assembly analysis) – Wang Yi May 21 '17 at 05:31
  • Here "main" refers to the programme entry, like "__main__" in Python, the "main" method of a class in Java, or "main" in C. If you don't know where the programme starts, how can you analyse it? – Wang Yi May 21 '17 at 09:08
  • You can examine the program headers which has the address of the entry point. That's what the OS loader looks at too. – Jester May 21 '17 at 12:50
  • I checked with otool as you said, using "otool -h", but only found the "magic number". When I debug with gdb and "layout asm", the programme complains that 'Function "main" not defined.' Am I missing something? – Wang Yi May 23 '17 at 03:48
  • Depending on what you did in gdb, it might be looking for a `main` (e.g. the `start` command does). Anyway, looks like osx is a little complicated, see [this answer](https://stackoverflow.com/a/14422570/547981) for some additional info about entry point. – Jester May 23 '17 at 09:57
  • I got the "entry" once I used "info files" in gdb's assembly debug mode. I am going to accept your answer. I also invite you to answer my new questions on this problem. – Wang Yi May 24 '17 at 07:08
  • This is my new problem: https://stackoverflow.com/questions/44196215/gdb-print-string-in-an-wrong-format-a-little-like-octal-string – Wang Yi May 26 '17 at 07:42
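
As a rough illustration of "examine the program headers" from the comments above: `otool -h` prints only the Mach-O header, while the entry point lives in the load commands that follow it (`otool -l` lists them). The Python sketch below is an assumption-laden example, not something from the thread. It handles only a thin, little-endian 64-bit image, looks only for `LC_MAIN` (older binaries may use `LC_UNIXTHREAD` instead), and the helper name `find_entryoff` and the demo value `0xCB0` are hypothetical.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # magic of a 64-bit Mach-O, native byte order
LC_MAIN = 0x80000028      # LC_REQ_DYLD | 0x28
HEADER_SIZE = 32          # sizeof(struct mach_header_64)


def find_entryoff(data: bytes):
    """Return the LC_MAIN entryoff of a thin 64-bit Mach-O image, or None."""
    magic, _cpu, _sub, _ftype, ncmds, _sizeofcmds, _flags, _rsv = struct.unpack_from(
        "<8I", data, 0
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")
    offset = HEADER_SIZE
    for _ in range(ncmds):
        # Every load command starts with its type and its total size.
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmd == LC_MAIN:
            # entry_point_command: cmd, cmdsize, entryoff (u64), stacksize (u64)
            entryoff, _stacksize = struct.unpack_from("<2Q", data, offset + 8)
            return entryoff
        if cmdsize < 8:
            raise ValueError("malformed load command")
        offset += cmdsize
    return None


# Synthetic image for demonstration only: one LC_MAIN with a made-up entryoff.
header = struct.pack("<8I", MH_MAGIC_64, 0x01000007, 3, 2, 1, 24, 0, 0)
lc_main = struct.pack("<2I2Q", LC_MAIN, 24, 0xCB0, 0)
print(hex(find_entryoff(header + lc_main)))  # 0xcb0
```

On a real file one would pass `open(path, "rb").read()` instead of the synthetic bytes. The returned `entryoff` is an offset from the start of the `__TEXT` segment, so it has to be added to that segment's load address to match the addresses in the disassembly.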