
Given a sample assembly code snippet as follows:

[1] push eax
[2] push ebx
[3] call function_with_1_arg
[4] push ecx
[5] call function_with_2_args

Is there an IDAPython API, or a simple script, to identify that lines 1 and 4 are the "relevant pushes" for the function_with_2_args call (line 5)?

Thanks!

EDIT:

Thanks @ped7g for the insights. I didn't realise that it could be so complicated. (I see that IDA identifies and annotates the pushes for some of the system API calls, so I thought that IDA could already identify argument pushes for function calls in general.)

I suppose then, for a start, I'll work with some (many?) simplifying assumptions which should hold for the majority of the cases that I encounter. E.g. I'm not even considering fastcall. stdcall and cdecl should probably suffice for now.

So I'll assume for now that the arguments for a function call are pushed onto the stack prior to the call. The only complication that I have to deal with, for now, is "interleaved" function calls (as in the example above).

I've given it a bit of thought on how to implement this:

Get func_argsize = GetFunctionAttr(..., FUNCATTR_ARGSIZE)
Set other_funcs_argsize = 0
Set i = 0
while func_argsize > 0:
    Get prev instruction using PrevHead(...)
    if GetMnem(...) == 'call':
        Follow call operand addr
        Set other_funcs_argsize += FUNCATTR_ARGSIZE of the other function
    elif GetMnem(...) == 'push':
        Get operand size of the PUSH (***)
        if other_funcs_argsize > 0:
            other_funcs_argsize -= operand size
        else:
            func_argsize -= operand size
            output that this is the i-th arg for the function call
            i += 1

The only gap I still have is how to implement the step marked (***) in the pseudocode, i.e. how to determine the operand size of a PUSH instruction.
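
To check the logic of the backward walk independently of IDA, here is a minimal sketch in plain Python. It is not IDAPython code: the instruction list, the `argsize` table and the name `find_arg_pushes` are all hypothetical stand-ins for what `PrevHead`, `GetMnem` and `FUNCATTR_ARGSIZE` would supply, and it assumes that every push stores 4 bytes and that nothing else touches the stack.

```python
PUSH_SIZE = 4  # assumption: 32-bit code where every push stores 4 bytes


def find_arg_pushes(insns, call_index, argsize):
    """Walk backwards from the call at call_index and return the indices of
    the pushes that feed it, most recent push first.

    insns   -- list of (mnemonic, operand) tuples (hypothetical model)
    argsize -- dict: function name -> argument bytes (stands in for ARGSIZE)
    """
    remaining = argsize[insns[call_index][1]]  # bytes still to account for
    skip = 0       # bytes owed to calls met on the way back
    found = []
    i = call_index - 1
    while remaining > 0 and i >= 0:
        mnem, operand = insns[i]
        if mnem == "call":
            # An interleaved call: its arguments were pushed before it.
            skip += argsize.get(operand, 0)
        elif mnem == "push":
            if skip > 0:
                skip -= PUSH_SIZE       # this push belongs to the other call
            else:
                remaining -= PUSH_SIZE  # this push belongs to our call
                found.append(i)
        i -= 1
    return found


# The snippet from the top of the question, using 0-based indices.
sample = [
    ("push", "eax"),
    ("push", "ebx"),
    ("call", "function_with_1_arg"),
    ("push", "ecx"),
    ("call", "function_with_2_args"),
]
sizes = {"function_with_1_arg": 4, "function_with_2_args": 8}
print(find_arg_pushes(sample, 4, sizes))
```

On the sample this picks indices 3 and 0, i.e. lines 4 and 1 of the snippet. The sketch ignores caller-side cleanup such as `add esp, N` after a cdecl call, so it only covers the simplified case described above.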

ADDON:

Oh, I need the operand size of PUSH instructions, because FUNCATTR_ARGSIZE gives the number of bytes for function arguments. If, instead, there's an API to find out how many arguments a function expects, then I won't need operand size. I'll just have to count the pushes. But so far, I couldn't find such an API.
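
One possible way to approximate the operand size without any IDA API is to look at the raw bytes of the instruction. In 32-bit mode a push stores 4 bytes unless it carries the `0x66` operand-size prefix, in which case it stores 2. The sketch below assumes the bytes passed in are already known to be a single, correctly delimited push instruction; the function name `push_size_32bit` is made up for illustration.

```python
# Legacy x86 prefix bytes that may appear before the opcode.
LEGACY_PREFIXES = {
    0xF0, 0xF2, 0xF3,                    # lock, repne, rep
    0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,  # segment overrides
    0x66, 0x67,                          # operand-size, address-size
}
OPERAND_SIZE_PREFIX = 0x66


def push_size_32bit(insn_bytes):
    """Return the number of bytes a push stores in 32-bit mode.

    Assumption: insn_bytes holds exactly one push instruction.
    """
    for b in insn_bytes:
        if b not in LEGACY_PREFIXES:
            break  # reached the opcode; later bytes are not prefixes
        if b == OPERAND_SIZE_PREFIX:
            return 2
    return 4


print(push_size_32bit(bytes([0x51])))        # push ecx
print(push_size_32bit(bytes([0x66, 0x51])))  # push cx
```

Only prefix bytes in front of the opcode are inspected, so a `0x66` that appears inside an immediate operand is not mistaken for a prefix.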

Edwin Lee
  • In general assembly code, the only way to detect this would be to run the code under a virtual machine and keep every written memory location "marked" with the instruction that set it up; upon a call to any function, those markers could then be read out. There are many possible ways to alter stack memory. If the code is generated by a less powerful tool, such as a naive, inefficient compiler that always emits a `push` sequence and does a regular `call/ret` without any tricks, then it may be possible to decipher it programmatically. But in the general case it cannot be solved by static analysis alone. – Ped7g Dec 19 '16 at 11:06
  • Well, again: in 32-bit mode (as in your first example), 99% of `push` instructions store 32 bits. It's possible to encode a 16-bit push too, but I'm not aware of any compiler that uses it in practice, so you would encounter it only in hand-written assembly. BUT getting the "prev instruction" will be tough. For example, `push ecx` and `push cx` share the same opcode byte (`0x51`), but the 16-bit version has a prefix byte (`0x66`) in front of it. So while going backwards, you will have a hard time telling whether that `0x66` is a prefix or part of the previous instruction's operand bytes. "Next instruction" would be OK. – Ped7g Dec 20 '16 at 09:38
  • On a RISC-like CPU where 1 instruction is 1 word, going to the "prev instruction" is no big deal. But on x86 you need a known starting point and have to advance forward. Disassemblers have heuristics to find the correct instruction boundaries and most of the time they guess right, but hiding instructions inside larger opcodes is an obfuscation/anti-debug measure that prevents simple disassembly without known entry points (a fake entry point may confuse the disassembler even more). – Ped7g Dec 20 '16 at 09:41
  • But generally in 32-bit mode `FUNCATTR_ARGSIZE/4` = number of arguments (unless an argument is a struct; then its size goes into ARGSIZE, but it's still a single argument?). I'm not sure which API calls you are talking about. In C++ you can also have structs etc., but in a C++ binary you have no way to get ARGSIZE unless it was provided externally, so you are probably talking about Python API calls. Can they process only int arguments? – Ped7g Dec 20 '16 at 09:44
  • 1. Do you mean that, even though IDA has already identified the instructions correctly (i.e. I can see the correct disassembled instructions on graph view and text view), getting the previous instruction using the `PrevHead(...)` call may still give me the wrong instruction boundary? – Edwin Lee Dec 21 '16 at 01:30
  • 2. Yeah, I understand that some obfuscation/anti-disassembly techniques will mess up instructions (resulting in `sp-analysis failed` or obviously wrong instructions). But the assumption here is that, before I run this script, I would have first fixed this up. I.e. when I run the script, I can assume that the disassembly is correct. – Edwin Lee Dec 21 '16 at 01:30
  • 3. Yeah, I suppose I can assume for now that each argument will be 4 bytes, so I can just do `FUNCATTR_ARGSIZE / 4`. I thought if there was an easy way to more accurately determine number of arguments (or size of PUSH operand), then I should take that approach. If not, this should suffice for now. – Edwin Lee Dec 21 '16 at 01:31
  • 1+2) No, I didn't understand that you were working on code already disassembled by IDA. In that case "prev" is trivial. IDA can still disassemble the code wrongly, but there's no easy fix for that. There's more to it than "sp-analysis failed": the correct code may be hidden behind other valid opcodes, so you will see the real code only when you have the correct entry points. Then again, I'm talking about hand-crafted anti-debug/anti-disassembly code. If you are checking the output of a standard compiler calling the Python API, then the disassembly is very likely correct. – Ped7g Dec 21 '16 at 01:50
  • 3) `push` in 32-bit mode stores 32 bits (99%, or probably 100% for normal binaries without protection). Then again, there are more ways to fill stack memory: some code may do `sub esp,32` and then `mov [ebp-offsets],values` to set that memory, so `push` is not mandatory. So for a simple but more thorough approach you should detect every `push` and every other `esp` modification. `call/sub/add/lea/mov/enter/leave/and` are the basic ones (those used in practice by C++ compilers to adjust `esp`), but if somebody wants to modify it in a weird way, there are more of them. I think you can safely omit `shl/shr`. :D – Ped7g Dec 21 '16 at 01:58
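
The idea from the last comment, tracking every change to `esp` rather than only counting pushes, could be sketched as follows. This is a simplified model in plain Python, not IDAPython: instructions are hypothetical `(mnemonic, destination, value)` tuples, pushes and pops are assumed to move 4 bytes, and only `push`, `pop`, `sub esp, N` and `add esp, N` are handled. `lea`, `mov`, `enter`, `leave` and `and` are left out.

```python
def esp_delta(insns):
    """Return the net change to esp over a list of simplified instructions.

    insns -- list of (mnemonic, destination, value) tuples; value is only
             used for add/sub with an immediate (hypothetical model).
    """
    delta = 0
    for mnem, dst, value in insns:
        if mnem == "push":
            delta -= 4          # assumption: 32-bit push
        elif mnem == "pop":
            delta += 4
        elif mnem == "sub" and dst == "esp":
            delta -= value      # stack space reserved without a push
        elif mnem == "add" and dst == "esp":
            delta += value      # e.g. caller cleanup after a cdecl call
    return delta


# One push, 32 bytes reserved, then 36 bytes released: esp is back where it was.
print(esp_delta([("push", "eax", None), ("sub", "esp", 32), ("add", "esp", 36)]))
```

Comparing the running delta before a call with the callee's argument size gives a rough check on whether the bytes on the stack match what the call expects.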
