
My goal is to hook C functions that have no symbols in stripped binaries on the ARM platform. As the addresses of these functions may change (e.g. with an update of the binary), I want my interposing dynamic library to find the addresses itself.

Moreover, each of these functions, at some point, uses a C-string which never changes across updates. With this in mind, here are the 3 steps to find a function's address:

1) Find the address of the C-string itself (by analyzing the __cstring section inside the __TEXT segment).

2) Find the address of the reference to the string (the xref).

3) Starting from the xref address, go backwards until I find a function prologue.

I could implement steps 1) and 3), but I'm a bit lost on 2). What exactly is an xref? How can I identify the one corresponding to the C-string? No code is needed, just some theory.

Thanks!

jb_

2 Answers


You would have to disassemble all the code and determine which instructions read from memory. For each of those instructions you would then have to decode the address it reads from (I'm not sure whether ARM addressing is relative or absolute). Furthermore, in order to disassemble everything properly you would need to start at a known entry point and follow all the calls to find out where functions start (I'm assuming ARM has variable-sized instructions, i.e. instructions with immediates). This is not a simple thing to do: cross-reference analysis is a rather significant feature of reverse engineering software (like IDA Pro, for instance). And if the memory is not accessed directly, you would also have to analyze all indirect memory accesses (accesses through registers).
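For the simplest case, the "decode the address" step can be sketched without a full disassembler. The snippet below is only an illustration under several assumptions: the code is little-endian ARM mode only (fixed 32-bit instructions, no Thumb), the string is loaded through a PC-relative `LDR Rt, [PC, #imm]` from a literal pool, and the literal holds the absolute address of the string. In position-independent code the literal may instead hold an offset that is added to PC afterwards, which this sketch does not handle. The names `find_literal_xrefs`, `code`, `base` and `target` are hypothetical.

```python
import struct

# Bits that identify ARM-mode "LDR Rt, [PC, #+/-imm12]":
# word load, immediate offset, no writeback, base register = PC (r15).
# The condition field, the U (add/subtract) bit, Rt and imm12 are ignored.
LDR_LITERAL_MASK = 0x0F7F0000
LDR_LITERAL_BITS = 0x051F0000

def find_literal_xrefs(code: bytes, base: int, target: int):
    """Return addresses of LDR-literal instructions that load `target`.

    code   -- raw bytes of the text section (assumed ARM mode, little-endian)
    base   -- address at which code[0] is mapped (assumed 4-byte aligned)
    target -- absolute address of the C-string
    """
    xrefs = []
    for off in range(0, len(code) - 3, 4):
        (insn,) = struct.unpack_from("<I", code, off)
        if insn & LDR_LITERAL_MASK != LDR_LITERAL_BITS:
            continue
        imm = insn & 0xFFF
        if not insn & (1 << 23):      # U bit clear: offset is subtracted
            imm = -imm
        lit_off = off + 8 + imm       # in ARM mode PC reads as insn address + 8
        if lit_off < 0 or lit_off + 4 > len(code):
            continue                  # literal lies outside this buffer
        (literal,) = struct.unpack_from("<I", code, lit_off)
        if literal == target:
            xrefs.append(base + off)
    return xrefs
```

Each returned address is a candidate xref from which the backwards search for a prologue could start. Data words that happen to look like an `LDR` can produce false positives, so the results should be treated as hints rather than proof.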

CrazyCasta
  • I've just thought about `memcmp`'ing the pointer to the address of the string with the whole `__text` section. It may work, I'll keep you updated – jb_ Oct 01 '12 at 22:51
  • @CrazyCasta: ARM actually uses constant-sized instructions in ARM mode (32-bit instructions). ARM has both relative and absolute addressing modes. – nneonneo Oct 01 '12 at 22:53
  • Do you mean that all instructions are 32-bit and 32-bit aligned, so I can look at any 32-bit aligned address and know it's either the beginning of an instruction or not part of an instruction at all? (Again, I know nothing about ARM specifics; I'm talking from experience on x86, MIPS and MSP430.) – CrazyCasta Oct 01 '12 at 22:57
  • @nneonneo: that's true if your ARM is using only the 'traditional' non-Thumb ARM architecture instructions. However, if it's using a mix of Thumb and ARM32 or using Thumb-2 instructions (some of which use multiple 16-bit opcode 'slots'), things can be a little more complex. – Michael Burr Oct 01 '12 at 23:06
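
The `memcmp` idea from the first comment can be sketched roughly as follows. This is a hedged illustration, not a tested solution for any particular binary: it assumes a 32-bit little-endian target, that the compiler stored the absolute address of the string in a literal pool inside the section, and that literal pool words are 4-byte aligned. The names `find_pointer_words`, `section`, `base` and `string_addr` are hypothetical.

```python
import struct

def find_pointer_words(section: bytes, base: int, string_addr: int):
    """Return addresses of aligned 32-bit words equal to `string_addr`.

    section     -- raw bytes of the section being searched
    base        -- address at which section[0] is mapped (assumed 4-byte aligned)
    string_addr -- absolute address of the C-string
    """
    needle = struct.pack("<I", string_addr)
    hits = []
    pos = section.find(needle)
    while pos != -1:
        if pos % 4 == 0:              # keep only word-aligned matches
            hits.append(base + pos)
        pos = section.find(needle, pos + 1)
    return hits
```

A hit here is the literal pool entry, not the instruction that loads it, so the loading instruction still has to be located nearby before walking back to the prologue.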

Instead of searching for the string and its xref (LDR R5, =0xnnnnnn), you can build a byte array containing the epilogue of the function and search for that. The more bytes you include in the search pattern, the more unique the match becomes. Going by the (C/ASCII) string isn't a good approach; it is only practical when working manually (by hand in IDA).
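
A byte-signature search of this kind might look like the sketch below. It adds a wildcard mask so that bytes expected to change between builds (immediates, offsets) can be ignored, which is a common refinement and not something the approach strictly requires. The function name `find_signature`, its parameters, and the 4-byte `step` (which assumes ARM-mode alignment) are all hypothetical choices for illustration.

```python
def find_signature(code: bytes, pattern: bytes, mask: bytes, step: int = 4):
    """Return offsets in `code` where `pattern` matches under `mask`.

    A mask byte of 0xFF means the byte must match exactly,
    0x00 means the byte is a wildcard.
    step=4 assumes ARM-mode alignment; use 2 for Thumb code.
    """
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    hits = []
    for off in range(0, len(code) - len(pattern) + 1, step):
        window = code[off:off + len(pattern)]
        if all((w & m) == (p & m) for w, p, m in zip(window, pattern, mask)):
            hits.append(off)
    return hits
```

For example, a pattern could consist of the four bytes of a known instruction followed by four wildcard bytes for an instruction whose operand is expected to vary. If more than one offset comes back, the signature is too short and needs more fixed bytes.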

Otherwise you have to implement a disassembler engine (like the one that decodes instructions in IDA, comparable to those for the x86 platform) and parse each instruction.