I am attempting to write a binary instrumentation library (for ARM) that requires me to move/relocate ARM assembly instructions. Note that the ordering of the instructions stays the same; I am just moving them to a different region of memory so that I can insert my proxy/hook instructions. As part of my proxy/hook, I execute the original instructions (in the same order) and then jump back to the original function. Here's an example of what it might look like:

pseudo-code

function:
    <jump to proxy>
    <rest of original function>

proxy:
    <some additional proxy instructions for logging, etc.>
    <instructions from original function that were moved>
    <jump back to original function plus offset>
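
To make the layout above concrete, here is a minimal sketch of how the proxy bytes might be assembled. It assumes A32 (not Thumb) code, little-endian memory, and that the hook jump overwrites exactly the instructions passed in as `moved_insns`. The names `make_abs_jump` and `build_proxy` are hypothetical helpers, and the moved instructions are copied verbatim here, which is only safe if none of them depend on their own address.

```python
import struct

# A32 encoding of `ldr pc, [pc, #-4]`. In A32 state PC reads as the
# instruction address + 8, so this loads PC from the word stored
# immediately after the instruction.
LDR_PC_PC_MINUS_4 = 0xE51FF004

def make_abs_jump(target: int) -> bytes:
    """Return 8 bytes: `ldr pc, [pc, #-4]` followed by the absolute target."""
    if target & 3:
        raise ValueError("A32 target must be 4-byte aligned")
    return struct.pack("<II", LDR_PC_PC_MINUS_4, target & 0xFFFFFFFF)

def build_proxy(logging_code: bytes, moved_insns: bytes, func_addr: int) -> bytes:
    """Lay out the proxy: extra code, displaced instructions, jump back."""
    # Resume right after the instructions that the hook jump overwrote.
    resume = func_addr + len(moved_insns)
    return logging_code + moved_insns + make_abs_jump(resume)
```

An absolute jump like this costs 8 bytes, so in this sketch two A32 instructions from the start of the function would be displaced into the proxy.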

I am noticing that if I blindly move the instructions (without regard for the type of instruction), I end up crashing the application.

I've determined that the problematic "class" of instructions is the one that references the program counter (PC register). This makes sense to me, since the instructions now live at a different address (they were moved to a new region of memory) and any PC-relative offset is no longer correct.
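
As an illustration of why a PC-relative offset goes stale, here is a small sketch that recomputes the offset of an A32 `B`/`BL` instruction after it is moved. It assumes A32 state, where PC reads as the instruction address plus 8 and the 24-bit immediate is a signed word offset. The helper names are hypothetical, and other PC-relative forms (for example literal loads) would need their own handling.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def is_a32_branch(insn: int) -> bool:
    """True for B/BL (immediate); excludes the cond == 0b1111 encoding space."""
    return (insn >> 25) & 0b111 == 0b101 and (insn >> 28) != 0b1111

def relocate_a32_branch(insn: int, old_addr: int, new_addr: int) -> int:
    """Re-encode a B/BL so it reaches the same target from new_addr."""
    if not is_a32_branch(insn):
        raise ValueError("not an A32 B/BL instruction")
    # Original target: PC (old_addr + 8) plus the signed word offset.
    offset = sign_extend(insn & 0x00FFFFFF, 24) << 2
    target = old_addr + 8 + offset
    # Offset needed to reach the same target from the new location.
    new_offset = target - (new_addr + 8)
    if new_offset & 3 or not -(1 << 25) <= new_offset < (1 << 25):
        raise ValueError("target not reachable with a direct branch")
    # Keep condition and opcode bits, replace only the immediate.
    return (insn & 0xFF000000) | ((new_offset >> 2) & 0x00FFFFFF)
```

If the new location is too far from the target for the 24-bit immediate, the branch cannot simply be patched and would have to be replaced with a longer sequence, such as an absolute load into PC.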

However, I am wondering whether there are other classes/types of instructions that might be problematic. I've been trying to find references to help me, but I haven't been able to. Also, I thought that instrumentation libraries were pretty common, so I looked for an open source example but couldn't find one.

Is anyone aware of a similar open source project that is doing this? Or any references?

Any help would be greatly appreciated!

Jon
  • the arm documentation contains everything you need. What part of it did you not understand? – old_timer Aug 28 '17 at 00:25
  • if you didnt build this to be position independent, then you are looking at a lot of work, some percentage of it manual. – old_timer Aug 28 '17 at 00:40
  • Thanks for the input. I read through the ARM reference manual (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html) and it does a great job explaining each of the instructions. I guess I am just wondering if I determined the class/type of instructions that I'll need to be wary of (i.e. the ones that use pc-relative offset). The more I think about my problem, it seems like this would be similar to JIT (just in time) compilers where they need to reorder instructions. – Jon Aug 28 '17 at 01:54
  • Depends what code you are instrumenting. In hand written assembly it would be possible to make almost any instruction sensitive to relocation. If you are instrumenting some naive compiler output, then maybe it will never construct anything as fragile, and just by guarding offsets and relocating everything properly may work. But actually you should rather make your hooks work in-place (injecting original code inside the called proxy, so it acts as part of original code) so you don't need to relocate at all. You will still break some of machine code, which will have sensitive ins. at the hook. – Ped7g Aug 28 '17 at 02:13
  • Ah, now I see you actually do it in-place. It may still break, for example, if you interrupt a thumb conditional block. Then inside your block the stack is probably different (as you need the return address somewhere, unless you create a proxy per-hook with a somehow hardcoded return; I don't know ARM instructions from head to be sure what the options are for return), so `pc` + `sp` are clearly different. Of course it will also break any decent anti-tamper code. It's much more reasonable to instrument the code in cooperation with the compiler, that's the normal way to do these things. – Ped7g Aug 28 '17 at 02:18
  • *"I thought that instrumentation libraries are pretty common"* - from a theoretical point of view, if you want to produce anything **fully correct** (in a mathematical way), it will be undecidable in general, as this is basically an extension of the "halting problem". Which will make such a tool impractical. I can imagine some tool based on heuristics, instrumenting some known compiler output, would mostly-work, with only rare breakages, so it may be even "good enough" to ship, but it looks like not that many people need a way to break some code by uncertain chance, the situation is already bad with ordinary bugs. – Ped7g Aug 28 '17 at 02:36
  • pc relative are the relocatable ones, those you dont mess with, what they point to though can vary. sometimes they are just loading a constant that wont fit as an immediate. Sometimes they are loading the address of something...an absolute address, that may need to change. is that an address to data, an address to a function? does it matter? – old_timer Aug 28 '17 at 02:55
  • can you programmatically determine if the item being read using a pc-relative address is simply data being loaded into a register, a big immediate basically or if it is something you need to relocate? (answer: no) – old_timer Aug 28 '17 at 02:56

1 Answer

Is anyone aware of a similar open source project that is doing this? Or any references?

You can look at DynamoRIO. It does much more than this: among other things, it can decode instructions, pass them to you for insertion/deletion/reordering (with some restrictions), and re-encode them into the code cache (from which the code is actually executed). It is open source and supports ARM.