
My question might seem a bit odd, but here goes: I'm trying to generate code dynamically from C++ (mostly for the fun of it, but also for a practical reason), and it is not as hard as it might sound. To do this, you write machine code into memory at run time, like this:

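byte * buffer = new byte[5];
*buffer = 0xE9; // Opcode for 'jmp rel32'
// 'destination' stands for the address to jump to; the operand is stored
// relative to the end of this 5-byte instruction, not as an absolute address
*(uint*)(buffer + 1) = (uint)destination - ((uint)buffer + 5);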

This is much easier than it might seem because I target only one platform and compiler: GCC on 32-bit Linux, with a single calling convention (cdecl). I want to generate a function at run time that redirects calls from triggers, so that I can use class methods as callbacks, even with C API libraries (as long as they use cdecl, of course). It only needs to support pointers and native types (char, int, short, etc.).

ANYTHING MyRedirect(ANY AMOUNT ARGUMENTS)
{
    return MyClassFunc('this', ANY AMOUNT ARGUMENTS);
}

The function above is the one I want to create as raw machine code (in memory, from C++). Since the function is very simple, its ASM is simple as well (depending on the arguments).

55                      push   %ebp
89 e5                   mov    %esp,%ebp
83 ec 04                sub    $0x4,%esp
8b 45 08                mov    0x8(%ebp),%eax
89 04 24                mov    %eax,(%esp)
e8 00 00 00 00          call   <address>
c9                      leave
c3                      ret  

So in my program I have created an ASM pattern generator (since I don't know ASM especially well, I look for patterns). It generates machine code (as bytes, for the exact case above, i.e. a function that redirects and returns) given the number of arguments the function takes. This is a snippet from my C++ code:

std::vector<byte> detourFunc;
detourFunc.reserve(10 + stackSize); // Base is 10 bytes + argument size (reserve only, so push_back starts at index 0)

// This becomes 'push %ebp; mov %esp, %ebp'
detourFunc.push_back(0x55);     // push %ebp
detourFunc.push_back(0x89);     // mov
detourFunc.push_back(0xE5);     // %esp, %ebp

// Check for arguments
if(stackSize != 0)
{
    detourFunc.push_back(0x83);     // sub
    detourFunc.push_back(0xEC);     // %esp
    detourFunc.push_back(stackSize);    // stack size required

    // If there are arguments, we want to push them
    // in the opposite direction (cdecl convention)
    for(int i = (argumentCount - 1); i >= 0; i--)
    {
        // This is what I'm trying to implement
        // ...
    }

    // Check if we need to add 'this'
    if(m_callbackClassPtr)
    {

    }
}

// This is our call operator
detourFunc.push_back(0xE8);     // call

// Placeholder nops; replaced later by the 32-bit displacement to the target (relative to the next instruction)
detourFunc.push_back(0x90);     // nop
detourFunc.push_back(0x90);     // nop
detourFunc.push_back(0x90);     // nop
detourFunc.push_back(0x90);     // nop

if(stackSize == 0)
{
    // In case of no arguments, just 'pop'
    detourFunc.push_back(0x5D); // pop %ebp
}

else 
{
    // Use 'leave' if we have arguments
    detourFunc.push_back(0xC9); // leave    
}

// Return function
detourFunc.push_back(0xC3);     // ret
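
One detail about the four placeholder bytes: 'call' with opcode 0xE8 does not take an absolute address. It takes a 32-bit displacement relative to the address of the instruction that follows the call. A minimal sketch of how the placeholder could be patched once the final location of the code is known; the names patchCallRel32, codeBase, callOffset and target are made up for illustration and are not part of my program above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Overwrites the 4 bytes after an 0xE8 opcode with a rel32 displacement.
//   codeBase   - 32-bit address where code[0] will live at run time (assumed known)
//   callOffset - index of the 0xE8 byte inside 'code'
//   target     - 32-bit address of the function to call
void patchCallRel32(std::vector<std::uint8_t>& code, std::size_t callOffset,
                    std::uint32_t codeBase, std::uint32_t target)
{
    assert(callOffset + 5 <= code.size());
    assert(code[callOffset] == 0xE8);

    // The CPU adds the displacement to the address of the NEXT instruction,
    // which starts 5 bytes after the 0xE8 opcode.
    const std::uint32_t nextInstr =
        codeBase + static_cast<std::uint32_t>(callOffset) + 5;
    const std::uint32_t rel = target - nextInstr; // wraps modulo 2^32 for backward calls

    // x86 is little-endian: least significant byte first.
    for (int i = 0; i < 4; ++i)
        code[callOffset + 1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
}
```

The same arithmetic applies to 'jmp' with opcode 0xE9, since both use a rel32 operand.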

If I specify zero as the stackSize this will be the output:

55                      push   %ebp
89 e5                   mov    %esp,%ebp
e8 90 90 90 90          call   <address>
5d                      pop    %ebp
c3                      ret   

As you can see, this is completely valid 32-bit ASM and would act as 'MyRedirect' if it had zero arguments and no need for a 'this' pointer. What I still want to implement is the part that generates the ASM code depending on the number of arguments the 'redirect' function receives. I have worked out the pattern in this little C++ program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[])
{
    if(argc < 2) return 1; // The argument count must be given on the command line
    int val = atoi(argv[1]);

    printf("\tpush %%ebp\n");
    printf("\tmov %%esp,%%ebp\n");

    if(val == 0)
    {
        printf("\tcall <address>\n");
        printf("\tpop %%ebp\n");
    }

    else
    {
        printf("\tsub $0x%x,%%esp\n", (unsigned)(val * sizeof(int)));

        for(int i = val; i > 0; i--)
        {
            printf("\tmov 0x%x(%%ebp),%%eax\n", (unsigned)(i * sizeof(int) + sizeof(int)));
            printf("\tmov %%eax,0x%x(%%esp)\n", (unsigned)(i * sizeof(int) - sizeof(int)));
        }

        printf("\tcall <address>\n");
        printf("\tleave\n");
    }

    printf("\tret\n");
    return 0;
}
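
The same pattern can be emitted directly as bytes instead of text. This is only a sketch under my assumptions: every argument occupies one 4-byte slot, no 'this' pointer is inserted, and the call operand is left as zeros (as in the objdump listing) to be patched later. The name buildRedirectStub is made up. Because the offsets are encoded as signed 8-bit values, this encoding only covers up to 30 arguments.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Emits the redirect stub for 'argCount' 4-byte cdecl arguments.
std::vector<std::uint8_t> buildRedirectStub(unsigned argCount)
{
    // disp8/imm8 are signed bytes: the largest offset used is 4*argCount + 4 <= 127.
    assert(argCount <= 30);

    std::vector<std::uint8_t> code;
    code.insert(code.end(), {0x55, 0x89, 0xE5});               // push %ebp; mov %esp,%ebp

    if (argCount != 0) {
        const std::uint8_t frame = static_cast<std::uint8_t>(argCount * 4);
        code.insert(code.end(), {0x83, 0xEC, frame});          // sub $frame,%esp

        for (unsigned i = argCount; i > 0; --i) {
            const std::uint8_t src = static_cast<std::uint8_t>(i * 4 + 4); // caller's slot
            const std::uint8_t dst = static_cast<std::uint8_t>(i * 4 - 4); // outgoing slot
            code.insert(code.end(), {0x8B, 0x45, src});        // mov src(%ebp),%eax
            if (dst == 0)
                code.insert(code.end(), {0x89, 0x04, 0x24});   // mov %eax,(%esp)
            else
                code.insert(code.end(), {0x89, 0x44, 0x24, dst}); // mov %eax,dst(%esp)
        }
    }

    code.insert(code.end(), {0xE8, 0x00, 0x00, 0x00, 0x00});   // call rel32 (placeholder)
    code.push_back(static_cast<std::uint8_t>(argCount != 0 ? 0xC9 : 0x5D)); // leave / pop %ebp
    code.push_back(0xC3);                                      // ret
    return code;
}
```

For one argument this yields the 19 bytes of the first listing; for zero arguments it yields the short push/call/pop/ret form.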

This program prints exactly the same pattern as the ASM code shown by 'objdump'. So my question is: will this be valid in all cases if I only want a redirect function like the one above, whatever the arguments, as long as it only runs on 32-bit Linux? Or are there pitfalls I need to know about? For example, would the generated ASM be different with 'short' or 'char' arguments, or will this still work (I've only tested with integers)? And what if I call a function that returns 'void' (how would that affect the ASM)?

I might have explained everything a bit fuzzily, so please ask if anything is unclear :)

NOTE: I do not want to know about alternatives. I enjoy my current implementation and think it's a very interesting one; I would just highly appreciate your help on the subject.

EDIT: In case of interest, here are some dumps for the above C++ code: link

Elliott Darfink
  • A very good alternative to encoding instructions by hand is the [asmjit](http://code.google.com/p/asmjit/) library (contrary to the name, this is not a JIT compiler, just something that JIT compilers can use). – Tamás Szelei May 05 '12 at 21:36
  • @afishwhoswimsaround That library looked awesome. Literally. That seems to be _exactly_ what I need. Quick question; can I create a function dynamically with this library (with malloc or something) so I can 'redirect' (with assembly 'jmp') other functions ('hook' them) to the generated functions? – Elliott Darfink May 05 '12 at 21:43
  • Yes, look at the examples http://code.google.com/p/asmjit/wiki/Examples – Christopher Oezbek May 05 '12 at 21:48
  • Your problem is going to be that byte * buffer = new byte[5]; comes from a page of memory that is not marked as a code segment. You cannot run code from the data segment, or from pages off the heap that are not marked as code. There were literally hundreds of programs that used to manipulate code like this but stopped working on Windows shortly after Windows 95. The OS does not give you permission to manipulate the page attributes. This was to stop malicious code from becoming rampant. Personally I think it sucks, but good luck on your virus. – Dan Jun 03 '12 at 20:50

1 Answer

As Dan suggests, you need to mark the memory as executable. I wrote some code you can use. (It works on GNU/Linux and Windows.) If you never intend to support ARM, x86-64, or other platforms, then I don't see any pitfalls in your code (with the executable-memory part added), and it seems that it should always work (assuming everything else is working properly, of course).

#include <sys/mman.h>

...

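size_t n = <size of code buffer>;
void *p = mmap(NULL, n, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANON|MAP_PRIVATE, -1, 0); /* fd is -1 for anonymous mappings */
if (p == MAP_FAILED) { /* handle the error; mmap does not return NULL on failure */ }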
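
A slightly fuller sketch of the same idea, in case it helps. Note that it swaps in a two-step variant: the pages are mapped writable first, the bytes are copied in, and only then are the pages switched to read+execute with mprotect, rather than being mapped writable and executable at once as above. The function name loadExecutable is made up for illustration. Any call displacement has to be patched before the mprotect call, since the final address is only known after mmap returns.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies 'code' into a fresh anonymous mapping and makes it read+execute.
// Returns nullptr on failure. Release with munmap(ptr, code.size()).
void* loadExecutable(const std::vector<std::uint8_t>& code)
{
    if (code.empty())
        return nullptr;

    // Step 1: private anonymous pages, readable and writable (not yet executable).
    void* p = mmap(nullptr, code.size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return nullptr;

    // Step 2: copy the generated bytes into place.
    std::memcpy(p, code.data(), code.size());

    // Step 3: drop write permission and add execute permission.
    if (mprotect(p, code.size(), PROT_READ | PROT_EXEC) != 0) {
        munmap(p, code.size());
        return nullptr;
    }
    return p;
}
```

The returned pointer can then be cast to a function pointer of the matching cdecl signature.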

'fish' suggested you use asmjit. I have to agree with that; it's more portable than your method. However, you said you are not interested in alternatives.

You may be interested in something called "thunking" (kind of). It basically tries to accomplish what you describe: replacing a C callback with a C++ method. This is actually pretty useful, but is not really a good design for your applications.
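
To make the idea concrete, here is a hand-written, compile-time analogue of what a generated redirect does: a plain function that supplies the object pointer and forwards the call. Everything in it (Counter, g_target, counterThunk, invokeCallback) is hypothetical and only stands in for your class and the C API. The global pointer plays the role of the 'this' value that a generated stub would carry itself, which is also why this static form only serves one object per thunk function.

```cpp
#include <cassert>

// Hypothetical class whose method should act as a C callback.
class Counter {
public:
    int add(int amount) { total_ += amount; return total_; }
private:
    int total_ = 0;
};

// Stands in for the 'this' constant a runtime-generated stub would embed.
static Counter* g_target = nullptr;

// Plain C-compatible function with the callback's signature:
// it supplies 'this' and forwards the argument, like MyRedirect.
extern "C" int counterThunk(int amount)
{
    assert(g_target != nullptr);
    return g_target->add(amount);
}

// Hypothetical C API that accepts a callback without a user-data parameter.
extern "C" int invokeCallback(int (*callback)(int), int value)
{
    return callback(value);
}
```

Setting g_target to an object and passing counterThunk to invokeCallback routes the C callback into Counter::add on that object.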

Hope that helps.

NotKyon