Finding start of main function with ptrace

Question

I have a file scope kernel extension that informs a daemon when an application is launched. The daemon is required to pause the launched application at the beginning of its first instruction in main().

When calling ptrace with PT_ATTACH, the daemon appears to attach too early and is in the dynamic linker (dyld).

Here is an example of the call stack of thread 0 at the point of attaching:

Thread 0:
0   dyld                            0x00007fff6e4cd35e mach_reply_port + 10
1   dyld                            0x00007fff6e4cd4d4 _mig_init + 13
2   dyld                            0x00007fff6e4cd17f mach_init + 46
3   dyld                            0x00007fff6e4aa239 dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*) + 411
4   dyld                            0x00007fff6e4aa05e _dyld_start + 54

Is there a way either to ensure the daemon attaches at the beginning of main, after library loading has finished, or to single-step repeatedly to that point? In the latter case, how would I find the address of main, given that a launched application may have no symbols available?

Thanks.

Ismael Luceno · Answer 1 · 2016-09-21T01:36:36.497

Just a guess, but since libc has to call exit after main returns, you can probably identify the call to main by walking _start until you find a call into the .text section followed by a call into the .plt section. You also know that main's return value is the argument to exit, so you're looking for a mov %eax, %edi between the two calls. main probably comes after _start, and .plt comes before .text, and neither is far away, so you can make this pretty generic.

Given the runtime/libc/compiler is sane (Mac OS X is based on FreeBSD, right?), you can probably get away with matching just a few bytes, with something like this (you should probably add some safety at least):

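#define _GNU_SOURCE     /* memmem() on glibc; macOS declares it in <string.h> */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of main from the start of .text, or -1 if not found. */
ptrdiff_t find_main_vmoffset(const unsigned char *text_start, size_t text_size)
{
    const unsigned char *p = text_start;
    const unsigned char *end = text_start + text_size;

    const unsigned char sig[] = {
        /* 0xe8, ??, ??, */ 0, 0,           /* callq  main (upper rel32 bytes) */
        0x89, 0xc7,                         /* mov    %eax, %edi */
        0xe8, /* ??, ??, 0xff, 0xff, */     /* callq  exit */
    };

    while ((p = memmem(p, (size_t)(end - p), sig, sizeof(sig))) != NULL) {
        int32_t rel;

        /* Need 3 bytes before the match and 9 bytes from it onwards */
        if (p - text_start >= 3 && end - p >= 9
            /* Check it's one call followed by another */
            && p[-3] == 0xe8
            /* Check the second call is backwards (into .plt) */
            && p[7] == 0xff && p[8] == 0xff) {
            /* Eureka! rel32 is relative to the end of the first call (p + 2) */
            memcpy(&rel, p - 2, sizeof(rel));
            return (p - text_start) + 2 + rel;
        }
        p++;    /* false positive: resume the search one byte further on */
    }
    return -1;
}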

You just need to add the virtual address of .text to that, and you're basically done :).
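
To make that last step concrete, here is a minimal sketch of the arithmetic for a single call instruction. The helper name call_target_vmaddr and its parameters are made up for illustration, and it assumes a little-endian x86 host and a 5-byte `call rel32` (opcode 0xe8):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: given the .text bytes, the offset of a 5-byte
 * `call rel32` (opcode 0xe8) inside them, and the virtual address at which
 * .text is mapped, return the virtual address the call transfers to.
 * Returns 0 if the offset does not hold a complete call instruction.
 */
uint64_t call_target_vmaddr(const unsigned char *text, size_t text_size,
                            size_t call_off, uint64_t text_vmaddr)
{
    int32_t rel;

    if (call_off > text_size || text_size - call_off < 5 ||
        text[call_off] != 0xe8)
        return 0;

    /* rel32 may be unaligned, so copy it out instead of casting. */
    memcpy(&rel, text + call_off + 1, sizeof(rel));

    /* The displacement is relative to the instruction after the call. */
    return text_vmaddr + call_off + 5 + (int64_t)rel;
}
```

If ASLR is in play, text_vmaddr has to be the address at which .text is actually mapped in the running process, not the one recorded in the file.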

I noticed some libc implementations use wrappers (e.g. glibc), so in that case it's a little more complicated, but it should be identifiable in a similar fashion nonetheless. — Ismael Luceno, Sep 21 '16 at 16:06
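
For the wrapper case, here is a rough sketch of what "a similar fashion" could look like. It assumes the pattern commonly seen in disassembly of position-independent x86-64 glibc binaries, where _start loads the address of main into %rdi with a RIP-relative lea and then calls __libc_start_main. Treat that pattern as an assumption to check against your own targets; the function name find_main_via_lea is made up, and a little-endian host is assumed:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: scan the bytes of _start for
 *     48 8d 3d <rel32>    lea  rel32(%rip), %rdi    ; address of main
 * immediately followed by a call (e8 = direct, ff 15 = indirect via GOT).
 * Returns the offset of main relative to `code`, or -1 if nothing matches.
 */
int64_t find_main_via_lea(const unsigned char *code, size_t size)
{
    for (size_t i = 0; i + 9 <= size; i++) {
        int32_t rel;

        if (code[i] != 0x48 || code[i + 1] != 0x8d || code[i + 2] != 0x3d)
            continue;

        /* The 7-byte lea must be followed directly by a call. */
        int direct = code[i + 7] == 0xe8;
        int indirect = code[i + 7] == 0xff && code[i + 8] == 0x15;
        if (!direct && !indirect)
            continue;

        memcpy(&rel, code + i + 3, sizeof(rel));
        /* RIP-relative operands are relative to the end of the lea. */
        return (int64_t)i + 7 + rel;
    }
    return -1;
}
```

Non-PIE binaries tend to use an absolute `mov $main, %rdi` instead, so a real scanner would need more than one pattern.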

score -1 · Answer 2 · answered May 13 '13 at 14:56

main is not the actual entry point; the entry point is the _start symbol, whose startup code eventually calls main. Depending on your executable type (elf-32 or something), it will start at different addresses (this might be randomized). You can use GDB to find the entry point's address (or the first instruction executed) by stepping through your program once (using gdb, then si), and then printing the value of the PC register (using info registers or similar).

You can also switch to the assembly layout in gdb to see the whole stack trace and where it starts. For a non-PIE 32-bit x86 ELF executable, code usually begins at something like 0x0804xxxx.

KrisSodroski

Thanks, but I think you're missing my point. It's the daemon that I'm writing which needs to halt any application that is launched. I can't use GDB, and main is a possible entry point, but I just need to ensure that loading of all libraries has finished and no other instructions are executed. I'm fully aware that the entry point will differ in its address; besides ASLR, each application will have a different memory footprint. – TheDarkKnight May 13 '13 at 14:58
If you're ptracing, just have ptrace stop at every system call. Either that, or have your traced program call a ptrace_signal to stop execution there and let your daemon know it's there. That will be after the main entry point though, and optimizations may cause other things to happen between main() and the signal. – KrisSodroski May 13 '13 at 15:02
How do I have the traced program call a ptrace_signal to stop execution? The traced program is any possible application on the computer that a user may run, not one that I've written? Sorry, am I misunderstanding you? – TheDarkKnight May 13 '13 at 15:08
If it's not a program that you have written, you can inject code into the program image. Other than that, your best bet would be to single step through the application and make your best guess as to where main is. This can be assisted using GDB, which will show you what address the PC should be at when executing that program image (as I said earlier). Then, using ptrace, you can keep stepping and checking the PC until it reaches the address you found with GDB. Why do you need to know this? – KrisSodroski May 13 '13 at 15:11
Ok, so now I have a chicken and egg situation; I'm actually wanting to inject code into all launched applications, but need to guarantee that the injection is done before the first instruction in main (or equivalent) is called. If I inject during dynamic loading, the target app crashes. If I inject too late, the function I'm overriding with the injection code could have already been called. – TheDarkKnight May 13 '13 at 15:17
Yeah, so you're going to have to find out what the first instruction is that's called in main. My bet would be that it is going to be a call instruction to an absolute address (the call into _start after loading the libraries, but _start may load the libraries). Use GDB to help you by stepping through the program until all the libraries are loaded. Once this is done, just inspect the value of the PC (which will be the value of the next instruction to be executed, which will be inside main). Then use Ptrace to stop once the PC holds this value, then you can inject your code. – KrisSodroski May 13 '13 at 15:19
It might also be a jmp instruction into your code. You'll be able to tell with GDB if you step through it. – KrisSodroski May 13 '13 at 15:21
I think I understand exactly what you're saying, but realistically, I can't do this for every possible application that could be on a user's machine, or are you implying that there should be some pattern to the address once I've examined one launched app with gdb? – TheDarkKnight May 13 '13 at 15:22
I actually think that the executable formats sometimes have hard coded start addresses. Like I said though, this address relates to _start, which is the assembly directive of the entry point (which is not equal to main always). You might have to reverse engineer every program like this, unless you can find a common element in all the programs (like the first jmp instruction in the code comes after loading libraries or something). – KrisSodroski May 13 '13 at 15:25
You can also try objdump -D in order to see the symbols and instructions of the program. The main might be sitting right before your eyes then. – KrisSodroski May 13 '13 at 15:28
Not on OSX; objdump doesn't exist. There is otool though, which may do a similar thing. Thanks again. – TheDarkKnight May 13 '13 at 15:32
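
The single-stepping idea from the comments above can be sketched as a small loop. This is only an outline: struct tracer_ops and its two callbacks are hypothetical hooks, not a real API. On macOS the step hook would presumably wrap ptrace(PT_STEP, ...) plus a wait, and the PC would have to come from the Mach thread state, neither of which is shown here. The fake tracee exists only so the loop can be exercised without a real process:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform hooks; neither is a real API. */
struct tracer_ops {
    int (*single_step)(void *ctx);            /* run one instruction; 0 on success */
    int (*read_pc)(void *ctx, uint64_t *pc);  /* fetch program counter; 0 on success */
};

/*
 * Step the tracee until its PC equals target_pc.
 * Returns the number of steps taken, or -1 on error or if max_steps is
 * exhausted first.
 */
long step_until_pc(const struct tracer_ops *ops, void *ctx,
                   uint64_t target_pc, long max_steps)
{
    for (long steps = 0; ; steps++) {
        uint64_t pc;

        if (ops->read_pc(ctx, &pc) != 0)
            return -1;
        if (pc == target_pc)
            return steps;
        if (steps >= max_steps || ops->single_step(ctx) != 0)
            return -1;
    }
}

/* A fake tracee for demonstration: its PC advances by one per step. */
struct fake_tracee { uint64_t pc; };

static int fake_step(void *ctx)
{
    ((struct fake_tracee *)ctx)->pc += 1;
    return 0;
}

static int fake_read_pc(void *ctx, uint64_t *pc)
{
    *pc = ((struct fake_tracee *)ctx)->pc;
    return 0;
}
```

The max_steps cap matters in practice: single-stepping through the whole dynamic loader is slow, and a wrong target address would otherwise never terminate.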
