
I am trying to modify the default behavior of Linux system calls. For now I am just trying to hook them and add a simple print statement before they are actually invoked. I know about the standard --wrap option of the GNU linker (see the GCC Linker options) and how it can be used to hook wrappers. This works perfectly for open(), fstat(), fwrite() etc. (where I am actually hooking the libc wrappers).

UPDATE:

The limitation is that NOT all system calls get hooked with this approach. To illustrate, take a simple statically compiled binary. When we add wrappers, they only take effect for the calls that we introduce ourselves in main() (see the strace output below):

> strace ./sample 

execve("./sample", ["./sample"], [/* 72 vars */]) = 0
uname({sys="Linux", node="kumar", ...})   = 0
brk(0)                                  = 0x71f000
brk(0x7201c0)                           = 0x7201c0
arch_prctl(ARCH_SET_FS, 0x71f880)       = 0
readlink("/proc/self/exe", "/home/admin/sample"..., 4096) = 41
brk(0x7411c0)                           = 0x7411c0
brk(0x742000)                           = 0x742000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbcc54d1000
write(1, "Hello from the wrapped readlink "..., 36Hello from the wrapped readlink :з
) = 36
readlink("/usr/bin/gnome-www-browser", "/etc/alternatives/gnome-www-brow"..., 255) = 35
write(1, "/etc/alternatives/gnome-www-brow"..., 36/etc/alternatives/gnome-www-browser
) = 36
exit_group(36)                          = ?
+++ exited with 36 +++

If we look at the binary carefully, the first "un-intercepted" readlink() call (system call 89, i.e. 0x59) comes from the lines below: some linker-related code (namely _dl_get_origin) does a readlink() for its own purposes. These implicit syscalls, although present in the binary's code, never get hooked by our --wrap approach.

  000000000051875c <_dl_get_origin>:
  51875c:       b8 59 00 00 00          mov    $0x59,%eax
  518761:       55                      push   %rbp
  518762:       53                      push   %rbx
  518763:       48 81 ec 00 10 00 00    sub    $0x1000,%rsp
  51876a:       48 89 e6                mov    %rsp,%rsi
  51876d:       0f 05                   syscall 

How can the wrapping idea be extended to system calls like readlink(), including all the implicit ones?

Sandhya Kumar

1 Answer


ld has an option for wrapping; quoting the manual:

--wrap symbol

Use a wrapper function for symbol. Any undefined reference to symbol will be resolved to __wrap_symbol. Any undefined reference to __real_symbol will be resolved to symbol. This can be used to provide a wrapper for a system function. The wrapper function should be called __wrap_symbol. If it wishes to call the system function, it should call __real_symbol.

It works with the libc wrappers of system calls too. Here's an example with readlink:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

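ssize_t __real_readlink(const char *path, char *buf, size_t bufsiz);

/* Called instead of readlink() for every undefined reference to readlink
   in the objects linked with -Wl,--wrap=readlink. */
ssize_t __wrap_readlink(const char *path, char *buf, size_t bufsiz) {
    puts("Hello from the wrapped readlink :з");
    /* Forward to the real libc readlink() and pass its result back. */
    return __real_readlink(path, buf, bufsiz);
}

int main(void) {
    const char testLink[] = "/usr/bin/gnome-www-browser";
    char buf[256];
    memset(buf, 0, sizeof(buf));
    /* readlink() does not NUL-terminate; the spare byte keeps buf a string. */
    readlink(testLink, buf, sizeof(buf)-1);
    puts(buf);
    return 0;
}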

To pass the option to the linker from the compiler, use the -Wl option:

$ gcc test.c -o a -Wl,--wrap=readlink
$ ./a
Hello from the wrapped readlink :з
/etc/alternatives/gnome-www-browser

The idea is that __wrap_func is your function wrapper. The linker resolves references to __real_func to the real function func, and every undefined reference to func in the code being linked is redirected to __wrap_func.

UPD: One may notice that a statically compiled binary makes another readlink call, which isn't intercepted. To understand why, do a little experiment: compile the code to an object file and list its symbols:

$ gcc test.c -c -o a.o -Wl,--wrap=readlink
$ nm a.o
0000000000000037 T main
                 U memset
                 U puts
                 U readlink
                 U __real_readlink
                 U __stack_chk_fail
0000000000000000 T __wrap_readlink

The interesting thing here is that you won't see references to any of the functions that strace shows before main is entered, e.g. uname(), brk(), access(), etc. That is because main isn't the first code that runs in your binary. A bit of research with objdump shows that the first function called is _start.

Now, let's try another example and override the _start function:

$ cat test2.c
#include <stdio.h>
#include <unistd.h>

void _start() {
        puts("Hello");
        _exit(0);
}
$ gcc test2.c -o a -nostartfiles
$ strace ./a
execve("./a", ["./a"], [/* 69 vars */]) = 0
brk(0)                                  = 0x150c000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3ece55d000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=177964, ...}) = 0
mmap(NULL, 177964, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3ece531000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\37\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1840928, ...}) = 0
mmap(NULL, 3949248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f3ecdf78000
mprotect(0x7f3ece133000, 2093056, PROT_NONE) = 0
mmap(0x7f3ece332000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7f3ece332000
mmap(0x7f3ece338000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3ece338000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3ece530000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3ece52e000
arch_prctl(ARCH_SET_FS, 0x7f3ece52e740) = 0
mprotect(0x7f3ece332000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f3ece55f000, 4096, PROT_READ) = 0
munmap(0x7f3ece531000, 177964)          = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 10), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3ece55c000
write(1, "Hello\n", 6Hello
)                  = 6
exit_group(0)                           = ?
+++ exited with 0 +++
$

What was that?! We just overrode the first function in the binary, and we still see the system calls. Why?

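It is because these calls are executed not by your application code but by the dynamic linker, ld.so: this binary is dynamically linked, so the kernel loads ld.so as its interpreter, and ld.so maps libc.so.6 (note the open/mmap calls above) before your _start is allowed to run.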

UPD: For a statically linked binary the picture is different: there is no ld.so, and the kernel simply maps the binary and jumps to its ELF entry point. The early calls therefore come from glibc's startup code that is linked into the binary itself (_start -> __libc_start_main -> __libc_init_first -> _dl_non_dynamic_init -> _dl_get_origin, according to the trace posted in the comments). As the objdump in the question shows, that code executes the syscall instruction directly instead of calling the readlink function, so there is no reference to readlink for --wrap to redirect.

Either way, you cannot change this behavior that easily; it would take some hacking. If you could easily override a function for code that runs before your binary, that is, code from another binary, it would be a big security hole. Just imagine: you want root rights, so you override a function with one that executes your code, then wait until some daemon running as root happens to execute a script, which triggers your code.

Hi-Angel
  • Thanks for your post. But the solution is NOT correct. You are hooking the readlink() you added, not the default readlink() an executable calls. If you run strace, you see two readlinks, and your printf output comes from the readlink() you introduced. – Sandhya Kumar Jun 30 '15 at 08:08
  • readlink("/proc/self/exe", "/home/admin/sample"..., 4096) = 41. This call is never hooked, and that is my actual question – Sandhya Kumar Jun 30 '15 at 08:12
  • @SandhyaKumar what? How could I then get the `/etc/alternatives/gnome-www-browser` output if I hadn't hooked the system call? – Hi-Angel Jun 30 '15 at 08:22
  • @SandhyaKumar I just did strace, and I saw the `readlink("/usr/bin/gnome-www-browser", "/etc/alternatives/gnome-www-brow"..., 255) = 35`. Is that what you wanted? – Hi-Angel Jun 30 '15 at 08:24
  • If you run strace, you should see TWO readlink() calls. The first one is the "/proc/self/exe" line I mentioned above. The second is the readlink() call you added, and that second call is hooked correctly. – Sandhya Kumar Jun 30 '15 at 08:27
  • @SandhyaKumar I did strace, and even changed the path to `/proc/self`. I still see only a single call to `readlink`. Perhaps in your code the function is just called multiple times? – Hi-Angel Jun 30 '15 at 08:31
  • Are you compiling statically? I think you should see two calls when using gcc -static – Sandhya Kumar Jun 30 '15 at 08:37
  • Thanks! If I understand it right, you are pointing out that the first readlink in the strace output comes from some other program (perhaps the linker/loader) and is not invoked by my actual executable. If possible, is there a way to wrap generically? I.e. I don't actually care where it is called from; I want to hook ALL system calls during an execution. – Sandhya Kumar Jun 30 '15 at 09:12
  • @SandhyaKumar I don't know a way, but I can tell you for sure that it wouldn't be easy. It would involve some hacking of `ld.so`, which is the dynamic linker that is called at application startup. There are variables like LD_PRELOAD, but they are used by the linker for the dynamic linking of the application being started; the linker itself doesn't use them, i.e. you cannot use this variable to override a function called by the dynamic linker itself. – Hi-Angel Jun 30 '15 at 09:22
  • But when I am compiling statically, shouldn't I get past all these hurdles? I think ld.so doesn't even come into the picture? – Sandhya Kumar Jun 30 '15 at 09:26
  • @SandhyaKumar hm, you're asking interesting questions ☺ Indeed, `ld.so` wouldn't be called. I never experimented with it, but that's right. That means another program comes into play when the application starts. Articles call it by the abstract word `loader`, and I've never seen it mentioned where the loader is, so I always thought that `ld.so` **is** the loader. But now I see it isn't. Hm, let me do a little research… – Hi-Angel Jun 30 '15 at 09:37
  • I did some objdump work and updated the question. Please see the "un-intercepted" fragment in the binary that actually performs syscall 0x59, i.e. 89, which is readlink – Sandhya Kumar Jun 30 '15 at 09:44
  • @SandhyaKumar well, I'm stuck; I can't find a single article that says which program is invoked for static binaries. The process looks like this: the shell calls `execve`, then the kernel loads the binary into memory and calls an interpreter. For dynamic apps it is `ld.so`. What it is for static binaries seems to be some kind of secret: nobody knows, nobody mentions it, and nobody has researched it. So weird… – Hi-Angel Jun 30 '15 at 10:23
  • With my limited knowledge of the kernel, I think this is what happens (I also checked with a trace from a hypervisor). The kernel just loads the binary into memory and transfers control to the ELF entry point of the static binary. No interpreter comes into the picture. The binary begins execution [ _start -> __libc_start_main -> __libc_init_first -> _dl_non_dynamic_init -> _dl_get_origin]. The only important missing piece is why the syscalls (here in _dl_get_origin) are not hooked by the new wrappers when GCC builds the binary. I'll wait for a GCC expert to comment. – Sandhya Kumar Jun 30 '15 at 11:19
  • This is a quite similar topic; posting here as you might be familiar with it. How do you wrap cases like sprintf()? Note that it takes a variable number of arguments. – Sandhya Kumar Jul 03 '15 at 11:47
  • @SandhyaKumar it's impossible due to a [restriction of the language](http://c-faq.com/varargs/handoff.html). – Hi-Angel Jul 03 '15 at 12:35