
I was reading a codebreakers journal article on self-modifying code and there was this code snippet:

void Demo(int (*_printf) (const char *,...))
{ 
      _printf("Hello, OSIX!\n"); 
      return; 
} 
int main(int argc, char* argv[]) 
{ 
  char buff[1000]; 
  int (*_printf) (const char *,...); 
  int (*_main) (int, char **); 
  void (*_Demo) (int (*) (const char *,...)); 
  _printf=printf; 
  int func_len = (unsigned int) _main - (unsigned int) _Demo; 
  for (int a=0; a<func_len; a++) 
    buff[a] = ((char *) _Demo)[a]; 
  _Demo = (void (*) (int (*) (const char *,...))) &buff[0]; 
  _Demo(_printf); 
  return 0; 
}

This code supposedly executes Demo() on the stack. I understand most of the code, but the part where they assign 'func_len' confuses me. As far as I can tell, they're subtracting one random pointer address from another random pointer address.

Someone care to explain?

jalf
Gogeta70
  • 1
    Could you link to the article? – BlueRaja - Danny Pflughoeft Apr 26 '11 at 06:08
  • 3
    The code as posted is full of mistakes. The idea seems to be to copy the machine code from `Demo` into `buff` and then execute it from there, but that assumes the opcodes are relocatable (a dangerous assumption; it may require a compiler flag for position-independent code). `func_len` was presumably meant to be `_main - _Demo`, as an upper bound on the size of the `Demo` function. Still, it copies from `_Demo` before `_Demo` is assigned the address of `Demo`, so it doesn't have a hope. It also risks alignment issues, as `buff` may not be aligned the way the function requires. – Tony Delroy Apr 26 '11 at 06:13
  • 1
    I don't have a link to the article, it's a PDF file on my computer. I'll upload it to mediafire: http://www.mediafire.com/?8zslfj6fjsgcsxd – Gogeta70 Apr 26 '11 at 06:13
  • 2
    Sorry to be so blunt, but... Whoever wrote that code is a dope. Plain and simple. Don't write code like this. – asveikau Apr 26 '11 at 06:32

2 Answers


The code is relying on knowledge of the layout of functions from the compiler - which may not be reliable with other compilers.

The func_len line, once corrected to include the - that was originally missing, computes the length of the function Demo by subtracting the address in _Demo (which is supposed to hold the start address of Demo()) from the address in _main (which is supposed to hold the start address of main()). The result is presumed to be the length of Demo, and that many bytes are then copied into the buffer buff. The address of buff is then coerced into a function pointer and the function is called through it. However, since neither _Demo nor _main is actually initialized, the code is buggy in the extreme. Also, it is not clear that an unsigned int is big enough to hold a pointer accurately; the cast should probably be to uintptr_t from <stdint.h> or <inttypes.h>.
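As a rough sketch of what the func_len line was presumably aiming for, the fragment below initialises both function pointers before use and subtracts them as uintptr_t values. The names demo_end, span_bytes and demo_length_guess are hypothetical helpers that do not appear in the article, and the result is only a guess at the function's length: the compiler is free to place functions in any order, and converting a function pointer to an integer is implementation-defined.

```c
#include <stdint.h>
#include <stdio.h>   /* size_t; also declares printf for callers of demo() */

/* Same shape as Demo() in the question: it calls whatever printf-like
   function it is handed. */
void demo(int (*print_fn)(const char *, ...))
{
    print_fn("Hello, OSIX!\n");
}

/* Hypothetical marker function. The hope (not a guarantee) is that the
   compiler places it directly after demo() in memory. */
void demo_end(void)
{
}

/* Distance in bytes from start to end, or 0 if end is not above start.
   The guard stops an unexpected function order from producing a huge
   wrapped-around length. */
size_t span_bytes(uintptr_t start, uintptr_t end)
{
    return end > start ? (size_t)(end - start) : 0;
}

/* Both pointers are initialised before use, unlike _main and _Demo in
   the question. The pointer-to-integer casts are implementation-defined. */
size_t demo_length_guess(void)
{
    void (*demo_ptr)(int (*)(const char *, ...)) = demo;
    void (*end_ptr)(void) = demo_end;
    return span_bytes((uintptr_t)demo_ptr, (uintptr_t)end_ptr);
}
```

Printing demo_length_guess() with %zu from main shows what a particular toolchain happens to produce; a result of 0 means demo_end did not land above demo, so the layout assumption failed.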

This works if the bugs are fixed, if the assumptions about the code layout are correct, if the code is position-independent code, and if there are no protections against executing data space. It is unreliable, non-portable and not recommended. But it does illustrate, if it works, that code and data are very similar.
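The "code and data are very similar" point can be illustrated without executing anything from a buffer. The sketch below only reads the first few bytes of a function's machine code as ordinary data; answer and peek_code are hypothetical names, and it assumes that code pages are readable and that a function pointer can be converted to an object pointer via uintptr_t, neither of which standard C guarantees. It deliberately stops short of calling the copied bytes, since that step depends on position independence and on the platform permitting execution of data memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Demo(); any small function will do. */
int answer(void)
{
    return 42;
}

/* Copy the first n bytes of the machine code at fn into out and return n.
   Assumes code pages are readable and that the function pointer survives
   the trip through uintptr_t; both are implementation-specific. The
   caller must keep n no larger than the function's code. */
size_t peek_code(int (*fn)(void), unsigned char *out, size_t n)
{
    const unsigned char *code = (const unsigned char *)(uintptr_t)fn;
    memcpy(out, code, n);
    return n;
}
```

Calling peek_code(answer, buf, 4) with a four-byte buf and printing the bytes in hex gives the start of the function's encoding on the current machine; the values differ between architectures and compiler settings.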

I remember pulling a similar stunt between two processes, copying a function from one program into shared memory, and then having the other program execute that function from shared memory. It was about a quarter of a century ago, but the technique was similar and 'worked' for the machine it was tried on. I've never needed to use the technique since, thank goodness!

Jonathan Leffler
  • Right. I understand how the code works. What I didn't understand is why the original author of the code was subtracting one uninitialized pointer from another uninitialized pointer. Still, the concept of the code got me where I wanted. I now have a working example that I wrote myself. For those who are interested, here it is: http://friendpaste.com/2B2NA1UyI8TDn0wXXCXEGH – Gogeta70 Apr 26 '11 at 06:53

This code uses uninitialized variables _main and _Demo, so it cannot work in general. Even if they meant something different, they probably assumed some specific ordering of functions in memory.

My opinion: don't trust this article.

Yakov Galka