5

Is self-modifying code possible in a portable manner in C?

The reason I ask is that, in a way, OOP relies on self-modifying code (because the code that executes at run-time is actually generated as data, e.g. in a v-table), and yet, it seems that, if this is taken too far, it would prevent most optimizations in a compiler.

For example:

void add(char *restrict p, char *restrict pAddend, int len)
{
    for (int i = 0; i < len; i++)
        p[i] += *pAddend;
}

An optimizing compiler could hoist the *pAddend out of the loop, because it wouldn't interfere with p. However, this is no longer a valid optimization in self-modifying code.
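To make the optimization concrete, here is a minimal sketch of the transformation being described. The names `add_hoisted` and `versions_agree` are hypothetical and exist only for this comparison; a real compiler performs the equivalent rewrite internally and may emit something quite different.

```c
#include <assert.h>
#include <string.h>

/* The loop as written: *pAddend is read on every iteration. */
void add(char *restrict p, char *restrict pAddend, int len)
{
    for (int i = 0; i < len; i++)
        p[i] += *pAddend;
}

/* Roughly what the optimizer may produce: a single load hoisted out of the
   loop. The early return keeps the load from happening when the original
   loop body would never have run. */
void add_hoisted(char *restrict p, char *restrict pAddend, int len)
{
    if (len <= 0)
        return;
    const char addend = *pAddend;
    for (int i = 0; i < len; i++)
        p[i] += addend;
}

/* Runs both versions on identical, non-overlapping inputs and compares them. */
int versions_agree(void)
{
    char a[4] = {1, 2, 3, 4};
    char b[4] = {1, 2, 3, 4};
    char addend = 10;

    add(a, &addend, 4);
    add_hoisted(b, &addend, 4);
    assert(addend == 10);  /* neither version writes through pAddend */
    return memcmp(a, b, sizeof a) == 0;
}
```

Because `restrict` promises that `p` and `pAddend` do not alias, the two functions are indistinguishable to any conforming caller.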

In this way, it seems that C doesn't allow for self-modifying code, but at the same time, wouldn't that imply that you can't do some things like OOP in C? Does C really support self-modifying code?

user541686
  • 2
    C has function pointers, which is all you need to build a run-time dispatch mechanism. You do not need "self-modifying code". – Nemo Jun 18 '11 at 21:31
  • I would not say that the OOP implementation in (say) C++ uses self-modifying code, in my opinion, it's just data-driven code. Virtual functions are nothing conceptually really different from a simple `switch`. – Vlad Jun 18 '11 at 21:33
  • @Nemo: Isn't that still a form of self-modifying code? – user541686 Jun 18 '11 at 21:36
  • You may be confused about what a vtable is. It's a compile-time generated array of function pointers. Whether it's actually populated at compile time or link time is up to the implementation; it's bound to be link time if your executables are rebaseable. It's no more "self-modifying code" than this is: `int (*funcarray[2])(int) = { toupper, tolower };`. Or, if you want to say that calling through a function pointer is "self-modifying code", then it's simply false that "self-modifying code" necessarily prevents that optimization from being valid. – Steve Jessop Jun 18 '11 at 21:37
  • 1
    @Mehrdad: "Self-modifying code" normally refers to changing the actual machine instructions, but this is really a question of definitions. For example, the optimization example you give _is_ valid in the presence of function pointers. So what do you mean by "self-modifying code", exactly? – Nemo Jun 18 '11 at 21:40
  • @Nemo: But the trouble with my example is, it would produce a different result if it was modifying its own instructions -- so the dereferencing of `pAddend` would actually need to take place on every iteration. – user541686 Jun 18 '11 at 21:46
  • @Steve: I'm slightly confused about what you're trying to say, sorry... would you mind elaborating? – user541686 Jun 18 '11 at 21:47
  • Which part don't you understand? That a vtable is an array of function pointers, and need not contain any binary code? That this table is populated prior to runtime? That function pointers don't make your optimization impossible? It adds up to: standard C *does* provide means to implement vtables; it *doesn't* provide a means to implement what is usually described as self-modifying code; these two facts are not inconsistent. – Steve Jessop Jun 18 '11 at 21:52
  • 1
    @Mehrdad: You are saying contradictory things. "My example breaks on self-modifying code" + "function pointers are a form of self-modifying code" = you do not actually know what you mean by self-modifying code. – Nemo Jun 18 '11 at 21:52
  • @Nemo: Confused. I didn't say my example breaks on *all* self-modifying code. (Heck, if the modification was just to redirect the instruction to something else that did the exact same thing, then it wouldn't break, would it?) And *not all* uses of function pointers are self-modifying code. I'm not understanding where the contradiction is... – user541686 Jun 18 '11 at 21:55
  • @Mehrdad: It is not possible to use function pointers to break your example. (Indeed, standard C does not allow anything that breaks your example.) I am still not convinced that you have a clear definition in your head for "self-modifying code". The only hint you gave about what you mean is an example you say would break. In which case the answer is no, C does not allow _that_ kind of self-modifying code. Yet it does allow the implementation of vtables... You need to say what you mean instead of giving contradictory examples. – Nemo Jun 18 '11 at 21:59
  • @Nemo: It seems like I was indeed confused about something -- see my reply to R.'s answer below. – user541686 Jun 18 '11 at 22:00
  • Even if C did support self-modifying code, I'm at a loss as to what in the example would prevent the compiler from hoisting `*pAddend` out of the loop. If you think the compiler would need to be concerned about the `add()` function itself being modified, then there's nothing the compiler could do about that, whether it optimized or not. Whatever it generated would be subject to being changed even if it didn't hoist `*pAddend`. – Michael Burr Jun 18 '11 at 23:42

2 Answers

8

Self-modifying code is not possible in C for many reasons, the most important of which are:

  1. The code generated by the compiler is completely up to the compiler, and might not look anything like what a programmer trying to write self-modifying code expects. This is a fundamental problem with doing SMC at all, not just a portability problem.
  2. Function pointers and data pointers are completely separate in C; the standard defines no conversion between them. This issue is not fundamental, since some implementations and higher-level standards (POSIX) guarantee that code and data pointers share a representation.

Aside from that, self-modifying code is just a really really bad idea. 20 years ago it might have had some uses, but nowadays it will result in nothing but bugs, atrocious performance, and portability failures. Note that on some ISAs, whether the instruction cache even sees changes that were made to cached code might be unspecified/unpredictable!

Finally, vtables have nothing to do with self-modifying code. It's purely a matter of modifying function pointers, which are data, not code.
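As an illustration of that last point, here is a minimal sketch of a hand-rolled vtable in standard C. The `shape` types and function names are hypothetical, invented for this example; a C++ compiler's actual vtable layout is implementation-specific.

```c
#include <assert.h>
#include <stddef.h>

struct shape;

/* The "vtable": a table of function pointers, i.e. plain data. */
struct shape_vtable {
    int (*area)(const struct shape *self);
};

struct shape {
    const struct shape_vtable *vt;  /* which table this object dispatches through */
    int w, h;
};

static int rect_area(const struct shape *s) { return s->w * s->h; }
static int tri_area(const struct shape *s)  { return s->w * s->h / 2; }

/* Both tables are filled in before the program runs. */
static const struct shape_vtable rect_vt = { rect_area };
static const struct shape_vtable tri_vt  = { tri_area };

/* A "virtual call": an indirect call through a pointer stored as data.
   No machine instruction is rewritten at any point. */
int shape_area(const struct shape *s)
{
    assert(s != NULL && s->vt != NULL);
    return s->vt->area(s);
}
```

Assigning a different table to `vt` changes which function runs, but the code of `shape_area`, `rect_area`, and `tri_area` stays exactly as the compiler emitted it.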

R.. GitHub STOP HELPING ICE
  • 1
    +1 your last sentence is key. For some reason I thought that an indirect instruction like `jmp EAX` modifies itself when `EAX` changes... silly error in thinking. Thanks for the answer. – user541686 Jun 18 '11 at 21:59
  • 1
    This is not true. Look into page protection mechanisms on POSIX and WinAPI. Nothing prevents you (except code signing in iOS kernel and similar) from generating machine code at runtime, setting the page protection flag to EXEC and passing control into it using a C-style function pointer. – Sergey K. Oct 11 '15 at 13:41
3

Strictly speaking, self-modifying code cannot be implemented in a portable manner in C or C++ if I understood the standard correctly.

Self-modifying code in C/C++ would mean something like this:

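#include <stdint.h>

#define FUNCTION_SIZE 64  /* placeholder size, chosen only so the snippet compiles */

uint8_t code_buffer[FUNCTION_SIZE];

void call_function(void)
{
    /* ... modify code_buffer here to hold the machine code we'd like to run ... */
    ((void (*)(void))code_buffer)();  /* data-to-function pointer cast: not defined by ISO C */
}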

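This is not legal C: the standard does not define converting a data pointer to a function pointer. It will also crash on most modern systems, where data memory is not executable by default. On a strict Harvard architecture it is impossible to implement at all, because code lives in a separate memory that the program cannot write as data, so it cannot be part of any portable standard.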

Most modern OSes do have a facility for this kind of hackery, which is used by dynamic recompilers, for one; mprotect() on Unix is an example.
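A rough sketch of that OS-specific route, assuming a POSIX system and an x86-64 CPU. The helper name `run_generated` and the six hard-coded instruction bytes are assumptions made for this illustration. None of it is portable C, and a hardened system may refuse the `mprotect()` call, in which case the helper simply reports failure. On ISAs without coherent instruction caches, a cache flush (for example GCC/Clang's `__builtin___clear_cache`) would also be needed before the call.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS on glibc in strict -std modes */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical helper: builds a tiny function at run time and calls it.
   Returns 0 and stores the function's result in *out on success; returns -1
   if the target is not x86-64 or the OS refuses the request. */
int run_generated(int *out)
{
#if defined(__x86_64__)
    /* x86-64 machine code for: mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t size = (size_t)page;

    /* Start with a writable, non-executable page and fill it in. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* Flip the page to read + execute before jumping into it. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return -1;
    }

    /* Converting void * to a function pointer is outside ISO C; copying the
       pointer bits relies on the POSIX guarantee instead of a cast. */
    int (*fn)(void);
    assert(sizeof fn == sizeof mem);
    memcpy(&fn, &mem, sizeof fn);

    *out = fn();
    munmap(mem, size);
    return 0;
#else
    (void)out;
    return -1;
#endif
}
```

Note that this generates fresh code and runs it; it does not modify the instructions the compiler emitted for the program itself.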

Maister
  • 1
    And *self*-modifying code is another kettle of fish again from code that writes "fresh" code and executes it. As Mehrdad observes, modifying compiler-generated code is quite difficult if you don't know how that code was generated/optimized in the first place, since the machine instructions don't necessarily bear any particularly obvious relationship to the AST. – Steve Jessop Jun 18 '11 at 21:43
  • While you couldn't write "self modifying C", it is certainly possible for a C compiler to emit "self modifying machine code". I think the question allows for both; whether the OP had both in mind I don't know, and whether any C compiler ever did this I also don't know. – hippietrail Oct 24 '12 at 14:43