
I'm currently writing a language that compiles to C. When I say IL, I mean that C is the language my compiler emits, and another C compiler (e.g. gcc or clang) then turns that C into assembly.

For the C code I generate, I'd like to know:

  • If I do some simple opt passes (constant propagation, dead code removal, ...) will this reduce the amount of work the C compiler has to do, or make it harder because it's not really human C code?
  • If I were to compile to say three-address code or SSA or some other form and then feed this into a C program with functions, labels, and variables - would that make it easier or harder for the C compiler to optimize?
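A minimal sketch of what the second option could look like next to ordinary expression-style C. The function names `poly_expr` and `poly_tac` are hypothetical, chosen only for this illustration, and the sketch shows the two shapes of output without claiming which one a given compiler handles better:

```c
/* Expression style: close to what a person would write by hand. */
int poly_expr(int a, int b, int c)
{
    return (a + b) * (a - c) + b * c;
}

/* Three-address style: one operation per statement, and each
   temporary is assigned exactly once (SSA-like). */
int poly_tac(int a, int b, int c)
{
    const int t0 = a + b;
    const int t1 = a - c;
    const int t2 = t0 * t1;
    const int t3 = b * c;
    const int t4 = t2 + t3;
    return t4;
}
```

Both functions compute the same value; comparing the assembly a compiler produces for each (for example with `-O2 -S`) is one way to see whether the form matters in practice.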

These tie together into the following questions:

  • What is the best way to produce good C code from a language that compiles to C?
  • Is it worth doing any optimisations at all, or should I leave that to the C compiler?
Jon Flow

1 Answer


Generally there's not much point in doing peephole-type optimisations, because the C compiler will simply do those for you. What is expensive is: a) wasted or unnecessary "gift-wrapping" operations, b) memory accesses, c) branch mispredictions.

For a), make sure you're not passing data about too much, because whilst the C compiler will do constant propagation, there's a limit to how far it can detect that two buffers are in fact aliases of the same underlying data. For b), try to keep functions short and operations on the same data together, and limit heap memory use to improve cache performance. For c), the compiler understands for loops; it doesn't understand goto loops. So it will figure that

for(i=0;i<N;i++) 

will usually take the loop body, but it won't figure that

if(++i < N) goto do_loop_again;

will usually take the jump.
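As a rough sketch of the two shapes a code generator could emit for the same counted loop: the names `sum_structured` and `sum_goto` are made up for this example, and how differently a particular compiler treats them is best checked by comparing its output (for example with `-O2 -S`) rather than assumed.

```c
#include <stddef.h>

/* Structured form: the induction variable, bound and step are all
   visible in the for statement. */
long sum_structured(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Goto form: the same behaviour expressed as a label plus a
   conditional backward jump, as a naive generator might emit it. */
long sum_goto(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    if (n == 0)
        goto done;          /* skip the body for an empty array */
do_loop_again:
    total += a[i];
    if (++i < n)
        goto do_loop_again;
done:
    return total;
}
```

If the source language has counted loops, emitting the first form costs the generator nothing extra and keeps the output readable.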

So really the rule is to make your generated code as human-like as possible. Though if it's too human-like, that raises the question of what your language has to offer that C doesn't: the whole point of a non-C language is that the C source can be a spaghetti of gotos while the input script keeps a nice structure.
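To illustrate point a), here is a small sketch of "gift-wrapping" in generated code. The `Point` type and both function names are hypothetical. A compiler may well remove the copies in a case this small; the point is only that the direct version does not depend on the compiler seeing through them.

```c
#include <string.h>

/* Hypothetical record type used only for this illustration. */
typedef struct { int x, y; } Point;

/* Wrapper-heavy output: copies the record into a temporary buffer,
   then copies the result through another buffer before returning. */
int sum_via_copies(const Point *p)
{
    Point tmp;
    int s;
    int result;
    memcpy(&tmp, p, sizeof tmp);
    s = tmp.x + tmp.y;
    memcpy(&result, &s, sizeof result);
    return result;
}

/* Direct output: reads the fields in place, no intermediate buffers. */
int sum_direct(const Point *p)
{
    return p->x + p->y;
}
```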

Malcolm McLean
  • "_the whole point of a non-C language is to create a spaghetti of gotos_" is nonsense! I wrote languages with yacc that generate perfectly good C without goto. – Paul Ogilvie Oct 01 '16 at 14:26
  • Can you elaborate on why those operations are expensive? Do you think these can be optimized out if someone were to write bad code before it's compiled to C? I don't think I could limit heap memory use since this language is relatively low level, no GC or anything so the programmer handles their memory management. – Jon Flow Oct 01 '16 at 14:44
  • Reading and writing to main memory involves passing data out of the processor and back in, much more expensive than operations that take place inside the processor. By arranging data so the items used together are close together, and operations so that operations that act on the same data are close together, you minimise that. – Malcolm McLean Oct 01 '16 at 16:34
  • If the output of the compiler is well-structured C, what is it adding? Wouldn't it be easier for the user to simply write C? – Malcolm McLean Oct 01 '16 at 16:36
  • Not quite true - loop analysis is done when the procedural loops are long gone. Loop analysis is done on a CFG level, and there is no difference between a `goto` and a procedural loop. – SK-logic Oct 02 '16 at 11:48
  • Sorry I didn't mark this as the answer early enough, I had some computer issues. Anyhow, this seems to clear everything up for me, thanks! – Jon Flow Oct 02 '16 at 17:36