4

Consider the following x86 code example:

#include <stdlib.h>

static int i;

static inline __attribute__((always_inline)) void test(int x)
{
    asm volatile("mov %1, %0" : "=r"(i): "i"(x));
}

int main(void)
{
    test(5);

    return i;
}

If I build it with:

gcc -O test.c

It builds fine.

If I build it with (no optimization):

gcc test.c

It fails during the assembly phase, because the value '5' is not propagated as an immediate value into the inline function test, so the "i" constraint cannot be satisfied.

I want to be able to compile this code without turning on other, unrelated optimizations, in order to make debugging easier.

In theory, -O is simply a shortcut that enables a group of GCC optimization options at once, all of which are documented in the GCC manual. Unfortunately, I was not able to find the specific GCC flag that turns this behavior on.

Any ideas?

Clarification: To relieve any doubt, the code snippet is just an example. It does not make much sense by itself except to show what I am trying to do. The actual use case involves an instruction on a custom processor that can only take an immediate as an argument, which I am trying to wrap in a C construct. A macro will indeed do the trick, but it suffers from all the usual drawbacks of a macro, hence I am trying to avoid it.

Update: For those who wondered, a macro won't work either. It seems the inline function doesn't play a part here at all. E.g., this doesn't work either:

void foo (void)
{
  int i = 6;

  asm volatile ("" : : "i" (i));
}

I also fixed the question title to reflect this.

gby
  • Does C support inline? If so, which version are you using? In almost all cases, anything done with inline can also be done with a macro. And what do you expect from the optimization? (There is no miracle for such a small code segment.) –  Jul 17 '12 at 07:59
  • @BasileStarynkevitch reason : **I wonder** –  Jul 17 '12 at 08:04
  • @gcc C supports inline since the C99 version of the standard. GCC actually supported C inline functions long before the standard. The optimization here is obvious: the number of instructions needed for a function call vs. a single instruction in the inline case. – gby Jul 17 '12 at 08:21
  • If the real use case is a specialized instruction on some custom processor, you really should consider making it a builtin (as suggested in my answer). This is a typical use for builtins. – Basile Starynkevitch Jul 17 '12 at 08:29

3 Answers

7

Looks like -ftree-ter (Replace temporary expressions in the SSA->normal pass - whatever that is) does the trick:

gcc -ftree-ter test.c   # no errors

Here's how I determined that:

  1. gcc -Q --help=optimizers tells you what optimizations are enabled/disabled by default (some are enabled)
  2. gcc -O -Q --help=optimizers tells you what optimizations are enabled/disabled for -O
  3. redirect the output of those commands to files and diff them.
  4. try the optimizations that are enabled only when -O is specified until one works.
Michael Burr
  • Thank you for your answer. I have tried -ftree-ter, as well as all other optimization options documented in the GCC manual to be enabled by -O and none of them did the trick. In fact, I also diffed the output of "gcc -Q --help=optimizer" for -O and without and tried all the options that came up in the diff together (which should be identical to passing -O1) and even that did not work... – gby Jul 17 '12 at 08:31
  • @gby: hmm. I tested on GCC 4.6.1. It is a MinGW distribution on Windows, in case that accounts for the difference (the TDM x86-64 distribution). The `-v` option shows that GCC is also using the `-mtune=generic` and `-march=x86-64` options from the built-in specs, which may or may not be a factor. – Michael Burr Jul 17 '12 at 08:43
  • Interesting. I tested this on the x86_32 4.4.3 (Ubuntu 4.4.3-4ubuntu5) version, and there it does not work, but -O1 does. – gby Jul 17 '12 at 08:59
  • From googling -ftree-ter it seems this is indeed the right flag. Now I just need to figure out why it doesn't work on my version... thanks! – gby Jul 17 '12 at 10:38
  • @gby: Maybe in GCC 4.4.3 there needs to be two or more `-f` options enabled in addition to `-ftree-ter`? Maybe try approaching from the other direction - pass in *all* the `-f` options that `-O` enables (hopefully that'll make the compile go OK), then start removing them until you get to the set where no more can be removed. – Michael Burr Jul 17 '12 at 14:25
  • I tried. Even when passing all the options it still did not work but it did when passing -O. Strange indeed. – gby Jul 17 '12 at 17:33
  • @gby: strange - maybe the spew from adding `-v` with and without `-O` might give a clue? – Michael Burr Jul 17 '12 at 18:20
2

always_inline is a strange attribute, very GCC specific, and possibly GCC version specific (so the detailed behavior might not be the same with GCC 4.5 and with GCC 4.7).

GCC works by running a lot of optimization passes (even at -O0 some of these passes run, otherwise no code would be emitted). Typically a GCC -O1 compilation runs two hundred optimization passes.

With gcc-4.7 your code doesn't even compile at -O0:

alw.c: In function ‘main’:
alw.c:7:5: warning: asm operand 1 probably doesn’t match constraints [enabled by default]
alw.c:7:5: error: impossible constraint in ‘asm’

To understand more about what GCC is doing, you could run it with gcc -fdump-tree-all and you'll get a so-called "dump file" (a textual representation of some of the internal representations transformed by a pass) for most GCC passes. Beware, you'll get hundreds of such dump files (and sadly, the number inside the name of a dump file is not significant).

I can't understand why you want to do that. I suggest either making your test a macro, or always optimizing (recent GCC versions deal quite well with both -g and -O1).

A possible alternative could be to extend GCC with a plugin, or better, a MELT extension (MELT is a high-level domain-specific language to extend GCC, implemented as a GPLv3-licensed GCC plugin). Then you could make your test function your own GCC builtin, since GCC can be extended to add builtins and pragmas. Your extension would then install your specific builtins and insert some specific passes to handle them appropriately. (This means several days of work, even if you know GCC internals well.) Notice that builtins are commonly used to interface extra target-processor-specific instructions (just like your use case).

Recent GCC versions (notably 4.6 and 4.7) accept plugins (if they have been configured with --enable-plugin). Check with gcc -v whether your particular GCC accepts plugins. Some distributions dislike the GCC plugin idea (e.g. Suse and perhaps Redhat), so they don't ship a GCC that accepts plugins.

If your particular (recent) Linux distribution does not yet support GCC plugins, I suggest you open a bug report to request that plugins be enabled in its GCC. If your GCC cross-compiler supplier doesn't support plugins, I also suggest you request that feature, which has existed in FSF GNU GCC for several years, i.e. since 4.5!

On Debian or Ubuntu, I suggest installing the gcc-4.6-plugin-dev or gcc-4.7-plugin-dev package. You'll then be able to build and use the MELT plugin (I am working to release MELT 0.9.6 for GCC 4.6 & 4.7 very soon, i.e. in July 2012).

Most recent distributions (Debian, Ubuntu, Mandriva, ArchLinux, ...) with GCC 4.6 or 4.7 have a GCC that is extensible with plugins. MELT is such a plugin (it is a meta-plugin, because melt.so itself does some dlopen). Once you have a GCC that accepts plugins and have installed some plugin (e.g. MELT), using it just means running gcc -fplugin=melt with some other plugin-specific options (e.g. -fplugin-arg-melt-mode=your-melt-mode for MELT).

Basile Starynkevitch
  • The fact that gcc cannot assemble my code with optimization disabled is mentioned in my post (gcc test.c and gcc -O0 test.c are one and the same). It is not unique to GCC version 4.7; it is in fact what I am asking about. What I am trying to do is wrap an instruction of a custom processor that must receive an immediate operand as a C-level construct. A macro will indeed work, but it suffers from all the known drawbacks of macros, hence I try to avoid it. – gby Jul 17 '12 at 08:27
  • Using a builtin rather than an inline asm is a very good idea. However, using a MELT extension is problematic for me, because it requires a non-mainline branch of GCC, unless I've missed something, which will require me to extend GCC proper, which is, alas, quite difficult :-) – gby Jul 17 '12 at 10:37
  • No, MELT (or any other plugin) is usable on any GCC with plugins enabled. You don't need to alter a single bit of GCC to use MELT (or any other GCC plugin). You just need a GCC with plugins enabled. Check that with `gcc -v` (plugins for GCC 4.6 & 4.7 are enabled on Debian & Ubuntu & Mandriva, but Suse & Redhat are perhaps not enabling them yet). On debian or ubuntu, install the `gcc-4.6-plugin-dev` package for `gcc-4.6` etc... – Basile Starynkevitch Jul 17 '12 at 10:39
2

Probably you are just abusing the "i" constraint. If you don't optimize there is no way for the compiler to "know" that this will be an immediate at the end.

I think you just should let gcc do the work to decide how much this can be optimised. I'd just use "g" as a constraint instead of "i". I am quite sure that when you compile with optimization on, everything will resolve fine to an immediate. But you'd better check the assembler that is produced to be sure.
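
A minimal sketch of that suggestion, assuming an x86 target as in the question. The name `load_value` is hypothetical, and the plain-C fallback is only there so the sketch builds on other targets. Since "g" accepts a register, memory, or immediate operand, this wrapper should assemble with or without -O; for the same reason it cannot stand in for an instruction that only takes an immediate.

```c
/* Hypothetical wrapper (not from the question): same x86 "mov" as the
   original test(), but the input uses the "g" constraint, which allows
   a register, a memory location, or an immediate. */
static inline int load_value(int x)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    /* Without optimization x usually lives on the stack (memory operand);
       with optimization the compiler may fold it to an immediate. */
    __asm__ volatile("mov %1, %0" : "=r"(out) : "g"(x));
#else
    out = x; /* assumption: plain assignment as a stand-in on non-x86 */
#endif
    return out;
}
```

Comparing the output of gcc -S with and without -O shows which operand form was actually chosen for a call such as `load_value(5)`.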

Jens Gustedt
  • To help the compiler, perhaps it's a good idea to declare `i` `const`? – orlp Jul 17 '12 at 08:17
  • Thank you for your answer. The code example indeed does not make sense by itself. However, my actual use involves an instruction of a custom processor that MUST take an immediate as an operand, hence the requirement of an "i" constraint. Hope this explains better what I am trying to do. – gby Jul 17 '12 at 08:26
  • But then an `inline` function is not the correct tool, since this must be somehow agnostic of where the function parameter comes from. I'd go for a macro that I'd protect with `__builtin_constant_p` against uses with non-constant expressions. – Jens Gustedt Jul 17 '12 at 09:31
  • Actually, it seems a macro (or even hand coding the inline asm in place) runs into the same issue. – gby Jul 17 '12 at 10:39
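
The macro idea from the comments above can be sketched roughly as follows, assuming an x86 target and assuming the argument is written as a literal (or another integer constant expression) at the point of use. `LOAD_IMM` is a hypothetical name, and the non-x86 branch is only a stand-in so the sketch builds elsewhere. This sketch leaves out the `__builtin_constant_p` guard and relies on the "i" constraint itself to reject non-constant arguments.

```c
/* Hypothetical macro wrapper (LOAD_IMM is not from the original post).
   The argument text is pasted directly into the asm operand, so a
   literal such as 5 remains a constant expression and can satisfy the
   "i" constraint even when no optimization is enabled. */
#if defined(__i386__) || defined(__x86_64__)
#define LOAD_IMM(dst, imm) \
    __asm__ volatile("mov %1, %0" : "=r"(dst) : "i"(imm))
#else
/* Assumption: plain assignment as a stand-in on non-x86 targets. */
#define LOAD_IMM(dst, imm) ((dst) = (imm))
#endif
```

A use such as `int v; LOAD_IMM(v, 5);` passes the literal straight through. Passing a variable, as in the `foo` example from the question, still depends on the optimizer to turn it into a constant.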