2

Is there any way to force a C function on clang to be optimized even when the file is compiled with -O0?

I'm looking for something equivalent to gcc's __attribute((optimize("s"))) or __attribute((optimize(3))).

(Related: In clang, how do you use per-function optimization attributes?)


What I'm trying to do is generate certain functions in almost pure assembly via a macro; the remaining C code in there shouldn't generate any assembly code. Ideally, the macro would use C-based integer constant expressions to choose which code to paste, and writing static before it would make the generated function static. I also want no stack manipulation in the function's prologue.

On GCC something like:

enum { CONSTANT = 0 };
__attribute((optimize("Os"),noinline,noipa))
int foo(void){
    if (CONSTANT) asm("mov $1, %eax; ret;");
    else asm("xor %eax, %eax; ret;");
    __builtin_unreachable();
}

gets the gist of it successfully. On clang, the optimize attribute is unrecognized, and a push %rbp; mov %rsp, %rbp prologue is generated. That prologue would break my real use case, and it also breaks the ret in this toy example, so it's most undesirable.

On GCC, __attribute((naked)) also works to eliminate the prologue and disable inlining and Inter-Procedural Analysis (IPA), but clang hard-rejects it, enforcing the requirement that naked functions should only consist of pure assembly (no nongenerating C code, even).

Per the GCC docs for x86 function attributes:

naked

This attribute allows the compiler to construct the requisite function declaration, while allowing the body of the function to be assembly code. The specified function will not have prologue/epilogue sequences generated by the compiler. Only basic asm statements can safely be included in naked functions (see Basic Asm). While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.

While not supported, it was working well enough for my use-case. The hack with __attribute__((optimize("Os"),noinline,noipa)) is even more hacky but does in fact compile to the asm I want with current GCC. I'd like to do something similar with clang.

Petr Skocik
  • It was proposed many times, but never implemented – 0___________ Mar 14 '23 at 13:44
  • 1
    As a workaround, could you move that function in a separate file that is always compiled with optimization enabled, while the rest of your code could be compiled without optimization? – Gerhardh Mar 14 '23 at 13:45
  • 1
    How about you put the selector and the alternatives into three separate functions, with the latter two marked with `__attribute((naked))` that you say works? – Jester Mar 14 '23 at 13:49
  • I think you should forget clang. – 0___________ Mar 14 '23 at 13:50
  • 1
    @Jester Thanks, but that doesn't work for me, unfortunately (In reality, I have many branches, and I don't want to generate all that dead code). I think my workaround for now will be detecting -O0 via a macro and adjusting behavior accordingly. – Petr Skocik Mar 14 '23 at 14:00
  • 1
    I do not suppose `CONSTANT` could be a preprocessor symbol (`#define CONSTANT 0`)? – nielsen Mar 14 '23 at 14:10
  • 1
    You absolutely need `__attribute__((noinline))` if you're going to consider this kind of abuse of C. And BTW, `push %rbp` *would* break the `ret` in your example. – Peter Cordes Mar 14 '23 at 14:15
  • @nielsen The integer constants I'm actually branching on are based on C types, so no. It can't be a preprocessor constant. – Petr Skocik Mar 14 '23 at 14:24
  • If Jester's answer doesn't work for you (because of a combinatorial explosion of versions of the whole function since you mention multiple branches), you should definitely consider adding a step to your build system to get these constants as CPP macros so you can do this in a well-defined way. – Peter Cordes Mar 14 '23 at 14:49

4 Answers

3

How about you put the selector and the alternatives into three separate functions, with the latter two marked with __attribute((naked)) that you say works? Something like this:

enum { CONSTANT = 0 };
__attribute((naked))
int foo1(void){
    asm("mov $1, %eax; ret;");
}
__attribute((naked))
int foo0(void){
    asm("xor %eax, %eax; ret;");
}
int foo(void){
    if (CONSTANT) return foo1();
    else return foo0();
}
Jester
  • The OP said they're trying to avoid having copies of dead code in the binary. That's difficult with this strategy, unless the callers are all in one file so unused `static foo1` or `foo0` functions can be optimized away. But you don't want to duplicate `foo1` for every compilation unit that calls it, either. So this might need LTO to remove the unused functions. – Peter Cordes Mar 14 '23 at 14:42
  • Also the OP mentioned multiple branches, so this strategy might result in a combinatorial explosion of functions, since the non-naked function can only call whole functions, not paste together snippets of assembly to make one function body. GCC definitely doesn't officially support the OP's original code; it's just an abuse of the compiler that happens to work, so it's not surprising that this is tricky. That's a job for the C preprocessor. – Peter Cordes Mar 14 '23 at 14:47
  • 1
    Thanks. When the branch functions are marked static, the compilers can easily eliminate the dead ones, so the only extra codesize cost is an extra `jmp` due to noninlinability of naked functions: https://godbolt.org/z/raaq9eqMc. – Petr Skocik Mar 17 '23 at 10:19
1

Jester's answer is probably good for simple-enough cases if you can manually create every combination of asm blocks you might need. If they're only used in one compilation unit, they can be static to let the unused ones optimize away.
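A minimal sketch of that `static` variant, assuming x86 with GCC or clang and AT&T syntax. The `static_assert` and the plain-C fallback branch are illustrative additions so the sketch builds on other targets; they are not part of the answers here.

```c
#include <assert.h>

enum { CONSTANT = 0 };
static_assert(CONSTANT == 0 || CONSTANT == 1, "selector must be 0 or 1");

#if defined(__x86_64__) || defined(__i386__)
/* Naked: the compiler emits no prologue/epilogue, so each body has to
   end with its own ret. static lets the unreferenced helper be dropped. */
static __attribute__((naked)) int foo1(void) {
    __asm__("mov $1, %eax; ret");
}
static __attribute__((naked)) int foo0(void) {
    __asm__("xor %eax, %eax; ret");
}
#else
/* Assumption: plain-C stand-ins so the sketch still builds off x86. */
static int foo1(void) { return 1; }
static int foo0(void) { return 0; }
#endif

/* Ordinary C selector; the branch is chosen from a constant expression. */
int foo(void) {
    if (CONSTANT) return foo1();
    else return foo0();
}
```

Since naked functions are not inlined, `foo` is expected to compile to a jump (or a call at -O0) into the selected helper rather than to the helper's body itself.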

But you do want the non-naked wrapper to be visible for inlining, so you don't get an extra jmp tailcall on every call, so all the callers have to be in the same compilation unit.

If that's not viable, link-time optimization should let the unused versions optimize away and not bloat your binary.


If you have many different branches that would lead to too large a combinatorial explosion of possibilities to maintain, you should definitely consider adding a step to your build system to get these constants as CPP macros so you can do this with #if or #ifdef around multiple asm(""); statements in a naked function in a well-defined way.
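As a sketch of that preprocessor-based approach: `FOO_RETURNS_ONE` below is a hypothetical macro standing in for a constant that a build step would emit (for example via `-DFOO_RETURNS_ONE=1`). x86 with GCC or clang is assumed, and the plain-C fallback is there only so the sketch builds elsewhere.

```c
#include <assert.h>

/* Hypothetical: normally supplied by the build system. */
#ifndef FOO_RETURNS_ONE
#define FOO_RETURNS_ONE 0
#endif
static_assert(FOO_RETURNS_ONE == 0 || FOO_RETURNS_ONE == 1,
              "FOO_RETURNS_ONE must be 0 or 1");

#if defined(__x86_64__) || defined(__i386__)
/* Only basic asm statements inside the naked function; the preprocessor
   decides which ones exist, so no C control flow is involved. */
__attribute__((naked)) int foo(void) {
#if FOO_RETURNS_ONE
    __asm__("mov $1, %eax");
#else
    __asm__("xor %eax, %eax");
#endif
    __asm__("ret");
}
#else
/* Assumption: plain-C stand-in for non-x86 targets. */
int foo(void) { return FOO_RETURNS_ONE; }
#endif
```

Because the selection happens before the compiler proper sees the code, the function body is pure basic asm, which is the documented use of `naked` on both compilers.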

What you're doing now with non-naked functions is a horrible abuse of the compiler that's not at all supported, merely happens to work.

if(constant) inside a naked function is also not officially supported, but seems to me like something that's less likely to break, as long as the constants are truly compile-time constant expressions. Still, no guarantees, unlike if you use the C preprocessor to just paste text together.

Peter Cordes
  • Jester's strategy coupled with making `foo{0,1}` static is actually viable. The dead function gets deleted fine. The `jmp` seems unavoidable on clang. Looks like neither compiler can inline naked funcs, which kind of makes sense (interestingly, clang even appears to be marking them noinline implicitly, and that impl. detail is leaking in warnings if you also add `inline`/`__attribute((always_inline))` on top of `__attribute((naked))`: https://godbolt.org/z/h4jY34c7h). (On GCC, the nonsanctioned-but-working C-branching inside a naked function can be used to avoid the jump.) Thanks for the help! – Petr Skocik Mar 17 '23 at 10:18
  • (Coming to think of it, even a naked function that isn't no_reorder could theoretically be inlined fine when it's being tail-called into.) – Petr Skocik Mar 17 '23 at 10:29
  • @PSkocik: Fall-through tailcall isn't something I'd count as inlining, just a layout (missed) optimization that allows removing the `jmp` but doesn't allow anything about knowing each other's internals and optimizing the calling-convention / ABI stuff or anything like that. Inlining is something that happens at the C or GIMPLE / LLVM-IR levels, not like an asm macro. – Peter Cordes Mar 17 '23 at 10:41
  • @PSkocik: `static` works nicely if the only callers of `foo` (and thus of `foo0` or `foo1`) are all in one compilation unit. Otherwise (if you needed these definitions in a `.h`) it would lead to duplication of the function body for each compilation unit that used one. That's what I was trying to say in this answer. If only `foo`'s definition is available to actually inline, declarations of the others need to be visible so they can't be `static`. But if `foo` itself doesn't inline, then it compiles to a single `jmp` instruction. (Unless it gets the same address as the target.) – Peter Cordes Mar 17 '23 at 10:43
0

At least one attempt to implement this in clang was abandoned.

I think the only way is to put the function in a file by itself and compile that file with the optimization you want.

Building the function with -O2 does get rid of the prologue (see here).

nielsen
0

Here's what I think is probably my most flexible solution to this so far with zero extra generated code in optimized builds:

  1. use a non-naked func with inline assembly blocks intermixed with C
  2. don't try to avoid prologues but undo them if they're generated

Step two, encapsulated by a MAYBE_DELETE_FRAME_FOR() macro that is to be used at the very beginning of such a pseudo-naked function, assumes that:

  1. any possible prologue is a frame setup (can be undone by the "leave" instruction)
  2. no frames are set up in optimized builds*
  3. a macro is defined by the build to distinguish nonoptimized and optimized builds

(*the default for optimized codegen on x86-64 SysV ABI unless VLAs/allocas or inline assembly with rsp clobbers are used)

#if NO_OPTIMIZATION /*build system should set it IFF -O0*/
    #define MAYBE_DELETE_FRAME_FOR(FUNC_NAME) __asm("leave;")
#else
    #define MAYBE_DELETE_FRAME_FOR(FUNC_NAME) /**/
#endif
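If the build system cannot supply such a flag, one possible stand-in is the `__OPTIMIZE__` macro, which GCC and clang predefine when optimization is enabled on the command line. A minimal sketch, assuming GCC or clang; `build_is_unoptimized` is a hypothetical helper added only to make the result observable.

```c
#include <assert.h>

/* Prefer an explicit setting from the build; otherwise derive it from
   __OPTIMIZE__ (assumption: the compiler is GCC or clang). */
#ifndef NO_OPTIMIZATION
  #ifdef __OPTIMIZE__
    #define NO_OPTIMIZATION 0
  #else
    #define NO_OPTIMIZATION 1
  #endif
#endif
static_assert(NO_OPTIMIZATION == 0 || NO_OPTIMIZATION == 1,
              "NO_OPTIMIZATION must be 0 or 1");

#if NO_OPTIMIZATION
    #define MAYBE_DELETE_FRAME_FOR(FUNC_NAME) __asm("leave;")
#else
    #define MAYBE_DELETE_FRAME_FOR(FUNC_NAME) /**/
#endif

/* Hypothetical helper: reports which variant of the macro was selected. */
int build_is_unoptimized(void) { return NO_OPTIMIZATION; }
```

`__OPTIMIZE__` reflects only the command-line optimization level, not whether a given function actually sets up a frame, so an optimized build that forces frame pointers (for example with `-fno-omit-frame-pointer`) would still violate assumption 2.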

A version of the macro could be defined regardless of optimization config by measuring the distance from the function's start to the first user-issued assembly instruction. If the distance is found to be nonzero, it's statically asserted to be 4 bytes (only push %rbp; mov %rsp, %rbp prologues are expected) and leave is generated; otherwise nothing is generated:

#define MAYBE_DELETE_FRAME_FOR(FUNC_NAME) __asm(\
        ".if .-" #FUNC_NAME "\n" \
            ".if .-" #FUNC_NAME "!= 4\n" \
                ".err\n" \
            ".endif\n" \
            "leave\n" \
        ".endif\n" \
    )

Unfortunately, this more foolproof version of the macro again fails on clang, due to clang not considering the .-FUNC_NAME label subtraction to be an absolute expression (Interestingly, it does consider it to be an absolute expression in an equivalent *.s file. I think this discrepancy is a clang bug: https://github.com/llvm/llvm-project/issues/62520).

Petr Skocik