What calling convention should I use to make things portable?

Question

I am writing a C interface for the CPU's cpuid instruction. I'm doing this as an exercise: I don't want to use compiler-dependent headers such as cpuid.h for GCC or intrin.h for MSVC. I'm also aware that C inline assembly would be a better choice, since it avoids thinking about calling conventions (see this implementation): I'd only have to deal with the different compilers' syntaxes. However, I'd like to start practicing integrating assembly and C.

Given that I now have to write a different assembly implementation for each major assembler (I was thinking of GAS, MASM and NASM) and for each of them both for x86-64 and x86, how should I handle the fact that different machines and C compilers may use different calling conventions?

You don't need separate GAS and NASM implementations; that seems pointless since both GAS and NASM can make object files for every mainstream platform. (Except I'm not sure about GAS for macos's MachO, but the native clang assembler there can assemble .s files). Since you need to take args and return four `int` values (or a struct), yes you need to handle arg-passing differences. (And also return values for a 16-byte struct, unless you take an output pointer arg). EBX is call-preserved everywhere, EAX, ECX, and EDX are call-clobbered everywhere. — Peter Cordes, Mar 05 '22 at 11:07
See Agner Fog's calling convention guide on https://agner.org/ — Peter Cordes, Mar 05 '22 at 11:07
@PeterCordes hi, thanks. I'm on MacOs: yes, GAS supports MachO. Or at least MachO64. That's because GCC on MacOs builds on clang. — Giuppox, Mar 05 '22 at 11:34
`gcc foo.s` doesn't run GAS on MacOS, it runs clang with LLVM's built-in assembler. GAS is the [**GNU** assembler](https://sourceware.org/binutils/docs/as/) - if `as --version` doesn't print `GNU assembler (GNU Binutils) 2.36.1` or similar, it's not the GNU assembler, but rather some other program that has compatible options and accepts (almost) the same syntax. That's why I said it doesn't matter if the GNU assembler itself is actually portable to MacOS, because a compatible assembler based on LLVM is already installed. — Peter Cordes, Mar 05 '22 at 11:54
@PeterCordes There seem to be at least 4 calling conventions that I should implement. Isn't there a way to just push parameters to the stack regardless of the platform and always do that? — Giuppox, Mar 05 '22 at 12:03
@Giuppox One thing you can do is write macros for each calling convention that shuffle the arguments into a known location. Then, porting the code is just a matter of configuring which macro to use. — fuz, Mar 05 '22 at 12:12
You could do something like `__attribute__((ms_abi))` on the prototype for GNU C compilers to tell callers to always use the Windows x64 calling convention, so you know what registers the args will be in. (Or for 32-bit, that's probably stdcall or cdecl, not sure which.) That might limit things to one calling convention per bitness. If you don't want that, just use GNU C inline asm like cpuid.h does... there's a reason people use that instead of separate asm files: separate files aren't easier, they just put the complexity in different places. You're only doing this as a learning exercise, I guess? — Peter Cordes, Mar 05 '22 at 12:41
Or if you intend this for production use, I'd recommend just making a header that does some #ifdef checking to figure out which other header to include, and expose a single portable API in terms of existing headers, without separate asm files. All modern mainstream compilers have some header with CPUID wrappers. — Peter Cordes, Mar 05 '22 at 12:42
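A rough sketch of what such a wrapper header could look like, assuming only the two compiler families already mentioned in the question (intrin.h for MSVC, cpuid.h for GCC and clang). The names `portable_cpuid`, `cpuid_regs` and `PORTABLE_CPUID_BACKEND` are made up for illustration; `__cpuidex` and `__cpuid_count` are the existing wrappers from those headers.

```c
#include <stdint.h>

/* Pick whichever compiler header already wraps CPUID. */
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>              /* __cpuidex */
#define PORTABLE_CPUID_BACKEND 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>               /* __cpuid_count */
#define PORTABLE_CPUID_BACKEND 2
#else
#define PORTABLE_CPUID_BACKEND 0 /* not x86, or an unknown compiler */
#endif

/* Hypothetical result type: the four registers CPUID writes. */
typedef struct { uint32_t eax, ebx, ecx, edx; } cpuid_regs;

/* Hypothetical API: returns 1 and fills *out when a backend exists,
   returns 0 and zeroes *out otherwise. */
static inline int portable_cpuid(uint32_t leaf, uint32_t subleaf, cpuid_regs *out)
{
#if PORTABLE_CPUID_BACKEND == 1
    int r[4];
    __cpuidex(r, (int)leaf, (int)subleaf);
    out->eax = (uint32_t)r[0];
    out->ebx = (uint32_t)r[1];
    out->ecx = (uint32_t)r[2];
    out->edx = (uint32_t)r[3];
    return 1;
#elif PORTABLE_CPUID_BACKEND == 2
    unsigned int a, b, c, d;
    __cpuid_count(leaf, subleaf, a, b, c, d);
    out->eax = a;
    out->ebx = b;
    out->ecx = c;
    out->edx = d;
    return 1;
#else
    (void)leaf;
    (void)subleaf;
    out->eax = out->ebx = out->ecx = out->edx = 0;
    return 0;
#endif
}
```

Callers then only ever see `portable_cpuid(leaf, subleaf, &regs)`, and no calling convention has to be handled by hand because the compiler's own header does the register work.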
@PeterCordes hi, take a look at this example of x64 implementation I did. Have I missed something? https://gist.github.com/Giuppox/4a7a203130d7c78714b61aa0f75b8177 — Giuppox, Mar 09 '22 at 14:28
You don't need to save/restore RAX, RCX, or RDX; they're call-clobbered in all conventions. Also, for MS, `mov 8(%rcx), %ecx` destroys the pointer, so do it last. And the incoming EDX and EBX aren't inputs for CPUID, only EAX (and sometimes ECX). — Peter Cordes, Mar 09 '22 at 14:48
If you have any more questions like that, edit code into this question or ask a new one. — Peter Cordes, Mar 09 '22 at 14:50

score 3 · Accepted Answer · answered Mar 11 '22 at 03:23

If you really want to write, as just an exercise, an assembly function that "conforms" to all the common calling conventions for x86_64 (I know only the Windows one and the System V one), without relying on attributes or compiler flags to force the calling convention, let's take a look at what's common.

The Windows GPR passing order is rcx, rdx, r8, r9. The System V passing order is rdi, rsi, rdx, rcx, r8, r9. In both cases, rax holds the return value if it fits and is plain old data (POD). Technically speaking, you can get away with a "polyglot" called function if it (0) saves the union of what each ABI considers non-volatile, (1) returns something that can fit in a single register, and (2) takes no more than 2 GPR arguments, because the registers overlap past that: System V's third argument register, rdx, is Windows' second. In practice the union in (0) is the Windows set, which adds rdi, rsi, and xmm6 through xmm15 to the rbx, rbp, rsp, and r12 through r15 that System V preserves. To be absolutely generic, you could make it take a single pointer to some structure that would hold whatever arbitrary return data you want.

So now our arguments will come through either rcx and rdx or rdi and rsi. How do you tell which will contain the arguments? I'm actually not sure of a good way. Maybe what you could do instead is have a wrapper that puts the arguments in the right spot, and have your actual function take "padding" arguments, so that your arguments always land in rcx and rdx. You could technically expand to r8 and r9 this way.

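#include <stddef.h> // NULL

// The one real argument must arrive in rcx under both ABIs:
// rcx is argument 1 on Windows x64 and argument 4 on System V.
#ifdef _WIN32
#define CPUID(information) cpuid(information, NULL, NULL, NULL)
#else
#define CPUID(information) cpuid(NULL, NULL, NULL, information)
#endif

// d duplicates a (both mean "whatever is in rcx")
// c duplicates b (both mean "whatever is in rdx")
// no_more_than_64_bits_t stands for any return type that fits in rax
no_more_than_64_bits_t cpuid(void * a, void * b, void * c, void * d);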

Then, in your assembly, save the union of what each ABI considers non-volatile, do your thing, put whatever information you want in the structure to which rcx points, and restore.
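For concreteness, here is a minimal sketch of that assembly side, embedded as file-scope GNU asm so it fits in one C file. It is only an illustration under several assumptions: the names `cpuid_block`, `cpuid_polyglot` and `HAVE_POLYGLOT_CPUID` are invented here, the real routine is guarded to x86-64 ELF targets with a GNU-compatible compiler, and everywhere else a stub is compiled instead. The block pointer is expected in rcx, the leaf is read from the `eax` field and the subleaf from the `ecx` field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in/out block: leaf in .eax and subleaf in .ecx on entry,
   the four CPUID result registers on return. */
typedef struct { uint32_t eax, ebx, ecx, edx; } cpuid_block;

/* Same padded prototype as in the answer: a/d mean rcx, b/c mean rdx. */
uint64_t cpuid_polyglot(void *a, void *b, void *c, void *d);

#ifdef _WIN32
#define CPUID(p) cpuid_polyglot((p), NULL, NULL, NULL)
#else
#define CPUID(p) cpuid_polyglot(NULL, NULL, NULL, (p))
#endif

#if defined(__x86_64__) && defined(__GNUC__) && defined(__ELF__)
#define HAVE_POLYGLOT_CPUID 1
__asm__(
    ".pushsection .text\n"
    ".globl cpuid_polyglot\n"
    ".type cpuid_polyglot, @function\n"
    "cpuid_polyglot:\n"
    "    push %rbx\n"          /* rbx is non-volatile in both ABIs, CPUID writes it */
    "    mov  %rcx, %r8\n"     /* r8 is volatile in both ABIs: park the pointer there */
    "    mov  0(%r8), %eax\n"  /* leaf */
    "    mov  8(%r8), %ecx\n"  /* subleaf */
    "    cpuid\n"
    "    mov  %eax, 0(%r8)\n"
    "    mov  %ebx, 4(%r8)\n"
    "    mov  %ecx, 8(%r8)\n"
    "    mov  %edx, 12(%r8)\n"
    "    xor  %eax, %eax\n"    /* return 0 in rax: success */
    "    pop  %rbx\n"
    "    ret\n"
    ".size cpuid_polyglot, .-cpuid_polyglot\n"
    ".popsection\n");
#else
#define HAVE_POLYGLOT_CPUID 0
/* Stub so the sketch still compiles elsewhere; returns 1 for "unsupported". */
uint64_t cpuid_polyglot(void *a, void *b, void *c, void *d)
{
    (void)a; (void)b; (void)c; (void)d;
    return 1;
}
#endif
```

Usage would look like `cpuid_block b = {0, 0, 0, 0}; CPUID(&b);`, after which `b` holds the leaf 0 results. Since this routine touches only rax, rbx, rcx, rdx and r8, rbx is the only register from the non-volatile union that actually needs saving; a routine that also used rdi, rsi or xmm6 through xmm15 would have to preserve those too.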

Yeah, that CPP wrapper should work, although of course it creates more work at the call-site to pass more args. You could define a different prototype, too, only using 1 arg on _WIN32, since the compiler doesn't need to zero RDX, R8, or R9 for the same asm to work. (You could maybe use the RDX arg to strike a balance of 1 vs. 2 dummy args on MS vs. SysV, instead of 0 vs. 3). It would be nice to have the arg in a register that's not one of EAX..EDX which CPUID writes, but that only saves one `mov` in the asm vs. making each callsite use more instructions. — Peter Cordes, Mar 11 '22 at 04:10
There's also the [Plan 9 calling convention](https://stackoverflow.com/a/20637866/995714), which is also used in golang: all registers are caller-saved, all parameters are passed on the stack, and return values are also returned on the stack. Go has just recently [moved to another register-based calling convention](https://dr-knz.net/go-calling-convention-x86-64-2020.html) — phuclv, Mar 11 '22 at 09:21
x86 would be easier to do, in my opinion, as long as it's `cdecl`, because then the calling convention would actually be effectively the same across Windows, Linux, and macOS (to my knowledge). — Mona the Monad, Mar 21 '22 at 12:58
