
I've got a compiler written with LLVM and I'm looking to up my ABI compliance. For example, I've found it hard to actually find specification documents for C ABI on Windows x86 or Linux. And the ones I have found explain it in terms of RAX/EAX/etc, rather than IR terms that I can use.

So far, I think I've figured out that LLVM treats aggregates invisibly: that is, it considers each of their members a distinct parameter. So, for example, on Windows x64, if I want to handle an aggregate the way the documentation says, I'll need to coerce it to a single integer of that size if it is 8, 16, 32, or 64 bits wide. Otherwise, pass it by pointer.
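To make that rule concrete, here is a minimal sketch of the decision as described above. It does not touch LLVM at all; `PassKind`, `ArgClass`, and `classifyWin64Aggregate` are hypothetical names made up for illustration, and the rule itself is just my reading of the Windows x64 document, not something taken from Clang.

```cpp
#include <cstddef>

// How a by-value aggregate argument would be lowered (hypothetical model).
enum class PassKind { CoerceToInt, Indirect };

struct ArgClass {
    PassKind kind;
    unsigned intBits;  // width of the iN type to coerce to; 0 when Indirect
};

// Windows x64, as read above: aggregates of exactly 1, 2, 4, or 8 bytes
// are coerced to a single integer of that width; everything else is
// passed by pointer to a copy.
constexpr ArgClass classifyWin64Aggregate(std::size_t sizeBytes) {
    return (sizeBytes == 1 || sizeBytes == 2 || sizeBytes == 4 || sizeBytes == 8)
               ? ArgClass{PassKind::CoerceToInt,
                          static_cast<unsigned>(sizeBytes * 8)}
               : ArgClass{PassKind::Indirect, 0};
}
```

A frontend would then emit an `iN` parameter of width `intBits` for `CoerceToInt` and a pointer parameter for `Indirect`. Note that a 3-byte struct falls into the `Indirect` case under this reading, since its size is not a power of two.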

For Windows x86, it seems like __cdecl and __stdcall don't need any action from me, as all parameters are passed on the stack. __fastcall says that the first two 32-bit or smaller arguments are passed in registers, so I'll need to coerce aggregates of that size or less. __thiscall passes `this` in a register and the rest on the stack, so it seems like I won't need to perform any adjustment here.
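As a sketch of the __fastcall reading above: the first two arguments of 4 bytes or less, scanning left to right, take ECX and EDX, and everything else goes on the stack. `ArgLoc` and `fastcallLocation` are hypothetical names, and the model assumes that size alone decides register eligibility. That is my assumption; a real compiler may well exclude some types (floating-point values or aggregates, for instance). It needs C++14 for the loop in a `constexpr` function.

```cpp
#include <cstddef>

// Where a single __fastcall argument ends up (hypothetical model).
enum class ArgLoc { ECX, EDX, Stack };

// sizeBytes: sizes of all arguments in declaration order.
// index:     which argument to locate.
// Assumption: any argument of 4 bytes or less is register-eligible.
constexpr ArgLoc fastcallLocation(const unsigned* sizeBytes, std::size_t index) {
    unsigned regsUsed = 0;
    // Count the registers already claimed by earlier eligible arguments.
    for (std::size_t i = 0; i < index; ++i) {
        if (sizeBytes[i] <= 4 && regsUsed < 2) {
            ++regsUsed;
        }
    }
    if (sizeBytes[index] > 4 || regsUsed >= 2) {
        return ArgLoc::Stack;
    }
    return regsUsed == 0 ? ArgLoc::ECX : ArgLoc::EDX;
}
```

For argument sizes {4, 8, 2, 4} this gives ECX, Stack, EDX, Stack: the 8-byte argument is skipped rather than consuming a register.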

For __vectorcall, pass aggregates not more than sizeof(void*) by integer coercion. For other aggregates, if they are HVAs then pass by value; else pass by value on x86 or pass by pointer on x64.
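Written out as a decision function, my __vectorcall understanding looks like the sketch below. `VcTarget`, `VcPass`, and `classifyVectorcallAggregate` are hypothetical names, the HVA check is assumed to be done elsewhere and passed in as a flag, and the ordering of the tests (size check first, then HVA) is just a transcription of the paragraph above, not something I've verified.

```cpp
#include <cstddef>

enum class VcTarget { X86, X64 };
enum class VcPass { CoerceToInt, ByValue, ByPointer };

// sizeof(void*) on each target.
constexpr std::size_t pointerSize(VcTarget t) {
    return t == VcTarget::X86 ? 4 : 8;
}

// Transcription of the rule above:
//   size <= sizeof(void*)  -> coerce to an integer
//   HVA                    -> pass by value
//   otherwise              -> by value on x86, by pointer on x64
constexpr VcPass classifyVectorcallAggregate(std::size_t sizeBytes, bool isHVA,
                                             VcTarget t) {
    return sizeBytes <= pointerSize(t) ? VcPass::CoerceToInt
           : isHVA                     ? VcPass::ByValue
           : t == VcTarget::X86        ? VcPass::ByValue
                                       : VcPass::ByPointer;
}
```

One consequence of this ordering is that a tiny HVA (say, a struct holding a single float) is classified as `CoerceToInt` before the HVA test is ever reached, which may or may not be what the real ABI does.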

This seems simple (well, relatively), but the LLVM docs for the signext parameter attribute clearly state "This indicates to the code generator that the parameter or return value should be sign-extended to the extent required by the target’s ABI (which is usually 32-bits) by the caller (for a parameter) or the callee (for a return value).". The Microsoft pages for the x86 calling conventions mention nothing about extending anything to any width.
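To be clear about what is at stake there, this is the difference between sign-extending and zero-extending an 8-bit value to the 32 bits the docs mention. It is only a sketch of the arithmetic in plain C++; the helper names are made up, and the 32-bit width is taken from the "usually 32-bits" in the quote, not from any ABI document.

```cpp
#include <cstdint>

// Sign extension: the top bit of the narrow value is copied into the
// upper 24 bits of the wide one.
constexpr std::uint32_t signExtend8To32(std::int8_t v) {
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(v));
}

// Zero extension: the upper 24 bits are always cleared.
constexpr std::uint32_t zeroExtend8To32(std::uint8_t v) {
    return static_cast<std::uint32_t>(v);
}
```

So a `signed char` holding -1 occupies the full register as 0xFFFFFFFF if the caller sign-extends it, while an `unsigned char` holding 0xFF becomes 0x000000FF. If caller and callee disagree on who does the extension, the upper bits are garbage.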

I've also observed that the LLVM IR Clang generates uses the byval attribute on Windows. The understanding I've gleaned from the above never calls for byval.

How would I lower the various platform C ABIs to LLVM IR?

trent
Puppy
  • In LLVM IR you _don't_ explicitly shove things in registers or the stack before a call. Instead you annotate the LLVM-IR call and function with one of several calling conventions. See http://www.llvm.org/docs/LangRef.html#calling-conventions – Iwillnotexist Idonotexist Aug 04 '14 at 22:27
  • Yes, but those are not enough to meet the target C ABI. You have to handle that yourself. Hence the question. – Puppy Aug 04 '14 at 22:28
  • I take it you tried the `inreg` parameter attribute? – Iwillnotexist Idonotexist Aug 04 '14 at 23:56
  • I haven't seen `inreg` generated by Clang, so I'm going with "That's not necessary". – Puppy Aug 05 '14 at 08:27
  • Also, considering the massive undefined behaviours going on here, "Try it" isn't really a viable attitude. I've already been "trying it" with a totally incorrect understanding and it seems to work at least half the time. It's not like you get a clear LLVM error if you violate the ABI. – Puppy Aug 05 '14 at 12:50

2 Answers


I can't say I understand your question 100%, but it's worth noting that LLVM IR simply cannot represent all the subtleties of platform ABIs. Therefore, in the Clang toolchain, it is the frontend that's responsible for performing ABI lowering, such as properly passing objects by value to functions.

Take a look at lib/Basic/Targets.cpp in the Clang source tree for the definitions. The gory details are further down, in lib/CodeGen/TargetInfo.cpp.

Eli Bendersky
  • I apologize if my question was unclear. I got the part where I have to handle this as a frontend who wants to call a C function. What I wanted to know was how to handle it- i.e., if I have a given C prototype, how do I lower it into LLVM IR? I'll take a look at those files. – Puppy Aug 05 '14 at 18:29
  • @Puppy: there's no easy answer to that, alas. Lowering the ABI, especially for some platforms (x64) is nontrivial. You can take code from Clang though - it does all of that. – Eli Bendersky Aug 05 '14 at 19:33
  • I'm trying :P You were quite right about it being non-trivial though. There's a fairly huge amount of code here, and a whole load of it is really Clang-specific, so it's pretty hard to factor out the parts that are really ABI-specific. – Puppy Aug 05 '14 at 20:18
  • Another thing I've observed is that Clang often exhibits behaviour that's absolutely not covered by the ABI spec I've found, so following Clang feels quite blind in the dark. – Puppy Aug 05 '14 at 20:34
  • @Puppy: what do you mean by Clang-specific? Naturally this is lowering code, so it translates Clang data structures (AST really) to LLVM IR. It doesn't abstract around the input since there's only one input (Clang internal data structures); same for output. It does however abstract around the ABI as there are many. – Eli Bendersky Aug 05 '14 at 20:34
  • Sorry. What I really meant is, I sat and stared at the code for a number of hours and it didn't make any sense to me whatsoever. Simply trying to match what Clang does makes me feel like I don't really understand what's going on or why, and I have no idea if the resulting code is actually correct or not. – Puppy Aug 06 '14 at 11:11
  • @Puppy: you say that what Clang does appears not to match the platform ABI. Keep in mind that the IR produced by Clang is not the machine code that has to conform to that ABI. LLVM itself does further lowering on the IR, and LLVM too has ABI-specific logic, quite a bit of it actually. The things done by Clang are things LLVM IR cannot represent in terms of the ABI. I'd suggest focusing on one small feature of the ABI and trace it through Clang and then LLVM. If you think what they do doesn't match the ABI, feel free to ask about that specific issue. – Eli Bendersky Aug 06 '14 at 12:46
  • Thanks for your help. I didn't end up going exactly down this route but after looking in more detail at Clang, I managed to use their interface. Not sure if that merits this as the correct answer or not... – Puppy Aug 07 '14 at 22:36
  • So every language that targets LLVM has to reimplement all that code to have good C interop, ha? –  Jun 10 '15 at 22:10

I ended up hacking Clang's CodeGen internals to perform C ABI calling for me (C++ ABI support was already done). Thus, instead of having to re-implement (and re-test) their code, I simply re-used their work. Officially the CodeGen APIs aren't public and aren't meant to be used by anyone, but in this case I managed to make it work. It turns out that it's a lot less scary than it looks: many of the classes, like LValue/RValue/ReturnValueSlot, are just wrappers on llvm::Value* with a couple of extra optional semantics tacked on.

More problematic will be creating trampolines from C ABI to my own ABI. The CodeGenFunction interface doesn't seem quite as amenable to that. But I think I can make it work.

Puppy