7

Which of these items can safely be assumed to be defined in any practically-usable platform ABI?

  1. Value of `CHAR_BIT`

  2. Size, alignment requirements and object representation of:

    1. `void*`, `size_t`, `ptrdiff_t`
    2. `unsigned char` and `signed char`
    3. `intptr_t` and `uintptr_t`
    4. `float`, `double` and `long double`
    5. `short` and `long long`
    6. `int` and `long` (but here I expect a "no")
    7. Pointers to object types for which the platform ABI specifies these properties
    8. Pointers to functions whose types only involve types for which the platform ABI specifies these properties
  3. Object representation of a null object pointer

  4. Object representation of a null function pointer

For example, if I have a library (compiled by an unknown, but ABI-conforming compiler) which publishes this function:

    void* foo(void *bar, size_t baz, void* (*qux)());

can I assume to be able to safely call it in my program regardless of the compiler I use?

Or, taken the other way round, if I am writing a library, is there a set of types such that if I limit the library's public interface to this set, it will be guaranteed to be usable on all platforms where it builds?

Angew is no longer proud of SO

3 Answers

3

I don't see how you can expect any library to be universally compatible. If that were possible, there would not be so many compiled variations of libraries.

For example, you could call a 64-bit library from a 16-bit program as long as you set up the call correctly. But you would have to know you're calling a 64-bit based library.

Portability is a much-talked-about goal, but few truly achieve it. After 30+ years of system-level, firmware and application programming, I think of it as more of a fantasy than a goal. Unfortunately, hardware forces us to optimize for the hardware. Therefore, when I write a library, I use the following:

  1. Compile for the target ABI
  2. Use a pointer to a structure for input and output for all function calls:

    int lib_func(struct lib_in *input, struct lib_out *output);
    

where the returned `int` indicates errors only. I make all error codes unique. I require the user to call an init function before any use of the library; the user calls it as:

    lib_init(sizeof(int), sizeof(char *), sizeof(long), sizeof(long long));

So that I can decide if there will be any trouble or modify any assumptions if needed. I also add a function allowing the user to learn my data sizes and alignment in addition to version numbers.

This is not to say that the user or I am expected to modify code "on the fly" or spend lots of CPU power reworking structures. But it allows the application to make absolutely sure it's compatible with me, and vice versa.
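As a sketch of that handshake (the `lib_init` signature and the specific error codes here are hypothetical; the idea is just that the library compares the caller's compile-time sizes against its own):

```c
#include <stddef.h>

/* Hypothetical init function: compare the caller's type sizes
 * against the sizes this library was compiled with.  Returns 0
 * on a match, a unique nonzero error code otherwise. */
int lib_init(size_t int_sz, size_t ptr_sz, size_t long_sz, size_t llong_sz)
{
    if (int_sz   != sizeof(int))       return 1;
    if (ptr_sz   != sizeof(char *))    return 2;
    if (long_sz  != sizeof(long))      return 3;
    if (llong_sz != sizeof(long long)) return 4;
    return 0;  /* caller and library agree on these sizes */
}
```

A caller built by a different compiler then invokes `lib_init(sizeof(int), sizeof(char *), sizeof(long), sizeof(long long))` and refuses to proceed on a nonzero result.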

The other option which I have employed in the past, is to simply include several entry-point functions with my library. For example:

   int lib_func32();
   int lib_func16();
   int lib_func64();

It makes a bit of a mess for you, but you can then fix it up using the preprocessor:

   #ifdef LIB_USE32
      #define  lib_function  lib_func32
   #endif

You can do the same with data structures, but I'd recommend using the same-size data structure regardless of CPU size -- unless performance is a top priority. Again, back to the hardware!
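One way to get a structure of the same size on every target (a sketch; the struct and its members are made up for illustration) is to build it from the fixed-width `<stdint.h>` types and check the layout at compile time:

```c
#include <stdint.h>

/* Hypothetical message struct: built only from fixed-width types,
 * so its members have the same sizes on any conforming compiler,
 * regardless of the native word size. */
struct lib_msg {
    uint32_t version;
    uint32_t flags;
    uint64_t payload_len;
};

/* Catch padding surprises at compile time rather than at runtime
 * (16 bytes assumes the usual natural-alignment ABIs). */
_Static_assert(sizeof(struct lib_msg) == 16, "unexpected struct padding");
```

Note that fixed-width members pin sizes but not padding or byte order, which is why the compile-time check (and the alignment statement mentioned in the comments below) is still worthwhile.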

The final option I explore is whether to have entry functions of all sizes and styles which convert the input to my library's expectations, as well as my library's output.

For example, your lib_func32(&input, &output) can be compiled to expect a 32-bit aligned, 32-bit pointer but it converts the 32-bit struct into your internal 64-bit struct then calls your 64 bit function. When that returns, it reformats the 64-bit struct to its 32-bit equivalent as pointed to by the caller.

   int lib_func32(struct lib_struct32 *input32, struct lib_struct32 *output32)
   {
       struct lib_struct64 input64;
       struct lib_struct64 output64;
       int    retval;

       lib_convert32_to_64(input32, &input64);

       retval = lib_func64(&input64, &output64);

       lib_convert64_to_32(&output64, output32);

       return retval;
   }

In summary, a totally portable solution is not viable. Even if you begin with total portability, eventually you will have to deviate. That's when things truly get messy: you break your style for the deviations, which then breaks your documentation and confuses users. I think it's better to plan for this from the start.

Hardware will always cause you to have deviations. Just consider how much trouble 'endianness' causes -- not to mention the number of CPU cycles which are used each day swapping byte orders.

    Thanks for a great answer. The init function is an interesting idea, I wouldn't have thought of that. Still, you're dealing with "plain" int types only. Would you say `size_t` and `ptrdiff_t` would be more likely to have the same size in different compilers on the exact same platform? I understand 100% certainty is impossible, but could "If compiler X uses different size for them, it's a weird compiler" be considered a reasonable statement? Or are they totally unpredictable as well? – Angew is no longer proud of SO Jun 28 '13 at 17:40
  • Compiler makers follow the spec with regard to data types. I wouldn't worry about that. I have found alignment to be the single greatest mistake users make. You should include an align statement in your header files to be sure. The only way I can see different sizes on the "exact same platform" is if someone is compiling for 32-bit but using a 64-bit compiler. But in this case, your header file supplied to your user should detect a 32-bit environment and operate accordingly. –  Jun 28 '13 at 17:51
  • BTW, I can't imagine any circumstance where the size of a pointer would be different during a given compile. Using `sizeof(char *)` will always tell you the size of _all_ pointers. –  Jun 28 '13 at 17:57
  • Also, by using structs for input and output you gain many benefits: 1. you can pass as much data back and forth as you wish. 2. Users understand the lib immediately. 3. The structs you define can have self-documenting member names. 4. With ABI, the structs are in the di and si registers on entry to you. 5. new versions of your library can just add members to structs instead of changing existing function calls or adding new functions. (Include a size of struct member in all structs - acts like a struct version number.) –  Jun 29 '13 at 14:28
  • I am not worried about compiler makers not following specs. My question was (as I have next to no experience with coding for a wide variety of platforms) whether it's safe to expect that any sane ABI will define the size of `size_t` and `ptrdiff_t`. And their alignment and object representation. – Angew is no longer proud of SO Jun 30 '13 at 17:46
  • As for different pointer sizes, I believe (based partly on [this answer](http://stackoverflow.com/a/15832704/1782465)) that they are indeed possible on some hardware. Nevertheless, `char*` is guaranteed to be large enough for any pointer - because it's the same size as `void*`, and `void*` is guaranteed to be big enough for any object pointer value. – Angew is no longer proud of SO Jun 30 '13 at 17:50
2

The C standard contains an entire section in the appendix summarizing just that:

J.3 Implementation-defined behavior

A completely random subset:

  • The number of bits in a byte

  • Which of signed char and unsigned char is the same as char

  • The text encodings for multibyte and wide strings

  • Signed integer representation

  • The result of converting a pointer to an integer and vice versa (6.3.2.3). Note that this means any pointer, not just object pointers.
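
The pointer/integer bullet is worth illustrating, since the round trip through the optional `intptr_t` is the only conversion the standard blesses (a small sketch; it assumes the implementation provides `intptr_t` at all):

```c
#include <stdint.h>

/* void* -> intptr_t -> void* is required to round-trip
 * (C11 7.20.1.4); the numeric value of n in between is
 * implementation-defined, so no portable code may rely on it. */
int roundtrip_value(int *p)
{
    intptr_t n = (intptr_t)(void *)p;  /* pointer to integer */
    int *q = (int *)(void *)n;         /* and back again     */
    return *q;                         /* same object as *p  */
}
```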


Update: To address your question about ABIs: An ABI (application binary interface) is not a standardized concept, and it isn't said anywhere that an implementation must even specify an ABI. The ingredients of an ABI are partly the implementation-defined behaviour of the language (though not all of it; e.g. signed-to-unsigned conversion is implementation defined, but not part of an ABI), and most of the implementation-defined aspects of the language are dictated by the hardware (e.g. signed integer representation, floating point representation, size of pointers).

However, the more important aspects of an ABI are things like how function calls work, i.e. where the arguments are stored, who's responsible for cleaning up the memory, etc. It is crucial for two compilers to agree on those conventions in order for their code to be binary-compatible.

In practice, an ABI is usually the result of an implementation. Once the compiler is complete, it determines -- by virtue of its implementation -- an ABI. It may document this ABI, and other compilers, and future versions of the same compiler, may like to stick to those conventions. For C implementations on x86, this has worked rather well and there are only a few, usually well documented, free parameters that need to be communicated for code to be interoperable. But for other languages, most notably C++, you have a completely different picture: There is nothing coming near a standard ABI for C++ at all. Microsoft's compiler breaks the C++ ABI with every release. GCC tries hard to maintain ABI compatibility across versions and uses the published Itanium ABI (ironically for a now dead architecture). Other compilers may do their own, completely different thing. (And then you have of course issues with C++ standard library implementations, e.g. does your string contain one, two, or three pointers, and in which order?)

To summarize: many aspects of a compiler's ABI, especially pertaining to C, are dictated by the hardware architecture. Different C compilers for the same hardware ought to produce compatible binary code as long as certain aspects like function calling conventions are communicated properly. However, for higher-level languages all bets are off, and whether two different compilers can produce interoperable code has to be decided on a case-by-case basis.

Kerrek SB
  • Thanks. I am well aware that all of these are implementation-defined. That's why I was referring to ABIs, not to the standard. My question is whether it's reasonable to expect all sane implementations on each particular platform to define (a subset of) this behaviour the same way. – Angew is no longer proud of SO Jul 02 '13 at 07:04
  • @Angew: Right, I see. I don't think that even the notion of "ABI" itself is in any way standardized, i.e. the language doesn't even say that "you shall specify an ABI". Most of the properties you list are essentially determined by the hardware, so you can expect compilers for the same hardware to agree on those, others may just be folklore (like IA32 calling conventions). – Kerrek SB Jul 02 '13 at 09:04
  • I know "ABI" is not a standard term either. Still, the way I understand it, in a hardware+OS combination, usually one of them (or both together) defines an ABI. I am really after "which of these types can I expect such an ABI to define?" – Angew is no longer proud of SO Jul 02 '13 at 09:11
  • @Angew: Well, it's a standard *term*, but it's not a concept that's standardized in any way. A compiler *defines* *an ABI* by virtue of its own implementation, and others (or future versions of itself) may or may not choose to keep that consistent. For example, GCC tries to keep even the C++ ABI consistent, while Microsoft's MSVC breaks the ABI with every release. For C it's a bit easier because any ABI is relatively small, and most of it is determined by the hardware. As far as I know, function calling conventions are the only important thing to specify on x86. – Kerrek SB Jul 02 '13 at 11:12
  • Would you be willing to re-formulate the thoughts from the comments as an addendum to your answer? They represent the closest to my desired answer from all the information here. – Angew is no longer proud of SO Jul 02 '13 at 12:37
  • Thanks. That's exactly the info I was looking for. – Angew is no longer proud of SO Jul 04 '13 at 06:38
1

If I understand your needs correctly, the fixed-width `uintN_t`/`intN_t` types are the only ones that give you a binary-compatibility guarantee (and of course `char` does too), but the others tend to differ: for example, `long` is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux. If you really depend on the ABI, you have to plan for the platforms you are going to deliver on, and maybe use typedefs to make things standardized and readable.
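A minimal sketch of that typedef approach (the header name, type aliases, and function are all made up for illustration):

```c
/* lib_types.h (hypothetical): pin the public interface to
 * fixed-width types so the Windows/Linux difference in the
 * size of `long` cannot leak into the ABI. */
#include <stdint.h>

typedef int32_t  lib_i32;
typedef uint64_t lib_u64;

/* A public API function written only in terms of the pinned types. */
lib_u64 lib_checksum(const lib_i32 *data, lib_u64 count)
{
    lib_u64 sum = 0;
    for (lib_u64 i = 0; i < count; i++)
        sum += (lib_u64)(uint32_t)data[i];  /* wrap negatives consistently */
    return sum;
}
```

If the platform's ABI ever changes, only the typedefs in the one header need revisiting, not every declaration in the public interface.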